True that. In fact I know there are several tests that I was not able to change to use the Mersenne RNG since their outcome depends too much on the exact output of an RNG. So you still see some "new Random(1234); // TODO" out there.
This is probably a higher-priority fix. I would have fixed it myself, but I don't have the expertise. The relevant authors really should look at these TODOs and try to fix the tests. Then it would make sense to have the test framework pick a new seed on every run and record it in the logs. Until the above is fixed, that won't help much, since those tests will almost certainly fail on most new seeds.

Sean

On Tue, Sep 4, 2012 at 5:19 PM, Ted Dunning <[email protected]> wrote:
> Tidy, yes. But better, no.
>
> The Lucene project has made an art out of randomizing configurations for
> tests. Thus, the many thousands of people out there running tests will all
> be testing different combinations of things, and when a failure happens,
> that seed can be codified into the standard tests.
>
> This is a bit different with some of the randomized tests in Mahout. For
> these, there is generally a (weak) statistical guarantee about the result.
> For instance, it might be that the test should succeed 99.9% of the time.
> To avoid spurious worries, after qualifying the test to fail no more often
> than expected, the seed is frozen so that things sit still. Most of the
> errors we are after will trigger a hard failure, so we don't lose much
> power this way and still get stability.
>
> A good example of this is a random number generator that is supposed to
> sample from a particular distribution. If you draw 10,000 deviates from
> this generator, you can write a very simple test based on the cumulative
> distribution function. Simple, that is, except for the fact that putting
> sharp bounds on the test will cause a non-negligible probability of failure
> for a working version of the software. On the other hand, putting loose
> bounds will increase the likelihood that the test will succeed even if
> somebody breaks the code. Increasing the number of samples makes the useful
> bounds much tighter and allows a lower probability of false success for bad
> code, but it increases the test time.
> There is little way around this Heisen-situation. So we freeze the tests.
>
> There are other types of tests where randomization doesn't change the
> guarantees that the code makes at all. This often occurs in tinker-toy
> software where you can plug together all kinds of components
> interchangeably. That is really different from the random number
> distribution problem.
>
> On Tue, Sep 4, 2012 at 12:26 AM, Sean Owen <[email protected]> wrote:
>
>> I think this approach is even tidier than just recording the RNG seed
>> for later reuse.
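The Lucene-style practice Ted describes — randomize the seed per run, log it, and pin a failing seed for reproduction — can be sketched roughly as below. The property name `test.seed` and the helper name are illustrative choices for this sketch, not conventions taken from Lucene or Mahout.

```java
import java.util.Random;

// A minimal sketch of per-run seed randomization with logging, so that any
// failure can be reproduced by pinning the reported seed.
public class SeededTest {

  // "test.seed" is a hypothetical property name used only for illustration.
  static Random newLoggedRandom() {
    String pinned = System.getProperty("test.seed");
    long seed = (pinned != null) ? Long.parseLong(pinned) : System.nanoTime();
    // Record the seed so a failing run can be replayed deterministically.
    System.out.println("Using RNG seed: " + seed
        + " (rerun with -Dtest.seed=" + seed + " to reproduce)");
    return new Random(seed);
  }

  public static void main(String[] args) {
    Random rng = newLoggedRandom();
    // ... exercise the code under test with rng ...
    System.out.println("sample value: " + rng.nextInt(100));
  }
}
```

Until the hard-coded `new Random(1234); // TODO` tests are made seed-independent, though, such a framework would mostly just surface failures on new seeds, as noted above.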
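The CDF-based sampling test Ted describes can be sketched as a one-sample Kolmogorov-Smirnov check against the target distribution's CDF, with the seed frozen once the test has been qualified. The class name, the choice of Uniform(0,1) as the target, and the specific bound are assumptions of this sketch, not code from Mahout.

```java
import java.util.Arrays;
import java.util.Random;

// Sketch of a statistical sampling test: draw many deviates from the
// generator under test, compare the empirical CDF against the expected CDF,
// and fail if the worst-case gap (the KS statistic) exceeds a loose bound.
public class SamplerDistributionTest {

  // One-sample Kolmogorov-Smirnov statistic against the Uniform(0,1) CDF.
  static double ksStatistic(double[] samples) {
    double[] sorted = samples.clone();
    Arrays.sort(sorted);
    int n = sorted.length;
    double d = 0.0;
    for (int i = 0; i < n; i++) {
      // For Uniform(0,1) the CDF at x is simply x.
      double cdf = sorted[i];
      d = Math.max(d,
          Math.max(cdf - (double) i / n, (double) (i + 1) / n - cdf));
    }
    return d;
  }

  public static void main(String[] args) {
    // The seed is frozen so the qualified test stays deterministic.
    Random rng = new Random(1234);
    int n = 10_000;
    double[] samples = new double[n];
    for (int i = 0; i < n; i++) {
      samples[i] = rng.nextDouble();
    }
    double d = ksStatistic(samples);
    // Loose bound: 1.95 / sqrt(n) corresponds to roughly a 0.1% chance of a
    // spurious failure for a correct generator; more samples let the bound
    // tighten relative to typical deviations, at the cost of test time.
    double bound = 1.95 / Math.sqrt(n);
    if (d >= bound) {
      throw new AssertionError("KS statistic " + d + " exceeds bound " + bound);
    }
    System.out.println("KS statistic = " + d + " (bound " + bound + ")");
  }
}
```

This shows the trade-off in the quoted email directly: sharpening `bound` raises the false-failure rate for working code, while loosening it lets broken samplers slip through, and only a larger `n` improves both at once.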
