On Sun, Jun 3, 2018 at 11:20 PM, Ralf Gommers <ralf.gomm...@gmail.com> wrote:
> On Sun, Jun 3, 2018 at 6:54 PM, <josef.p...@gmail.com> wrote:
>
>> On Sun, Jun 3, 2018 at 9:08 PM, Robert Kern <robert.k...@gmail.com> wrote:
>>
>>> On Sun, Jun 3, 2018 at 5:46 PM <josef.p...@gmail.com> wrote:
>>>
>>>> On Sun, Jun 3, 2018 at 8:21 PM, Robert Kern <robert.k...@gmail.com> wrote:
>>>>
>>>>>> The list of ``StableRandom`` methods should be chosen to support unit tests:
>>>>>>
>>>>>> * ``.randint()``
>>>>>> * ``.uniform()``
>>>>>> * ``.normal()``
>>>>>> * ``.standard_normal()``
>>>>>> * ``.choice()``
>>>>>> * ``.shuffle()``
>>>>>> * ``.permutation()``
>>>>>
>>>>> https://github.com/numpy/numpy/pull/11229#discussion_r192604311
>>>>> @bashtage writes:
>>>>> > standard_gamma and standard_exponential are important enough to be included here IMO.
>>>>>
>>>>> "Importance" was not my criterion, only whether they are used in unit test suites. This list was just off the top of my head for methods that I think are actually used in test suites, so I'd be happy to be shown live tests that use other methods. I'd like to be a *little* conservative about what methods we stick in here, but we don't have to be *too* conservative, since we are explicitly never going to be modifying these.
>>>>
>>>> That's one area where I thought the selection was too narrow. We should be able to get a stable stream from the uniform for some distributions.
>>>>
>>>> However, according to the Wikipedia description, Poisson doesn't look easy. I just wrote a unit test for statsmodels using Poisson random numbers with hard-coded numbers for the regression tests.
>>>
>>> I'd really rather people do this than use StableRandom; this is best practice, as I see it, if your tests involve making precise comparisons to expected results.
>>
>> I hard-coded the results, not the random data, so the unit tests rely on a reproducible stream of Poisson random numbers. I don't want to save 500 (or 100, or 1000) observations in a csv file for every variation of the unit test that I run.
>
> I agree, hard-coding numbers in every place where seeded random numbers are now used is quite unrealistic.
>
> It may be worth having a look at the test suites for scipy, statsmodels, scikit-learn, etc. and estimating how much work this NEP causes those projects. If the devs of those packages are forced to do large-scale migrations from RandomState to StableState, then why not instead keep RandomState and just add a new API next to it?

As a quick and imperfect test, I monkey-patched numpy so that a call to numpy.random.seed(m) actually uses m+1000 as the seed.
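Roughly, one way to do such a monkey-patch (a minimal sketch; the +1000 offset comes from the description above, but the mechanism shown here is illustrative and may differ from the patch actually used for the runs below):

```python
import numpy as np

_original_seed = np.random.seed  # keep a reference to the real seed function

def _shifted_seed(seed=None):
    # Perturb every explicit seed so that all "seeded" tests see a different,
    # but still deterministic, random stream.
    if seed is None:
        return _original_seed(None)
    return _original_seed(seed + 1000)

np.random.seed = _shifted_seed
```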
I ran the tests using the `runtests.py` script:

*seed+1000, using 'python runtests.py -n' in the source directory:*
236 failed, 12881 passed, 1248 skipped, 585 deselected, 84 xfailed, 7 xpassed

Most of the failures are in scipy.stats:

*seed+1000, using 'python runtests.py -n -s stats' in the source directory:*
203 failed, 1034 passed, 4 skipped, 370 deselected, 4 xfailed, 1 xpassed

Changing the amount added to the seed, or running the tests with the function `scipy.test("full")`, gives different (but similar magnitude) results:

*seed+1000, using 'import scipy; scipy.test("full")' in an ipython shell:*
269 failed, 13359 passed, 1271 skipped, 134 xfailed, 8 xpassed

*seed+1, using 'python runtests.py -n' in the source directory:*
305 failed, 12812 passed, 1248 skipped, 585 deselected, 84 xfailed, 7 xpassed

I suspect many of the tests will be easy to update, so fixing 300 or so tests does not seem like a monumental task. I haven't looked into why there are 585 deselected tests; maybe there are many more tests lurking there that will have to be updated.

Warren

> Ralf
>
>>> StableRandom is intended as a crutch so that the pain of moving existing unit tests away from the deprecated RandomState is less onerous. I'd really rather people write better unit tests!
>>>
>>> In particular, I do not want to add any of the integer-domain distributions (aside from shuffle/permutation/choice), as these are the ones that have the platform-dependency issues with respect to 32/64-bit `long` integers. They'd be unreliable for unit tests even if we kept them stable over time.
>>>
>>>> I'm not sure which other distributions are common enough and not easily reproducible by transformation. E.g. negative binomial can be reproduced by a gamma-Poisson mixture.
>>>>
>>>> On the other hand, normal can easily be recreated from standard_normal.
>>>
>>> I was mostly motivated by making it a bit easier to mechanically replace uses of randn(), which is probably even more common than normal() and standard_normal() in unit tests.
>>>
>>>> Would it be difficult to keep this list large, given that it should be frozen, low-maintenance code?
>>>
>>> I admit that I had in mind non-statistical unit tests. That is, tests that didn't depend on the precise distribution of the inputs.
>>
>> The problem is that the unit tests in `stats` rely on precise inputs (up to some numerical noise). For example, p-values themselves are uniformly distributed if the hypothesis test works correctly. That means if I don't have control over the inputs, then my p-value could be anything in (0, 1). So either we need a real dataset, save all the random numbers in a file, or have a reproducible set of random numbers.
>>
>> 95% of the unit tests that I write are for statistics. A large fraction of them don't rely on the exact distribution, but do rely on random numbers that are "good enough". For example, when writing unit tests I get, every once in a while or sometimes more often, a "bad" stream of random numbers, for which convergence might fail or the estimated numbers are far away from the true numbers, so the test tolerance would have to be very high. If I pick one of the seeds that looks good, then I can use tighter unit test tolerances to ensure results are good in a nice case.
>>
>> The problem is that we cannot write robust regression unit tests without stable inputs.
>> For example, I verified my results with a Monte Carlo with 5000 replications and 1000 Poisson observations in each. The results look close to expected and won't depend much on the exact stream of random variables. But the Monte Carlo for each variant of the test took about 40 seconds, and doing this for every option combination and dataset specification takes too long to be feasible in a unit test suite. So I rely on numpy's stable random numbers and hard-code the results for a specific random sample in the regression unit tests.
>>
>> Josef
>>
>>> --
>>> Robert Kern
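For context, here is a minimal sketch of the regression-test pattern described above: a seeded, reproducible input stream with a hard-coded expected value. The seed, sample size, lambda, and the trivial "estimator" are all illustrative, and the expected value is recomputed inline only so the sketch is self-contained; in a real test it would be a pasted-in constant, cross-checked once by a slow Monte Carlo run.

```python
import numpy as np
from numpy.testing import assert_allclose

def _simulate(seed=12345, n=500, lam=3.0):
    # Reproducible input: the test relies on this stream being identical
    # across platforms and numpy releases.
    rng = np.random.RandomState(seed)
    return rng.poisson(lam=lam, size=n)

def test_poisson_mean_regression():
    y = _simulate()
    estimate = y.mean()  # stand-in for a real model fit (e.g. a Poisson regression)

    # In a real regression test, `expected` is a constant recorded from a
    # previous run of this exact code. It is recomputed here only to keep
    # the sketch runnable as-is.
    expected = _simulate().mean()

    # With a stable input stream the tolerance can be very tight.
    assert_allclose(estimate, expected, rtol=1e-12)
```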
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion