On Sun, Jun 3, 2018 at 9:23 PM, Warren Weckesser <warren.weckes...@gmail.com> wrote:
>
> On Sun, Jun 3, 2018 at 11:20 PM, Ralf Gommers <ralf.gomm...@gmail.com>
> wrote:
>
>> On Sun, Jun 3, 2018 at 6:54 PM, <josef.p...@gmail.com> wrote:
>>
>>> On Sun, Jun 3, 2018 at 9:08 PM, Robert Kern <robert.k...@gmail.com>
>>> wrote:
>>>
>>>> On Sun, Jun 3, 2018 at 5:46 PM <josef.p...@gmail.com> wrote:
>>>>
>>>>> On Sun, Jun 3, 2018 at 8:21 PM, Robert Kern <robert.k...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>> The list of ``StableRandom`` methods should be chosen to support
>>>>>>> unit tests:
>>>>>>>
>>>>>>> * ``.randint()``
>>>>>>> * ``.uniform()``
>>>>>>> * ``.normal()``
>>>>>>> * ``.standard_normal()``
>>>>>>> * ``.choice()``
>>>>>>> * ``.shuffle()``
>>>>>>> * ``.permutation()``
>>>>>>
>>>>>> https://github.com/numpy/numpy/pull/11229#discussion_r192604311
>>>>>> @bashtage writes:
>>>>>> > standard_gamma and standard_exponential are important enough to
>>>>>> > be included here IMO.
>>>>>>
>>>>>> "Importance" was not my criterion, only whether they are used in
>>>>>> unit test suites. This list was just off the top of my head for
>>>>>> methods that I think were actually used in test suites, so I'd be
>>>>>> happy to be shown live tests that use other methods. I'd like to
>>>>>> be a *little* conservative about what methods we put in here, but
>>>>>> we don't have to be *too* conservative, since we are explicitly
>>>>>> never going to be modifying these.
>>>>>
>>>>> That's one area where I thought the selection is too narrow.
>>>>> We should be able to get a stable stream from the uniform for some
>>>>> distributions.
>>>>>
>>>>> However, according to the Wikipedia description, Poisson doesn't
>>>>> look easy. I just wrote a unit test for statsmodels using Poisson
>>>>> random numbers with hard-coded numbers for the regression tests.
>>>>
>>>> I'd really rather people do this than use StableRandom; this is
>>>> best practice, as I see it, if your tests involve making precise
>>>> comparisons to expected results.
>>>
>>> I hardcoded the results, not the random data. So the unit tests rely
>>> on a reproducible stream of Poisson random numbers.
>>> I don't want to save 500 (100 or 1000) observations in a csv file
>>> for every variation of the unit test that I run.
>>
>> I agree, hardcoding numbers in every place where seeded random
>> numbers are now used is quite unrealistic.
>>
>> It may be worth having a look at the test suites for scipy,
>> statsmodels, scikit-learn, etc. and estimating how much work this NEP
>> causes those projects. If the devs of those packages are forced to do
>> large-scale migrations from RandomState to StableRandom, then why not
>> instead keep RandomState and just add a new API next to it?
>
> As a quick and imperfect test, I monkey-patched numpy so that a call
> to numpy.random.seed(m) actually uses m+1000 as the seed.
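For illustration, a minimal sketch of such a monkey-patch (hypothetical;
Warren's actual patch is not shown in the thread, and this version
assumes only integer seeds need shifting):

    import numpy as np

    _original_seed = np.random.seed

    def _shifted_seed(seed=None):
        # Leave unseeded calls alone; shift integer seeds by 1000.
        if seed is None:
            return _original_seed()
        return _original_seed(seed + 1000)

    np.random.seed = _shifted_seed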
> I ran the tests using the `runtests.py` script:
>
> *seed+1000, using 'python runtests.py -n' in the source directory:*
>
>     236 failed, 12881 passed, 1248 skipped, 585 deselected,
>     84 xfailed, 7 xpassed
>
> Most of the failures are in scipy.stats:
>
> *seed+1000, using 'python runtests.py -n -s stats' in the source
> directory:*
>
>     203 failed, 1034 passed, 4 skipped, 370 deselected, 4 xfailed,
>     1 xpassed
>
> Changing the amount added to the seed or running the tests using the
> function `scipy.test("full")` gives different (but similar magnitude)
> results:
>
> *seed+1000, using 'import scipy; scipy.test("full")' in an ipython
> shell:*
>
>     269 failed, 13359 passed, 1271 skipped, 134 xfailed, 8 xpassed
>
> *seed+1, using 'python runtests.py -n' in the source directory:*
>
>     305 failed, 12812 passed, 1248 skipped, 585 deselected,
>     84 xfailed, 7 xpassed
>
> I suspect many of the tests will be easy to update, so fixing 300 or
> so tests does not seem like a monumental task.

It's all not monumental, but it adds up quickly. In addition to
changing tests, one will also need compatibility code when supporting
multiple numpy versions (e.g. scipy would get a copy of StableRandom in
scipy/_lib/_numpy_compat.py). A quick count of just np.random.seed
occurrences with

    $ grep -roh --include \*.py np.random.seed . | wc -w

for some packages:

    numpy: 77
    scipy: 462
    matplotlib: 204
    statsmodels: 461
    pymc3: 36
    scikit-image: 63
    scikit-learn: 69
    keras: 46
    pytorch: 0
    tensorflow: 368
    astropy: 24

And note, these are *not* incorrect/broken usages; this is code that
works and has done so for years.

Conclusion: the current proposal will cause work for the vast majority
of libraries that depend on numpy. The total amount of that work will
certainly not be counted in person-days/weeks, and more likely in years
than months. So I'm not convinced yet that the current proposal is the
best way forward.

Ralf

> I haven't looked into why there are 585 deselected tests; maybe there
> are many more tests lurking there that will have to be updated.
>
> Warren

>> Ralf

>>>> StableRandom is intended as a crutch so that the pain of moving
>>>> existing unit tests away from the deprecated RandomState is less
>>>> onerous. I'd really rather people write better unit tests!
>>>>
>>>> In particular, I do not want to add any of the integer-domain
>>>> distributions (aside from shuffle/permutation/choice), as these are
>>>> the ones that have the platform-dependency issues with respect to
>>>> 32/64-bit `long` integers. They'd be unreliable for unit tests even
>>>> if we kept them stable over time.
>>>
>>>>> I'm not sure which other distributions are common enough and not
>>>>> easily reproducible by transformation. E.g. negative binomial can
>>>>> be reproduced by a gamma-Poisson mixture.
>>>>>
>>>>> On the other hand, normal can be easily recreated from
>>>>> standard_normal.
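For concreteness, a minimal sketch of the transformations Josef
mentions (hypothetical code, not from the thread; it uses RandomState's
gamma and poisson methods only to illustrate the identities):

    import numpy as np

    rng = np.random.RandomState(12345)
    n, p, size = 5, 0.3, 100000

    # Gamma-Poisson mixture: lam ~ Gamma(shape=n, scale=(1-p)/p) and
    # X | lam ~ Poisson(lam) gives X ~ NegativeBinomial(n, p).
    lam = rng.gamma(shape=n, scale=(1 - p) / p, size=size)
    x = rng.poisson(lam)
    print(x.mean(), n * (1 - p) / p)  # both approximately 11.67

    # Recreating normal from standard_normal is a one-liner:
    y = 2.0 + 0.5 * rng.standard_normal(size)
    print(y.mean(), y.std())  # approximately 2.0 and 0.5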
>>>> I was mostly motivated by making it a bit easier to mechanically
>>>> replace uses of randn(), which is probably even more common than
>>>> normal() and standard_normal() in unit tests.
>>>
>>>>> Would it be difficult to keep this list large, given that it
>>>>> should be frozen, low-maintenance code?
>>>>
>>>> I admit that I had in mind non-statistical unit tests. That is,
>>>> tests that didn't depend on the precise distribution of the inputs.
>>>
>>> The problem is that the unit tests in `stats` rely on precise inputs
>>> (up to some numerical noise).
>>> For example, p-values themselves are uniformly distributed if the
>>> hypothesis test works correctly. That means if I don't have control
>>> over the inputs, then my p-value could be anything in (0, 1). So
>>> either we need a real dataset, save all the random numbers in a
>>> file, or have a reproducible set of random numbers.
>>>
>>> 95% of the unit tests that I write are for statistics. A large
>>> fraction of them don't rely on the exact distribution, but do rely
>>> on random numbers that are "good enough".
>>> For example, when writing unit tests, every once in a while (or
>>> sometimes more often) I get a "bad" stream of random numbers, for
>>> which convergence might fail or the estimated numbers are far away
>>> from the true numbers, so the test tolerance would have to be very
>>> high. If I pick one of the seeds that looks good, then I can use a
>>> tighter unit test tolerance to ensure results are good in a nice
>>> case.
>>>
>>> The problem is that we cannot write robust regression tests without
>>> stable inputs.
>>> E.g. I verified my results with a Monte Carlo with 5000 replications
>>> and 1000 Poisson observations in each.
>>> Results look close to expected and won't depend much on the exact
>>> stream of random variables.
>>> But the Monte Carlo for each variant of the test took about 40
>>> seconds. Doing this for all option combinations and dataset
>>> specifications takes too long to be feasible in a unit test suite.
>>> So I rely on numpy's stable random numbers and hard-code the results
>>> for a specific random sample in the regression unit tests.
>>>
>>> Josef
>>>
>>>> --
>>>> Robert Kern
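For illustration, the regression-test pattern Josef describes might
look like this (hypothetical test; the seed and the hard-coded expected
value are placeholders, not numbers from statsmodels):

    import numpy as np
    from numpy.testing import assert_allclose

    # Placeholder: in a real test, obtained by running the computation
    # once on a verified-good seed and checking it against a slow
    # Monte Carlo before hard-coding.
    EXPECTED_MEAN = 2.958

    def test_poisson_mean_regression():
        # Relies on RandomState(4235) producing the same 500 Poisson
        # draws on every platform and numpy version.
        rng = np.random.RandomState(4235)
        y = rng.poisson(lam=3.0, size=500)
        assert_allclose(y.mean(), EXPECTED_MEAN, atol=1e-3)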
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion