On Sun, Jun 10, 2018 at 11:15 PM, Robert Kern <robert.k...@gmail.com> wrote:
> On Sun, Jun 10, 2018 at 8:11 PM Ralf Gommers <ralf.gomm...@gmail.com> wrote:
> >
> > On Sun, Jun 10, 2018 at 5:57 PM, Robert Kern <robert.k...@gmail.com> wrote:
> >>
> >> On Sun, Jun 10, 2018 at 5:47 PM Ralf Gommers <ralf.gomm...@gmail.com> wrote:
> >> >
> >> > On Sun, Jun 3, 2018 at 9:23 PM, Warren Weckesser <warren.weckes...@gmail.com> wrote:
> >> >>
> >> >> I suspect many of the tests will be easy to update, so fixing 300 or so tests does not seem like a monumental task.
> >> >
> >> > None of it is monumental, but it adds up quickly. In addition to changing tests, one will also need compatibility code when supporting multiple numpy versions (e.g. scipy would get a copy of RandomStable in scipy/_lib/_numpy_compat.py).
> >> >
> >> > A quick count of just np.random.seed occurrences with ``$ grep -roh --include \*.py np.random.seed . | wc -w`` for some packages:
> >> > numpy: 77
> >> > scipy: 462
> >> > matplotlib: 204
> >> > statsmodels: 461
> >> > pymc3: 36
> >> > scikit-image: 63
> >> > scikit-learn: 69
> >> > keras: 46
> >> > pytorch: 0
> >> > tensorflow: 368
> >> > astropy: 24
> >> >
> >> > And note, these are *not* incorrect/broken usages; this is code that works and has done so for years.
> >>
> >> Yes, some of them are incorrect and broken. Failure can be difficult to detect. This module from keras is particularly problematic:
> >>
> >> https://github.com/keras-team/keras-preprocessing/blob/master/keras_preprocessing/image.py
> >
> > You have to appreciate that we're not all thinking at lightning speed and in the same direction. If there is a difficult-to-detect problem, it may be useful to give a brief code example (or even a line of reasoning) of how this actually breaks something.
>
> Ahem. Sorry. That wasn't the code I was thinking of. It's merely hazardous, not broken by itself. However, if you used any of the `seed=` arguments that are helpfully(?) provided, you are almost certainly writing broken code.
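[For concreteness, a minimal sketch of that hazard, with hypothetical names (this is not the actual keras code): a `seed=` argument that reseeds the *global* RandomState as a side effect of being called, clobbering whatever seed the caller set.]

```python
import numpy as np

def augment(data, seed=None):
    # Hypothetical helper, names illustrative only: a seed= argument
    # that reseeds the global RandomState as a side effect.
    if seed is not None:
        np.random.seed(seed)
    return data + np.random.normal(size=len(data))

np.random.seed(12345)         # the user seeds once, expecting reproducibility
a = np.random.random(3)
augment(np.zeros(3), seed=0)  # silently rewinds the global stream to seed 0...
b = np.random.random(3)       # ...so b no longer depends on the user's seed
```

After the call, every subsequent draw from the global stream is determined by `seed=0`, not by the seed the user chose at startup.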
> If you must use np.random.seed() to get reproducibility, you need to call it exactly once at the start of your code (or maybe once for each process) and let it ride.
>
> This is the impossible-to-use-correctly code that I was thinking of, which got partially fixed after I pointed out the problem:
>
> https://github.com/keras-team/keras/pull/8325/files
>
> The intention of this code is to shuffle two same-length sequences in the same way. So now if I write my code well and call np.random.seed() once at the start of my program, this function comes along and obliterates that with a fixed seed just so it can reuse the seed again to replicate the shuffle.

Yes, that's a big no-no. There are situations conceivable where a library has to set a seed, but I think the right pattern in that case would be something like

    old_state = np.random.get_state()
    np.random.seed(some_int)
    do_stuff()
    np.random.set_state(old_state)

> Puzzlingly, the root sin of unconditionally and unavoidably reseeding for some of these functions is still there even though I showed how and why to avoid it. This is one reason why I was skeptical that merely documenting RandomState or StableRandom to only be used for unit tests would work. :-)

Well, no matter what we do, I'm sure that there'll be lots of people who will still get it wrong :)

> >> > Conclusion: the current proposal will cause work for the vast majority of libraries that depend on numpy. The total amount of that work will certainly not be counted in person-days/weeks, and more likely in years than months. So I'm not convinced yet that the current proposal is the best way forward.
> >>
> >> The mere usage of np.random.seed() doesn't imply that these packages actually require stream-compatibility. Some might, for sure, like where they are used in the unit tests, but that's not what you counted. At best, these numbers just mean that we can't eliminate np.random.seed() in a new system right away.
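[As an aside, the save/restore pattern sketched above can be wrapped in a small context manager so the caller's state is restored even if the seeded code raises. This is only a sketch; `temp_seed` is a hypothetical helper, not a numpy API.]

```python
import contextlib
import numpy as np

@contextlib.contextmanager
def temp_seed(seed):
    # Hypothetical helper: temporarily seed the global RandomState,
    # then restore the caller's state on exit, even on exception.
    old_state = np.random.get_state()
    np.random.seed(seed)
    try:
        yield
    finally:
        np.random.set_state(old_state)

np.random.seed(12345)
with temp_seed(0):
    reproducible = np.random.random(4)  # always drawn from seed 0
later = np.random.random(4)  # continues the caller's seed-12345 stream
```

The `try`/`finally` matters: without it, an exception inside `do_stuff()` would leave the global stream stuck on the library's fixed seed.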
> >
> > Well, mere usage has been called an antipattern (also on your behalf), plus for scipy over half of the usages do give test failures (Warren's quick test). So I'd say that counting usages is a decent proxy for the work that has to be done.
>
> Sure. But with my new proposal, we don't have to change it (as much as I'd like to!). I'll draft up a PR to modify my NEP accordingly.

Sounds good!

Cheers,
Ralf
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion