Re: [python-committers] My cavalier and aggressive manner, API change and bugs introduced for basically zero benefit

Andrew Dalke Fri, 20 Jan 2017 15:21:07 -0800

As more of an ex-Python committer, I have little to say about current 
processes. I will make one observation:

On Jan 20, 2017, at 11:45 AM, Victor Stinner <victor.stin...@gmail.com> wrote:
> I introduced a regression in random.Random.seed(): ...

> IMHO the regression is not "catastrophic". Only few developers
> instanciate random.Random themself, random.Random must not be used for
> security, etc. I let others decide if this bug was catastrophic or
> not.

I do not like this comment. I feel it shows a narrow understanding of who uses
Python and who depends on this API, and of the history of how important it is
to have a good RNG seed even for non-security uses.

Scientific programming uses random numbers. A quick grep of the source code on
my machine shows the following example from Biopython's
Bio/GA/Mutation/Simple.py (comments and docstrings removed for clarity):

    class SinglePositionMutation(object):
        def __init__(self, mutation_rate=0.001):
            self._mutation_rate = mutation_rate
            self._mutation_rand = random.Random()
            self._switch_rand = random.Random()
            self._pos_rand = random.Random()

If I understand the bug correctly, under Python 3.6 on Windows these will
likely have the same seed. This may (or may not) affect downstream statistics.
This may (or may not) require a publication to be retracted. This may (or may
not) be a disaster to the researcher who now has to re-do all of the
calculations and see if there is a problem.
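
To make the concern concrete, here is a minimal sketch that simulates the
failure by deliberately giving two generators the same seed. The seed value
and the variable names are my own placeholders; this does not reproduce what
Windows actually does, it only shows what "same seed" means downstream:

```python
import random

# Hypothetical stand-in for "both instances happened to get the same seed".
shared_seed = 12345

rng_a = random.Random(shared_seed)
rng_b = random.Random(shared_seed)

# Two generators with the same seed emit exactly the same sequence,
# so anything that assumes they are independent is silently wrong.
stream_a = [rng_a.random() for _ in range(5)]
stream_b = [rng_b.random() for _ in range(5)]
streams_identical = (stream_a == stream_b)
```

Here streams_identical is True: the "independent" generators are perfectly
correlated.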

It's certainly hard to track down all the people who might be affected and beg
for their forgiveness, which is what it sounds like you will be doing if you
really believe your development philosophy, and aren't using it as an excuse to
skimp on testing and coordination.

Back in the 1990s there were a number of papers showing the importance of
having a good RNG for physical simulations, even if it isn't cryptographically
strong, and the importance of having no correlations between multiple RNGs.
Basically, most programmers aren't good enough to identify bad default RNGs and
seeds. The general solution was to put a good-enough default in the
underlying language implementations, which all modern languages, including
Python, have done. Well, except now.

I do not think it's relevant to comment that only a few developers call
Random() directly. The guideline should be the number of people who might be
affected. There are many more Biopython users than developers, making the
actual severity many times larger than that developer comment implies.

Finally, there's a long history of small, seemingly orthogonal changes in RNG
code making large reductions in entropy, such as
https://en.wikinews.org/wiki/Predictable_random_number_generator_discovered_in_the_Debian_version_of_OpenSSL
caused by following suggestions made by Valgrind and Purify; the sort that
might be called "trivial (short, non controversal)". Anyone touching RNG code
has to be aware that it's a sensitive area, and pay extra attention to testing.

In the bug report at http://bugs.python.org/issue29085 you write "Too bad that
there is no simple way to write an unit test for that."

Here's a possible test based on the reproducer in the report:

    import random

    Random = random.Random
    for i in range(1000):
        rng1 = Random()
        rng2 = Random()
        # Independently seeded generators should essentially never agree
        # on 128 bits.
        assert rng1.getrandbits(128) != rng2.getrandbits(128), \
            "RNG initialization failed"

(I do not have a Windows box to test this. It passes 500,000 tests on my Mac.)

If the two Random() calls are in the same clock quantum and so get the same
seed, then they will produce the same bit pattern. The odds of two independent
seeds giving the same 128-bit pattern are incredibly low.
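
As a rough back-of-the-envelope check on "incredibly low", here is a sketch
of the union bound on a spurious failure. It assumes the two 128-bit outputs
are independent and uniform; the function name is my own:

```python
from fractions import Fraction

def false_failure_bound(trials, bits=128):
    """Upper bound on the chance that any of `trials` independent pairs
    of uniform `bits`-bit values collide (union bound)."""
    return Fraction(trials, 2 ** bits)

# Bound for the 1000-iteration loop used in the test.
bound = false_failure_bound(1000)
print(float(bound))
```

Under those assumptions the bound for 1000 iterations is far below 1e-35, so
a failing assertion would point to a seeding problem rather than bad luck.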

For this one bug, I agree with the interpretation that it was handled with a
cavalier attitude. I don't feel like it's being treated with the seriousness it
should. As far as I can tell there isn't even a regression test to prevent something
like this from happening again, only hand-waving that such a test is not
simple (and why does a unit test for an important requirement have to be simple
in order to be added?).

I've forwarded this Python 3.6 bug to the Biopython developers, at
https://github.com/biopython/biopython/issues/1044 . I don't think it's a
serious problem, but don't know enough to be certain. In any case, the likely
easy fix for them is to reuse the same RNG instead of creating independent
ones. However, I expect there are other projects which will be more affected by
this bug.
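
For illustration only, the "reuse the same RNG" idea might look something
like the sketch below. The optional rng parameter and the shared variable are
hypothetical additions of mine, not Biopython's actual code or fix:

```python
import random

class SinglePositionMutation(object):
    def __init__(self, mutation_rate=0.001, rng=None):
        self._mutation_rate = mutation_rate
        # Hypothetical change: use one generator for all three roles,
        # so nothing depends on separate instances getting distinct seeds.
        shared = rng if rng is not None else random.Random()
        self._mutation_rand = shared
        self._switch_rand = shared
        self._pos_rand = shared

mutator = SinglePositionMutation()
```

Passing in the generator also lets a caller supply a seeded instance when a
reproducible run is wanted.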

Cheers,

Andrew
da...@dalkescientific.com

_______________________________________________
python-committers mailing list
python-committers@python.org
https://mail.python.org/mailman/listinfo/python-committers
Code of Conduct: https://www.python.org/psf/codeofconduct/
