As more of an ex-Python committer, I have little to say about current 
processes. I will make one observation:

On Jan 20, 2017, at 11:45 AM, Victor Stinner <victor.stin...@gmail.com> wrote:
> I introduced a regression in random.Random.seed(): ...

> IMHO the regression is not "catastrophic". Only few developers
> instanciate random.Random themself, random.Random must not be used for
> security, etc. I let others decide if this bug was catastrophic or
> not.

I do not like this comment. I feel it shows a narrow understanding of who uses 
Python, of who depends on this API, and of the history of how important it is 
to have a good RNG seed even for non-security uses.

Scientific programming uses random numbers. A quick grep of the source code on 
my machine shows the following example from Biopython's 
Bio/GA/Mutation/Simple.py (comments and docstrings removed for clarity):

import random

class SinglePositionMutation(object):
    def __init__(self, mutation_rate=0.001):
        self._mutation_rate = mutation_rate
        self._mutation_rand = random.Random()
        self._switch_rand = random.Random()
        self._pos_rand = random.Random()

If I understand the bug correctly, with Python 3.6 on Windows these will 
likely have the same seed. This may (or may not) affect downstream statistics. 
This may (or may not) require a publication to be retracted. This may (or may 
not) be a disaster to the researcher who now has to re-do all of the 
calculations and see if there is a problem.
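
To make the concern concrete, here is a minimal sketch of what identical seeds do to three supposedly independent generators. It does not reproduce the Windows bug itself; it simulates the outcome by passing the same explicit seed to each generator. The seed value and the variable names are my own, chosen for illustration.

```python
import random

# Hypothetical stand-in for "three generators that happened to get the same seed".
shared_seed = 12345
mutation_rand = random.Random(shared_seed)
switch_rand = random.Random(shared_seed)
pos_rand = random.Random(shared_seed)

# Draw a few values from each generator, side by side.
draws = [
    (mutation_rand.random(), switch_rand.random(), pos_rand.random())
    for _ in range(5)
]

# With identical seeds the three streams are identical, not merely correlated.
identical = all(a == b == c for a, b, c in draws)
print(identical)
```

Code that assumes the three streams are independent would instead be making the same "random" decision three times.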

It's certainly hard to track down all the people who might be affected and beg 
for their forgiveness, which is what it sounds like you will be doing if you 
really believe your development philosophy, and aren't using it as an excuse to 
skimp on testing and coordination. 

Back in the 1990s there were a number of papers showing the importance of 
having a good RNG for physical simulations, even if it isn't cryptographically 
strong, and the importance of having no correlations between multiple RNGs. 
Basically, most programmers aren't good enough to identify bad default RNGs and 
seeds. The general solution was to put a good-enough default in the underlying 
language implementation, which all modern languages, including Python, have 
done. Well, except now.

I do not think it's relevant to comment that only a few developers call 
Random() directly. The guideline should be the number of people who might be 
affected. There are many more Biopython users than developers, making the 
actual severity many times larger than that developer comment implies.

Finally, there's a long history of small, seemingly orthogonal changes in RNG 
code making large reductions in entropy, such as 
https://en.wikinews.org/wiki/Predictable_random_number_generator_discovered_in_the_Debian_version_of_OpenSSL
 caused by following suggestions made by Valgrind and Purify; the sort that 
might be called "trivial (short, non controversal)". Anyone touching RNG code 
has to be aware that it's a sensitive area, and pay extra attention to testing.

In the bug report at http://bugs.python.org/issue29085 you write "Too bad that 
there is no simple way to write an unit test for that."

Here's a possible test based on the reproducer in the report:

    import random

    Random = random.Random
    for i in range(1000):
        rng1 = Random()
        rng2 = Random()
        # Two independently seeded generators should never agree on 128 bits.
        assert rng1.getrandbits(128) != rng2.getrandbits(128), "RNG initialization failed"

(I do not have a Windows box to test this. It passes 500,000 tests on my Mac.)

If the two Random() calls fall in the same clock quantum and so get the same 
seed, then they will produce the same bit pattern. The odds of two independent 
seeds giving the same 128-bit pattern are incredibly low (about 1 in 2**128 per 
pair).
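
As a rough sketch of why this kind of check would catch the failure, the same comparison can be run against a deliberately broken generator factory. The helper names below are hypothetical, and the one-second clock is only an exaggerated model of a coarse time-based seed, not the actual code path in the bug report.

```python
import random
import time

def seeds_look_independent(make_rng, trials=1000):
    """Return False if any pair of freshly created generators agrees on 128 bits."""
    for _ in range(trials):
        if make_rng().getrandbits(128) == make_rng().getrandbits(128):
            return False
    return True

def coarse_clock_rng():
    # Hypothetical model of the bug: the seed comes from a clock with
    # one-second resolution, so back-to-back calls share a seed.
    return random.Random(int(time.time()))

# On an unaffected Python the default constructor should pass the check,
# while the coarse-clock factory should fail it.
print(seeds_look_independent(random.Random))
print(seeds_look_independent(coarse_clock_rng))
```

The check is probabilistic in principle, but a false failure needs a 128-bit collision between independent seeds, so in practice it only fails when initialization is broken.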

For this one bug, I agree with the interpretation that it was handled with a 
cavalier attitude. I don't feel like it's being treated with the seriousness it 
should. As far as I can tell there isn't even a regression test to prevent 
something like this from happening again, only hand-waving that such a test is 
not simple (and why does a unit test for an important requirement have to be 
simple in order to be added?).

I've forwarded this Python 3.6 bug to the Biopython developers, at 
https://github.com/biopython/biopython/issues/1044 . I don't think it's a 
serious problem, but don't know enough to be certain. In any case, the likely 
easy fix for them is to reuse the same RNG instead of creating independent 
ones. However, I expect there are other projects which will be more affected by 
this bug.
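
For illustration only, the "reuse the same RNG" idea might look something like the sketch below. This is not Biopython's actual fix. The optional `rng` parameter is my own hypothetical addition, and the three attribute names are kept only so that the rest of the class would not need to change.

```python
import random

class SinglePositionMutation(object):
    def __init__(self, mutation_rate=0.001, rng=None):
        self._mutation_rate = mutation_rate
        # Hypothetical change: create (or accept) one generator and share it,
        # so the three roles can never end up as identically seeded copies.
        shared = rng if rng is not None else random.Random()
        self._mutation_rand = shared
        self._switch_rand = shared
        self._pos_rand = shared

mutator = SinglePositionMutation()
print(mutator._mutation_rand is mutator._pos_rand)
```

Accepting a generator from the caller also makes it possible to pass in an explicitly seeded one when a reproducible run is wanted.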

Cheers,


                                Andrew
                                da...@dalkescientific.com

