Re: on the prng behind random.random()

2018-11-20 Thread Christian Gollwitzer

Am 19.11.18 um 22:05 schrieb Robert Girault:

Chris Angelico  writes:


On Tue, Nov 20, 2018 at 7:31 AM Robert Girault  wrote:

Nice.  So Python's random.random() does indeed use mt19937.  Since it's
been broken for years, why isn't it replaced by something newer like
ChaCha20?  Is it due to backward compatibility?  That would make sense.


What exactly do you mean by "broken"?


I mean the fact that with 624 samples from the generator, you can
determine the rest of the sequence completely.


As far as I understand it, this is true only if you see the full 32bit 
output number from the Mersenne Twister for those 624 outputs. If 
however you create a list:


[random.randrange(10) for i in range(624)]

I don't think you can predict what follows.

Christian
--
https://mail.python.org/mailman/listinfo/python-list


Re: on the prng behind random.random()

2018-11-19 Thread Dan Sommers

On 11/19/18 6:49 PM, Robert Girault wrote:

> I think I disagree with your take here.  With mt19937, given ANY seed,
> I can eventually predict all the sequence without having to query the
> oracle any further.

Even if that's true, and I use mt19937 inside my program, you don't
[usually|necessarily] have access to the raw output from it.

> If you're just writing a toy software, even K&R PRNG works just fine.
> If you're writing a weather simulation, I suppose you need real
> random-like properties and still need your generator to be reproducible.
> If you're using random Quicksort, you do need unpredictability and
> reproducibility.  If you're writing a crypto application, then you need
> something way stronger.  We need all of them ...

Agreed.  Mostly.  IIRC, though, your question was about *replacing*
mt19937, not adding a new RNG.

Please use the right tool for the job at hand.

> ... But mt19937 is now useful only in toy software.

It has "real random-like" properties (for certain definitions of "real"
and "random-like") and it's reproducible.  Therefore, it's good for
weather simulations, too.
--
https://mail.python.org/mailman/listinfo/python-list


Re: on the prng behind random.random()

2018-11-19 Thread Chris Angelico
On Tue, Nov 20, 2018 at 10:51 AM Robert Girault  wrote:
> If you're just writing a toy software, even K&R PRNG works just fine.
> If you're writing a weather simulation, I suppose you need real
> random-like properties and still need your generator to be reproducible.
> If you're using random Quicksort, you do need unpredictability and
> reproducibility.  If you're writing a crypto application, then you need
> something way stronger.  We need all of them.  But mt19937 is now useful
> only in toy software.

I disagree. Yes, in a crypto-sensitive situation, you can't depend on
the Twister... but you shouldn't be relying on *any* PRNG for that.
There are plenty of situations where you need something unpredictable
but it doesn't have to be THAT safe. Your example of picking a random
pivot for quicksort is a perfect example. Let's suppose I am sorting
by that method... how are you going to get 624 consecutive outputs? If
you can provide a custom comparison function, you can DOS the search
just by making that inefficient. If you can't, how are you going to
reconstruct the randomness? Is this REALLY a viable attack vector?

It's different if, say, you're operating a virtual casino, and letting
people watch the roulette wheel spins. (Though even then,
reconstructing the twister's state from a series of 1-in-38 results
isn't going to be trivial.) But it's overly paranoid to say that every
single PRNG needs to be cryptographically secure.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: on the prng behind random.random()

2018-11-19 Thread Robert Girault
Dennis Lee Bieber  writes:

> On Mon, 19 Nov 2018 19:05:44 -0200, Robert Girault  declaimed
> the following:
>
>>I mean the fact that with 624 samples from the generator, you can
>>determine the rest of the sequence completely.
>
>   Being able to predict the sequence after a large sampling does not mean
> that the /distribution of values/ is not (pseudo-) random.

The problem with determining its sequence is that it might defeat its
purpose.  If you use mt19937 to select a pivot in random Quicksort for
example (where you plan to spend n lg n time in sorting), we can
frustrate your plans and force it into n^2 every time, an effective DoS
attack on your software.

>   After all, pretty much all random number generators will produce the
> same sequence if given the same starting seed... You are, in effect,
> treating your 624 samples as a very large seed...

I think I disagree with your take here.  With mt19937, given ANY seed, I
can eventually predict all the sequence without having to query the
oracle any further.

If you're just writing a toy software, even K&R PRNG works just fine.
If you're writing a weather simulation, I suppose you need real
random-like properties and still need your generator to be reproducible.
If you're using random Quicksort, you do need unpredictability and
reproducibility.  If you're writing a crypto application, then you need
something way stronger.  We need all of them.  But mt19937 is now useful
only in toy software.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: on the prng behind random.random()

2018-11-19 Thread Ian Kelly
On Mon, Nov 19, 2018 at 2:12 PM Robert Girault  wrote:
>
> Chris Angelico  writes:
>
> > On Tue, Nov 20, 2018 at 7:31 AM Robert Girault  wrote:
> >> Nice.  So Python's random.random() does indeed use mt19937.  Since it's
> >> been broken for years, why isn't it replaced by something newer like
> >> ChaCha20?  Is it due to backward compatibility?  That would make sense.
> >
> > What exactly do you mean by "broken"?
>
> I mean the fact that with 624 samples from the generator, you can
> determine the rest of the sequence completely.
>
> Sorry about mentioning ChaCha20.  That was misleading.  I should've said
> something newer like mrtg32k3a or xorshift*.

If you wish to propose replacing it, that topic is probably best
brought up at python-dev.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: on the prng behind random.random()

2018-11-19 Thread Robert Girault
Chris Angelico  writes:

> On Tue, Nov 20, 2018 at 7:31 AM Robert Girault  wrote:
>> Nice.  So Python's random.random() does indeed use mt19937.  Since it's
>> been broken for years, why isn't it replaced by something newer like
>> ChaCha20?  Is it due to backward compatibility?  That would make sense.
>
> What exactly do you mean by "broken"? 

I mean the fact that with 624 samples from the generator, you can
determine the rest of the sequence completely.

Sorry about mentioning ChaCha20.  That was misleading.  I should've said
something newer like mrtg32k3a or xorshift*.

> If you're generating random numbers for any sort of security purpose,
> you probably should look at this:
>
> https://docs.python.org/3/library/secrets.html
>
> (New in 3.6, though, hence the "probably". If you need to support 3.5
> or older - including 2.7 - then you can't use that.)

Thanks for the reference!  

I'm not particularly interested in security at the moment, but I would
like an expert's confirmation that some of these algorithms arent't
replaced due to backward compatibility.  We could easily replace them,
but I think we shouldn't: some people still depend on these algorithms
for their experiment.

Are there other reasons?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: on the prng behind random.random()

2018-11-19 Thread Chris Angelico
On Tue, Nov 20, 2018 at 7:31 AM Robert Girault  wrote:
> Nice.  So Python's random.random() does indeed use mt19937.  Since it's
> been broken for years, why isn't it replaced by something newer like
> ChaCha20?  Is it due to backward compatibility?  That would make sense.

What exactly do you mean by "broken"? If you're generating random
numbers for any sort of security purpose, you probably should look at
this:

https://docs.python.org/3/library/secrets.html

(New in 3.6, though, hence the "probably". If you need to support 3.5
or older - including 2.7 - then you can't use that.)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: on the prng behind random.random()

2018-11-19 Thread Robert Girault
Peter Otten <__pete...@web.de> writes:

> Robert Girault wrote:
>
>> Looking at its source code, it seems the PRNG behind random.random() is
>> Mersenne Twister, but I'm not sure.  It also seems that random.random()
>> is using /dev/urandom.  Can someone help me to read that source code?
>> 
>> I'm talking about CPython, by the way.  I'm reading
>> 
>>   https://github.com/python/cpython/blob/master/Lib/random.py
>> 
>> The initial comment clearly says it's Mersenne Twister, but the only
>> random() function there seems to call _urandom(), which I suppose is an
>> interface to /dev/urandom.
>> 
>> What am I missing here?
>
> There's a class random.Random which is instantiated at the end of the file, 
> and random() is bound to the corresponding method:
>
> _inst = Random()
> ...
> random = _inst.random
>
> The Random class inherits from _random.Random [...]

Thanks.  I missed that.

> which is implemented in C and does most of the actual work. If you can
> read C:
>
> https://github.com/python/cpython/blob/master/Modules/_randommodule.c
>
> The most relevant part seems to be genrand_int32() which is wrapped by 
> random_random() that actually implenents the _random.Random.random() method.

Nice.  So Python's random.random() does indeed use mt19937.  Since it's
been broken for years, why isn't it replaced by something newer like
ChaCha20?  Is it due to backward compatibility?  That would make sense.

Do you know who broke mt19937 and when?  I'd love to read the reference.
Thank you!
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: on the prng behind random.random()

2018-11-19 Thread Peter Otten
Robert Girault wrote:

> Looking at its source code, it seems the PRNG behind random.random() is
> Mersenne Twister, but I'm not sure.  It also seems that random.random()
> is using /dev/urandom.  Can someone help me to read that source code?
> 
> I'm talking about CPython, by the way.  I'm reading
> 
>   https://github.com/python/cpython/blob/master/Lib/random.py
> 
> The initial comment clearly says it's Mersenne Twister, but the only
> random() function there seems to call _urandom(), which I suppose is an
> interface to /dev/urandom.
> 
> What am I missing here?

There's a class random.Random which is instantiated at the end of the file, 
and random() is bound to the corresponding method:

_inst = Random()
...
random = _inst.random

The Random class inherits from _random.Random which is implemented in C and 
does most of the actual work. If you can read C:

https://github.com/python/cpython/blob/master/Modules/_randommodule.c

The most relevant part seems to be genrand_int32() which is wrapped by 
random_random() that actually implenents the _random.Random.random() method.

-- 
https://mail.python.org/mailman/listinfo/python-list