Re: [Math] About the refactoring of RNGs (Was: [01/18] [math] MATH-1307)

Phil Steitz Tue, 29 Dec 2015 08:38:58 -0800

On 12/29/15 2:33 AM, Luc Maisonobe wrote:
> Hi all,
>
> Le 29/12/2015 09:21, Thomas Neidhart a écrit :
>> On 12/29/2015 04:33 AM, Phil Steitz wrote:
>>> On 12/28/15 8:08 PM, Gilles wrote:
>>>> On Mon, 28 Dec 2015 11:08:56 -0700, Phil Steitz wrote:
>>>>> The significant refactoring to eliminate the (standard) next(int)
>>>>> included in these changes has the possibility of introducing subtle
>>>>> bugs or performance issues.  Please run some tests to verify that
>>>>> the same sequences are generated by the 3_X code
>>>> IIUC our unit tests of the RNGs, this is covered.
>>> No.  Not sufficient.  What you have done is changed the internal
>>> implementation of all of the Bitstream generators.  I am not
>>> convinced that you have not broken anything.  I will have to do the
>>> testing myself.  I see no point in fiddling with the internals of
>>> this code that has had a lot of eyeballs and testing on it.  I was
>>> not personally looking forward to researching the algorithms to make
>>> sure any invariants may be broken by these changes; but I am now
>>> going to have to do this.  I have to ask why.  Please at some point
>>> read [1], especially the sections on "Avoid Flexibility Syndrom" and
>>> "Value Laziness as a Virtue."  Gratuitous refactoring drains
>>> community energy. 
>> +1, on top of that I think we should aim to refactor the parts that
>> really need refactoring and try to keep the number of incompatibilities
>> to the 3_X branch as minimal as possible.
>>
>> Thomas
>>
>>>>> and the refactored
>>>>> code and benchmarks to show there is no loss in performance.
>>>> Given that there are exactly two operations _less_ (a subtraction
>>>> and a shift), it would be surprising.
>>>>
>>>>> It
>>>>> would also be good to have some additional review of this code by
>>>>> PRNG experts.
>>>> The "nextInt()" code is exactly the same as the "next(int)" modulo
>>>> the little change above (in the last line of the "nextInt/next"
>>>> code).
>>>>
>>>> That change in "nextInt/next" implied similarly tiny recodings in
>>>> the generic methods "nextDouble()", "nextBoolean()", ... which, apart
>>>> from that, were copied from "BitsStreamGenerator".
>>>>
>>>> [However tiny a change, I had made a mistake... and dozens of tests
>>>> started to fail. Found the typo and all was quiet again...]
>>>>
>>>> About "next(int)" being standard, it would be interesting to know
>>>> what that means.
> In all the papers I have read concerning pseudo random number
> generation, the basic model was based on small chunks of bits,
> much of the time the size of an int because this is what computer
> manages directly (they have no provision to manage chunks of 5 or
> 11 bits for example).


That's what I thought.  Internally, is it correct to assume that the
generation is always in int-sized blocks so Gilles is correct that
the bits parameter is only used for output truncation?
>
> Deriving other primitive types from this (boolean, long, double) is
> really an add-on. I even asked an expert about the (Pierre L'Ecuyer
> if I remember well) about some explanations for converting to double
> (which is simply done by multiplying by a constant representing the
> weight of the least significant bit in order to constrain the range to
> [0; 1]). His answer was that this ensured the theoretical mathematical
> proofs that apply to uniform distribution still apply, as only this
> case (uniformity over a multi-dimensional integral grid) has been
> studied. It seems nothing has been studied about using the exponential
> features of floating point representation in relationship with
> double random number generation directly.

Makes sense.  The - excellent - tests included with the Well,
Mersenne and Isaac generators verify (as Gilles states) that
nextInt() itself is likely unaffected by the changes above.  What I
am worried about is the conversions.  Do those changes look correct
to you?  That is what needs to be carefully reviewed and tested. 
>
> Hence everybody starts from int, and the mathematicians proved us
> this method works and some properties are preserved (multi-dimensional
> independance, long period, ...) that are essential typically for
> Monte-Carlo analyses.
>
> I know nothing about random number generation for secure application
> like cryptograpgy, except that it requires completely different
> properties, often completely opposite to what is needed for
> Monte-Carlo analysis. As an example, it should be impossible to
> reproduce a secure sequence (it cannot be deterministic), whereas in
> Monte-Carlo we *want* it to be reproducible if we reuse the same seed.
>
>>> Have a look at the source code for the JDK generators, for example.
>>>> As I indicated quite clearly in one of my first posts about this
>>>> refactoring
>>>> 1. all the CM implementations generate random bits in batches
>>>>    of 32 bits, and
>>>> 2. before returning, the "next(int bits)" method was truncating
>>>>    the generated "int":
>>>>      return x >>> (32 - bits);
>>>>
>>>> In all implementations, that was the only place where the "bits"
>>>> parameter was used, from which I concluded that the randomness
>>>> provider does not care if the request was to create less than 32
>>>> random bits.
>>>> Taking "nextBoolean()" for example, it looks like a waste of 31
>>>> bits (or am I missing something?).
>>> Quite possibly, yes, you are missing something.
> I would guess it is linked to performance consideration. Pseudo
> random number generation is sometimes put under very heavy stress
> with billions of numbers generated. It should run extremelly fast,
> and the algorithms have been designed to have tremendously long periods
> (things like 2^19937 -1). With such long periods, we can waste 31
> bits each time we produce 1 bit if it saves some overhead.
>
> best regards,
> Luc
>
>>>> Of course, if some implementation were able to store the bits not
>>>> requested by the last call to "next(int)", then I'd understand that
>>>> we must really provide access to a "next(int)" method.
>>>>
>>>> Perhaps that the overhead of such bookkeeping is why the practical
>>>> algorithms chose to store integers rather than bits (?).
>>>>
>>>> As you dismissed my request about CM being able to care for a RNG
>>>> implementation based on a "long", I don't quite understand the
>>>> caring for a "next(int)" that serves no more purpose (as of current
>>>> CM).
>>>>
>>> This change is
>>>> Gilles
>>>>
>>>>
>>>>> Phil
>>>>>
>>>>> On 12/28/15 10:23 AM, er...@apache.org wrote:
>>>>>> Repository: commons-math
>>>>>> Updated Branches:
>>>>>>   refs/heads/master 7b62d0155 -> 81585a3c4
>>>>>>
>>>>>>
>>>>>> MATH-1307
>>>>>>
>>>>>> New base class for RNG implementations.
>>>>>> The source of randomness is provided through the "nextInt()"
>>>>>> method (to be defined in subclasses).
>>>>>>
>>>>>>
>>>>>> [...]
>>> [1] http://www.apachecon.com/eu2007/materials/ac2006.2.pdf
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>>>> For additional commands, e-mail: dev-h...@commons.apache.org
>>>>
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>>> For additional commands, e-mail: dev-h...@commons.apache.org
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>> For additional commands, e-mail: dev-h...@commons.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [Math] About the refactoring of RNGs (Was: [01/18] [math] MATH-1307)

Reply via email to