On 12/29/15 2:33 AM, Luc Maisonobe wrote: > Hi all, > > Le 29/12/2015 09:21, Thomas Neidhart a écrit : >> On 12/29/2015 04:33 AM, Phil Steitz wrote: >>> On 12/28/15 8:08 PM, Gilles wrote: >>>> On Mon, 28 Dec 2015 11:08:56 -0700, Phil Steitz wrote: >>>>> The significant refactoring to eliminate the (standard) next(int) >>>>> included in these changes has the possibility of introducing subtle >>>>> bugs or performance issues. Please run some tests to verify that >>>>> the same sequences are generated by the 3_X code >>>> IIUC our unit tests of the RNGs, this is covered. >>> No. Not sufficient. What you have done is changed the internal >>> implementation of all of the Bitstream generators. I am not >>> convinced that you have not broken anything. I will have to do the >>> testing myself. I see no point in fiddling with the internals of >>> this code that has had a lot of eyeballs and testing on it. I was >>> not personally looking forward to researching the algorithms to make >>> sure any invariants may be broken by these changes; but I am now >>> going to have to do this. I have to ask why. Please at some point >>> read [1], especially the sections on "Avoid Flexibility Syndrom" and >>> "Value Laziness as a Virtue." Gratuitous refactoring drains >>> community energy. >> +1, on top of that I think we should aim to refactor the parts that >> really need refactoring and try to keep the number of incompatibilities >> to the 3_X branch as minimal as possible. >> >> Thomas >> >>>>> and the refactored >>>>> code and benchmarks to show there is no loss in performance. >>>> Given that there are exactly two operations _less_ (a subtraction >>>> and a shift), it would be surprising. >>>> >>>>> It >>>>> would also be good to have some additional review of this code by >>>>> PRNG experts. >>>> The "nextInt()" code is exactly the same as the "next(int)" modulo >>>> the little change above (in the last line of the "nextInt/next" >>>> code). >>>> >>>> That change in "nextInt/next" implied similarly tiny recodings in >>>> the generic methods "nextDouble()", "nextBoolean()", ... which, apart >>>> from that, were copied from "BitsStreamGenerator". >>>> >>>> [However tiny a change, I had made a mistake... and dozens of tests >>>> started to fail. Found the typo and all was quiet again...] >>>> >>>> About "next(int)" being standard, it would be interesting to know >>>> what that means. > In all the papers I have read concerning pseudo random number > generation, the basic model was based on small chunks of bits, > much of the time the size of an int because this is what computer > manages directly (they have no provision to manage chunks of 5 or > 11 bits for example).
That's what I thought. Internally, is it correct to assume that the generation is always in int-sized blocks so Gilles is correct that the bits parameter is only used for output truncation? > > Deriving other primitive types from this (boolean, long, double) is > really an add-on. I even asked an expert about the (Pierre L'Ecuyer > if I remember well) about some explanations for converting to double > (which is simply done by multiplying by a constant representing the > weight of the least significant bit in order to constrain the range to > [0; 1]). His answer was that this ensured the theoretical mathematical > proofs that apply to uniform distribution still apply, as only this > case (uniformity over a multi-dimensional integral grid) has been > studied. It seems nothing has been studied about using the exponential > features of floating point representation in relationship with > double random number generation directly. Makes sense. The - excellent - tests included with the Well, Mersenne and Isaac generators verify (as Gilles states) that nextInt() itself is likely unaffected by the changes above. What I am worried about is the conversions. Do those changes look correct to you? That is what needs to be carefully reviewed and tested. > > Hence everybody starts from int, and the mathematicians proved us > this method works and some properties are preserved (multi-dimensional > independance, long period, ...) that are essential typically for > Monte-Carlo analyses. > > I know nothing about random number generation for secure application > like cryptograpgy, except that it requires completely different > properties, often completely opposite to what is needed for > Monte-Carlo analysis. As an example, it should be impossible to > reproduce a secure sequence (it cannot be deterministic), whereas in > Monte-Carlo we *want* it to be reproducible if we reuse the same seed. > >>> Have a look at the source code for the JDK generators, for example. >>>> As I indicated quite clearly in one of my first posts about this >>>> refactoring >>>> 1. all the CM implementations generate random bits in batches >>>> of 32 bits, and >>>> 2. before returning, the "next(int bits)" method was truncating >>>> the generated "int": >>>> return x >>> (32 - bits); >>>> >>>> In all implementations, that was the only place where the "bits" >>>> parameter was used, from which I concluded that the randomness >>>> provider does not care if the request was to create less than 32 >>>> random bits. >>>> Taking "nextBoolean()" for example, it looks like a waste of 31 >>>> bits (or am I missing something?). >>> Quite possibly, yes, you are missing something. > I would guess it is linked to performance consideration. Pseudo > random number generation is sometimes put under very heavy stress > with billions of numbers generated. It should run extremelly fast, > and the algorithms have been designed to have tremendously long periods > (things like 2^19937 -1). With such long periods, we can waste 31 > bits each time we produce 1 bit if it saves some overhead. > > best regards, > Luc > >>>> Of course, if some implementation were able to store the bits not >>>> requested by the last call to "next(int)", then I'd understand that >>>> we must really provide access to a "next(int)" method. >>>> >>>> Perhaps that the overhead of such bookkeeping is why the practical >>>> algorithms chose to store integers rather than bits (?). >>>> >>>> As you dismissed my request about CM being able to care for a RNG >>>> implementation based on a "long", I don't quite understand the >>>> caring for a "next(int)" that serves no more purpose (as of current >>>> CM). >>>> >>> This change is >>>> Gilles >>>> >>>> >>>>> Phil >>>>> >>>>> On 12/28/15 10:23 AM, er...@apache.org wrote: >>>>>> Repository: commons-math >>>>>> Updated Branches: >>>>>> refs/heads/master 7b62d0155 -> 81585a3c4 >>>>>> >>>>>> >>>>>> MATH-1307 >>>>>> >>>>>> New base class for RNG implementations. >>>>>> The source of randomness is provided through the "nextInt()" >>>>>> method (to be defined in subclasses). >>>>>> >>>>>> >>>>>> [...] >>> [1] http://www.apachecon.com/eu2007/materials/ac2006.2.pdf >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>> >>>> >>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>> For additional commands, e-mail: dev-h...@commons.apache.org >>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >> For additional commands, e-mail: dev-h...@commons.apache.org >> >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org