Re: [math] Improving tests and performance of RandomGenerator implementations

2011-08-05 Thread Phil Steitz
On 8/3/11 10:05 AM, Luc Maisonobe wrote:
> Le 03/08/2011 18:15, Phil Steitz a écrit :
>> On 8/3/11 9:02 AM, sebb wrote:
>>> On 3 August 2011 09:06, Luc Maisonobe 
>>> wrote:
 Le 03/08/2011 09:38, Luc Maisonobe a écrit :
> Le 01/08/2011 22:40, Luc Maisonobe a écrit :
>> Hi Phil,
>>
>> Le 01/08/2011 20:39, Phil Steitz a écrit :
>>> On 8/1/11 1:31 AM, luc.maison...@free.fr wrote:
 Hi Phil,

 - Mail original -
> In my own applications, I noticed what appears to be poor
> performance in the nextInt(int) method of the Mersenne
> twister,
> which I was using to *improve* speed. I think that for
> small n, the
> default implementation in BistreamGenerator may be running
> too many
> iterations.
 Mersenne twister uses a quite large pool. It creates
 pseudo-random bits
 by twisting it and creates large bunches at a time (624
 words at a
 time).
 Hence when you ask for large sets, you should have several
 calls that
 return fast, and one call that takes a longer time to
 generate another
 large pool.

 So good performances are obtained for generating large
 sets, not
 small sets.

 Well generators should be faster and are preferred over
 Mersenne
 twister now,
 which is now an old generator. Well generators also have
 large pools,
 but they
 don't generate bits in large batches in advance, they do
 generate a
 few words
 at a time.
>>> Yeah, I know. Both are faster than the JDK, though, even for
>>> just
>>> 32-bit chunks in my tests at least.
>>>
>>> One thing I have been thinking about is exposing nextInt[],
>>> nextDouble[], nextGaussian[] etc methods that take advantage
>>> of the
>>> pools. So you basically generate a large block of bits use
>>> this to
>>> fill the output arrays.
>> Seems a very good idea. Most of the time, people generate
>> only one kind
>> of numbers several times, so it really does make sense.
>>
> I am still figuring out how the code works, but I
> thought it would be good to run some benchmarks - using
> Gilles' new
> stuff - against the Harmony implementation in
> java.util.Random of
> this method. That led me to notice that there are no unit
> tests for
> BitstreamGenerator. I propose that we add
> 0) RandomGeneratorAbstractTest with an abstract makeGenerator
> method including fixed seed tests for all RandomGenerator
> methods
> 1) BitstreamGeneratorTest extending
> RandomGeneratorAbstractTest
> implementing makeGenerator with a BitStreamGenerator that
> uses the
> JDK generator for next(int)
> 2) Make the test classes for Mersenne and Weil generators
> extend
> RandomGeneratorAbstractTest, moving redundant tests up
> into the base
> class
>
> Sound reasonable?
 +1

> Also, any recollection why we are using a
> different implementation in BitStreamGenerator for
> next(int) than
> Harmony and the JDK use?
 I don't understand what you mean. next(int) is used to
 generate the raw
 bits and is the heart of each generator. Each generator has
 its own
 implementation. Replacing next(int) by the JDK generation
 would imply
 dropping completely Mersenne twister and Well generators.
>>> I am sorry. I meant nextInt(int). It is that code that seems
>>> to be
>>> slow in BitStreamGenerator and different from the JDK and
>>> Harmony.
>> Could you point me at some code ? There are many pitfalls in
>> netInt(int)
>> if one wants to make sure the generator is uniform, which
>> explain the
>> strange implementation, with the mask computation and the
>> loop. By the
>> way, even this implementation would benefit from your
>> proposed array
>> generation, as the mask could be computed only once.
> I have looked at the implementation for JDK and Harmony and am
> a little
> puzzled.
>
> The trick for the power of two (i.e. if ((n&  -n) == n)) is
> not useful
> for the very elaborate generators like Mersenne twister or
> Well. Both
> are proven to be equidistributed even for the low order bits.
> They are
> based on linear recurrences but not linear congruences and do
> not suffer
> from the drawbacks of the latter.
>
> What puzzles me more is the loop. It is documented as avoiding
> the
> uneven distributions, but at first glance the modulo operation
> bothers
> me. As documentation explicitly states it is designed for
> this

Re: [math] Improving tests and performance of RandomGenerator implementations

2011-08-03 Thread Luc Maisonobe

Le 03/08/2011 18:15, Phil Steitz a écrit :

On 8/3/11 9:02 AM, sebb wrote:

On 3 August 2011 09:06, Luc Maisonobe  wrote:

Le 03/08/2011 09:38, Luc Maisonobe a écrit :

Le 01/08/2011 22:40, Luc Maisonobe a écrit :

Hi Phil,

Le 01/08/2011 20:39, Phil Steitz a écrit :

On 8/1/11 1:31 AM, luc.maison...@free.fr wrote:

Hi Phil,

- Mail original -

In my own applications, I noticed what appears to be poor
performance in the nextInt(int) method of the Mersenne twister,
which I was using to *improve* speed. I think that for small n, the
default implementation in BistreamGenerator may be running too many
iterations.

Mersenne twister uses a quite large pool. It creates pseudo-random bits
by twisting it and creates large bunches at a time (624 words at a
time).
Hence when you ask for large sets, you should have several calls that
return fast, and one call that takes a longer time to generate another
large pool.

So good performances are obtained for generating large sets, not
small sets.

Well generators should be faster and are preferred over Mersenne
twister now,
which is now an old generator. Well generators also have large pools,
but they
don't generate bits in large batches in advance, they do generate a
few words
at a time.

Yeah, I know. Both are faster than the JDK, though, even for just
32-bit chunks in my tests at least.

One thing I have been thinking about is exposing nextInt[],
nextDouble[], nextGaussian[] etc methods that take advantage of the
pools. So you basically generate a large block of bits use this to
fill the output arrays.

Seems a very good idea. Most of the time, people generate only one kind
of numbers several times, so it really does make sense.


I am still figuring out how the code works, but I
thought it would be good to run some benchmarks - using Gilles' new
stuff - against the Harmony implementation in java.util.Random of
this method. That led me to notice that there are no unit tests for
BitstreamGenerator. I propose that we add
0) RandomGeneratorAbstractTest with an abstract makeGenerator
method including fixed seed tests for all RandomGenerator methods
1) BitstreamGeneratorTest extending RandomGeneratorAbstractTest
implementing makeGenerator with a BitStreamGenerator that uses the
JDK generator for next(int)
2) Make the test classes for Mersenne and Weil generators extend
RandomGeneratorAbstractTest, moving redundant tests up into the base
class

Sound reasonable?

+1


Also, any recollection why we are using a
different implementation in BitStreamGenerator for next(int) than
Harmony and the JDK use?

I don't understand what you mean. next(int) is used to generate the raw
bits and is the heart of each generator. Each generator has its own
implementation. Replacing next(int) by the JDK generation would imply
dropping completely Mersenne twister and Well generators.

I am sorry. I meant nextInt(int). It is that code that seems to be
slow in BitStreamGenerator and different from the JDK and Harmony.

Could you point me at some code ? There are many pitfalls in netInt(int)
if one wants to make sure the generator is uniform, which explain the
strange implementation, with the mask computation and the loop. By the
way, even this implementation would benefit from your proposed array
generation, as the mask could be computed only once.

I have looked at the implementation for JDK and Harmony and am a little
puzzled.

The trick for the power of two (i.e. if ((n&  -n) == n)) is not useful
for the very elaborate generators like Mersenne twister or Well. Both
are proven to be equidistributed even for the low order bits. They are
based on linear recurrences but not linear congruences and do not suffer
from the drawbacks of the latter.

What puzzles me more is the loop. It is documented as avoiding the
uneven distributions, but at first glance the modulo operation bothers
me. As documentation explicitly states it is designed for this, it is
most probably true, I simply don't understand how yet.

So our current implementation is slow, then go ahead and change it to
the one you showed me. I would simply suggest to get rid of the ((n&
-n) == n) test. I'll try to understand the condition in the while loop
to understand how it rejects uneven distributions, just out of curiosity
for myself.

OK, I finally understood the algorithm and how it rejects the largest
incomplete numbers from k*n to (2^31)-1 where k*n is the largest multiple of
n that fits in a positive integer. The trick lies in the addition of (n-1)
which overflows the integer and wraps the result back to negative values. It
is smart.

+1 to use it.

Provided that the algorithm is documented ...


Yeah, I was going to try to decipher it (and the current impl) and
provide some doc.  One other thing to consider in this decision is
do we have to worry about encumberance.  The Harmony impl looks very
similar to what is described in the JDK javadoc.  I wonder if
SunOracle might have claim to it.

Where did you get the current impl, Lu

Re: [math] Improving tests and performance of RandomGenerator implementations

2011-08-03 Thread Phil Steitz
On 8/3/11 9:02 AM, sebb wrote:
> On 3 August 2011 09:06, Luc Maisonobe  wrote:
>> Le 03/08/2011 09:38, Luc Maisonobe a écrit :
>>> Le 01/08/2011 22:40, Luc Maisonobe a écrit :
 Hi Phil,

 Le 01/08/2011 20:39, Phil Steitz a écrit :
> On 8/1/11 1:31 AM, luc.maison...@free.fr wrote:
>> Hi Phil,
>>
>> - Mail original -
>>> In my own applications, I noticed what appears to be poor
>>> performance in the nextInt(int) method of the Mersenne twister,
>>> which I was using to *improve* speed. I think that for small n, the
>>> default implementation in BistreamGenerator may be running too many
>>> iterations.
>> Mersenne twister uses a quite large pool. It creates pseudo-random bits
>> by twisting it and creates large bunches at a time (624 words at a
>> time).
>> Hence when you ask for large sets, you should have several calls that
>> return fast, and one call that takes a longer time to generate another
>> large pool.
>>
>> So good performances are obtained for generating large sets, not
>> small sets.
>>
>> Well generators should be faster and are preferred over Mersenne
>> twister now,
>> which is now an old generator. Well generators also have large pools,
>> but they
>> don't generate bits in large batches in advance, they do generate a
>> few words
>> at a time.
> Yeah, I know. Both are faster than the JDK, though, even for just
> 32-bit chunks in my tests at least.
>
> One thing I have been thinking about is exposing nextInt[],
> nextDouble[], nextGaussian[] etc methods that take advantage of the
> pools. So you basically generate a large block of bits use this to
> fill the output arrays.
 Seems a very good idea. Most of the time, people generate only one kind
 of numbers several times, so it really does make sense.

>>> I am still figuring out how the code works, but I
>>> thought it would be good to run some benchmarks - using Gilles' new
>>> stuff - against the Harmony implementation in java.util.Random of
>>> this method. That led me to notice that there are no unit tests for
>>> BitstreamGenerator. I propose that we add
>>> 0) RandomGeneratorAbstractTest with an abstract makeGenerator
>>> method including fixed seed tests for all RandomGenerator methods
>>> 1) BitstreamGeneratorTest extending RandomGeneratorAbstractTest
>>> implementing makeGenerator with a BitStreamGenerator that uses the
>>> JDK generator for next(int)
>>> 2) Make the test classes for Mersenne and Weil generators extend
>>> RandomGeneratorAbstractTest, moving redundant tests up into the base
>>> class
>>>
>>> Sound reasonable?
>> +1
>>
>>> Also, any recollection why we are using a
>>> different implementation in BitStreamGenerator for next(int) than
>>> Harmony and the JDK use?
>> I don't understand what you mean. next(int) is used to generate the raw
>> bits and is the heart of each generator. Each generator has its own
>> implementation. Replacing next(int) by the JDK generation would imply
>> dropping completely Mersenne twister and Well generators.
> I am sorry. I meant nextInt(int). It is that code that seems to be
> slow in BitStreamGenerator and different from the JDK and Harmony.
 Could you point me at some code ? There are many pitfalls in netInt(int)
 if one wants to make sure the generator is uniform, which explain the
 strange implementation, with the mask computation and the loop. By the
 way, even this implementation would benefit from your proposed array
 generation, as the mask could be computed only once.
>>> I have looked at the implementation for JDK and Harmony and am a little
>>> puzzled.
>>>
>>> The trick for the power of two (i.e. if ((n & -n) == n)) is not useful
>>> for the very elaborate generators like Mersenne twister or Well. Both
>>> are proven to be equidistributed even for the low order bits. They are
>>> based on linear recurrences but not linear congruences and do not suffer
>>> from the drawbacks of the latter.
>>>
>>> What puzzles me more is the loop. It is documented as avoiding the
>>> uneven distributions, but at first glance the modulo operation bothers
>>> me. As documentation explicitly states it is designed for this, it is
>>> most probably true, I simply don't understand how yet.
>>>
>>> So our current implementation is slow, then go ahead and change it to
>>> the one you showed me. I would simply suggest to get rid of the ((n &
>>> -n) == n) test. I'll try to understand the condition in the while loop
>>> to understand how it rejects uneven distributions, just out of curiosity
>>> for myself.
>> OK, I finally understood the algorithm and how it rejects the largest
>> incomplete numbers from k*n to (2^31)-1 where k*n is the largest multiple of
>> n that fits in a positive integer. The trick lies in the addit

Re: [math] Improving tests and performance of RandomGenerator implementations

2011-08-03 Thread sebb
On 3 August 2011 09:06, Luc Maisonobe  wrote:
> Le 03/08/2011 09:38, Luc Maisonobe a écrit :
>>
>> Le 01/08/2011 22:40, Luc Maisonobe a écrit :
>>>
>>> Hi Phil,
>>>
>>> Le 01/08/2011 20:39, Phil Steitz a écrit :

 On 8/1/11 1:31 AM, luc.maison...@free.fr wrote:
>
> Hi Phil,
>
> - Mail original -
>>
>> In my own applications, I noticed what appears to be poor
>> performance in the nextInt(int) method of the Mersenne twister,
>> which I was using to *improve* speed. I think that for small n, the
>> default implementation in BistreamGenerator may be running too many
>> iterations.
>
> Mersenne twister uses a quite large pool. It creates pseudo-random bits
> by twisting it and creates large bunches at a time (624 words at a
> time).
> Hence when you ask for large sets, you should have several calls that
> return fast, and one call that takes a longer time to generate another
> large pool.
>
> So good performances are obtained for generating large sets, not
> small sets.
>
> Well generators should be faster and are preferred over Mersenne
> twister now,
> which is now an old generator. Well generators also have large pools,
> but they
> don't generate bits in large batches in advance, they do generate a
> few words
> at a time.

 Yeah, I know. Both are faster than the JDK, though, even for just
 32-bit chunks in my tests at least.

 One thing I have been thinking about is exposing nextInt[],
 nextDouble[], nextGaussian[] etc methods that take advantage of the
 pools. So you basically generate a large block of bits use this to
 fill the output arrays.
>>>
>>> Seems a very good idea. Most of the time, people generate only one kind
>>> of numbers several times, so it really does make sense.
>>>
>
>> I am still figuring out how the code works, but I
>> thought it would be good to run some benchmarks - using Gilles' new
>> stuff - against the Harmony implementation in java.util.Random of
>> this method. That led me to notice that there are no unit tests for
>> BitstreamGenerator. I propose that we add
>> 0) RandomGeneratorAbstractTest with an abstract makeGenerator
>> method including fixed seed tests for all RandomGenerator methods
>> 1) BitstreamGeneratorTest extending RandomGeneratorAbstractTest
>> implementing makeGenerator with a BitStreamGenerator that uses the
>> JDK generator for next(int)
>> 2) Make the test classes for Mersenne and Weil generators extend
>> RandomGeneratorAbstractTest, moving redundant tests up into the base
>> class
>>
>> Sound reasonable?
>
> +1
>
>> Also, any recollection why we are using a
>> different implementation in BitStreamGenerator for next(int) than
>> Harmony and the JDK use?
>
> I don't understand what you mean. next(int) is used to generate the raw
> bits and is the heart of each generator. Each generator has its own
> implementation. Replacing next(int) by the JDK generation would imply
> dropping completely Mersenne twister and Well generators.

 I am sorry. I meant nextInt(int). It is that code that seems to be
 slow in BitStreamGenerator and different from the JDK and Harmony.
>>>
>>> Could you point me at some code ? There are many pitfalls in netInt(int)
>>> if one wants to make sure the generator is uniform, which explain the
>>> strange implementation, with the mask computation and the loop. By the
>>> way, even this implementation would benefit from your proposed array
>>> generation, as the mask could be computed only once.
>>
>> I have looked at the implementation for JDK and Harmony and am a little
>> puzzled.
>>
>> The trick for the power of two (i.e. if ((n & -n) == n)) is not useful
>> for the very elaborate generators like Mersenne twister or Well. Both
>> are proven to be equidistributed even for the low order bits. They are
>> based on linear recurrences but not linear congruences and do not suffer
>> from the drawbacks of the latter.
>>
>> What puzzles me more is the loop. It is documented as avoiding the
>> uneven distributions, but at first glance the modulo operation bothers
>> me. As documentation explicitly states it is designed for this, it is
>> most probably true, I simply don't understand how yet.
>>
>> So our current implementation is slow, then go ahead and change it to
>> the one you showed me. I would simply suggest to get rid of the ((n &
>> -n) == n) test. I'll try to understand the condition in the while loop
>> to understand how it rejects uneven distributions, just out of curiosity
>> for myself.
>
> OK, I finally understood the algorithm and how it rejects the largest
> incomplete numbers from k*n to (2^31)-1 where k*n is the largest multiple of
> n that fits in a positive integer. The trick lies in the addition of (n-1)
> which overflows the integer and wraps th

Re: [math] Improving tests and performance of RandomGenerator implementations

2011-08-03 Thread Luc Maisonobe

Le 03/08/2011 09:38, Luc Maisonobe a écrit :

Le 01/08/2011 22:40, Luc Maisonobe a écrit :

Hi Phil,

Le 01/08/2011 20:39, Phil Steitz a écrit :

On 8/1/11 1:31 AM, luc.maison...@free.fr wrote:

Hi Phil,

- Mail original -

In my own applications, I noticed what appears to be poor
performance in the nextInt(int) method of the Mersenne twister,
which I was using to *improve* speed. I think that for small n, the
default implementation in BistreamGenerator may be running too many
iterations.

Mersenne twister uses a quite large pool. It creates pseudo-random bits
by twisting it and creates large bunches at a time (624 words at a
time).
Hence when you ask for large sets, you should have several calls that
return fast, and one call that takes a longer time to generate another
large pool.

So good performances are obtained for generating large sets, not
small sets.

Well generators should be faster and are preferred over Mersenne
twister now,
which is now an old generator. Well generators also have large pools,
but they
don't generate bits in large batches in advance, they do generate a
few words
at a time.


Yeah, I know. Both are faster than the JDK, though, even for just
32-bit chunks in my tests at least.

One thing I have been thinking about is exposing nextInt[],
nextDouble[], nextGaussian[] etc methods that take advantage of the
pools. So you basically generate a large block of bits use this to
fill the output arrays.


Seems a very good idea. Most of the time, people generate only one kind
of numbers several times, so it really does make sense.




I am still figuring out how the code works, but I
thought it would be good to run some benchmarks - using Gilles' new
stuff - against the Harmony implementation in java.util.Random of
this method. That led me to notice that there are no unit tests for
BitstreamGenerator. I propose that we add
0) RandomGeneratorAbstractTest with an abstract makeGenerator
method including fixed seed tests for all RandomGenerator methods
1) BitstreamGeneratorTest extending RandomGeneratorAbstractTest
implementing makeGenerator with a BitStreamGenerator that uses the
JDK generator for next(int)
2) Make the test classes for Mersenne and Weil generators extend
RandomGeneratorAbstractTest, moving redundant tests up into the base
class

Sound reasonable?

+1


Also, any recollection why we are using a
different implementation in BitStreamGenerator for next(int) than
Harmony and the JDK use?

I don't understand what you mean. next(int) is used to generate the raw
bits and is the heart of each generator. Each generator has its own
implementation. Replacing next(int) by the JDK generation would imply
dropping completely Mersenne twister and Well generators.


I am sorry. I meant nextInt(int). It is that code that seems to be
slow in BitStreamGenerator and different from the JDK and Harmony.


Could you point me at some code ? There are many pitfalls in netInt(int)
if one wants to make sure the generator is uniform, which explain the
strange implementation, with the mask computation and the loop. By the
way, even this implementation would benefit from your proposed array
generation, as the mask could be computed only once.


I have looked at the implementation for JDK and Harmony and am a little
puzzled.

The trick for the power of two (i.e. if ((n & -n) == n)) is not useful
for the very elaborate generators like Mersenne twister or Well. Both
are proven to be equidistributed even for the low order bits. They are
based on linear recurrences but not linear congruences and do not suffer
from the drawbacks of the latter.

What puzzles me more is the loop. It is documented as avoiding the
uneven distributions, but at first glance the modulo operation bothers
me. As documentation explicitly states it is designed for this, it is
most probably true, I simply don't understand how yet.

So our current implementation is slow, then go ahead and change it to
the one you showed me. I would simply suggest to get rid of the ((n &
-n) == n) test. I'll try to understand the condition in the while loop
to understand how it rejects uneven distributions, just out of curiosity
for myself.


OK, I finally understood the algorithm and how it rejects the largest 
incomplete numbers from k*n to (2^31)-1 where k*n is the largest 
multiple of n that fits in a positive integer. The trick lies in the 
addition of (n-1) which overflows the integer and wraps the result back 
to negative values. It is smart.


+1 to use it.

Luc



Luc



Luc




Phil


Mersenne twister and Well should be fast for generating large sets, but
most importantly they have very good and *proven* properties
(equidistribution
on large dimensions, null correlation, maximal period ...). These
properties
are essential for example in Monte-Carlo simulations with lots of
variables that
must be independent or have controlled correlations.

Luc


The Harmony impl is almost identical to
what is documented in the JDK javadoc.

Phil

---

Re: [math] Improving tests and performance of RandomGenerator implementations

2011-08-03 Thread Luc Maisonobe

Le 01/08/2011 22:40, Luc Maisonobe a écrit :

Hi Phil,

Le 01/08/2011 20:39, Phil Steitz a écrit :

On 8/1/11 1:31 AM, luc.maison...@free.fr wrote:

Hi Phil,

- Mail original -

In my own applications, I noticed what appears to be poor
performance in the nextInt(int) method of the Mersenne twister,
which I was using to *improve* speed. I think that for small n, the
default implementation in BistreamGenerator may be running too many
iterations.

Mersenne twister uses a quite large pool. It creates pseudo-random bits
by twisting it and creates large bunches at a time (624 words at a
time).
Hence when you ask for large sets, you should have several calls that
return fast, and one call that takes a longer time to generate another
large pool.

So good performances are obtained for generating large sets, not
small sets.

Well generators should be faster and are preferred over Mersenne
twister now,
which is now an old generator. Well generators also have large pools,
but they
don't generate bits in large batches in advance, they do generate a
few words
at a time.


Yeah, I know. Both are faster than the JDK, though, even for just
32-bit chunks in my tests at least.

One thing I have been thinking about is exposing nextInt[],
nextDouble[], nextGaussian[] etc methods that take advantage of the
pools. So you basically generate a large block of bits use this to
fill the output arrays.


Seems a very good idea. Most of the time, people generate only one kind
of numbers several times, so it really does make sense.




I am still figuring out how the code works, but I
thought it would be good to run some benchmarks - using Gilles' new
stuff - against the Harmony implementation in java.util.Random of
this method. That led me to notice that there are no unit tests for
BitstreamGenerator. I propose that we add
0) RandomGeneratorAbstractTest with an abstract makeGenerator
method including fixed seed tests for all RandomGenerator methods
1) BitstreamGeneratorTest extending RandomGeneratorAbstractTest
implementing makeGenerator with a BitStreamGenerator that uses the
JDK generator for next(int)
2) Make the test classes for Mersenne and Weil generators extend
RandomGeneratorAbstractTest, moving redundant tests up into the base
class

Sound reasonable?

+1


Also, any recollection why we are using a
different implementation in BitStreamGenerator for next(int) than
Harmony and the JDK use?

I don't understand what you mean. next(int) is used to generate the raw
bits and is the heart of each generator. Each generator has its own
implementation. Replacing next(int) by the JDK generation would imply
dropping completely Mersenne twister and Well generators.


I am sorry. I meant nextInt(int). It is that code that seems to be
slow in BitStreamGenerator and different from the JDK and Harmony.


Could you point me at some code ? There are many pitfalls in netInt(int)
if one wants to make sure the generator is uniform, which explain the
strange implementation, with the mask computation and the loop. By the
way, even this implementation would benefit from your proposed array
generation, as the mask could be computed only once.


I have looked at the implementation for JDK and Harmony and am a little 
puzzled.


The trick for the power of two (i.e. if ((n & -n) == n)) is not useful 
for the very elaborate generators like Mersenne twister or Well. Both 
are proven to be equidistributed even for the low order bits. They are 
based on linear recurrences but not linear congruences and do not suffer 
from the drawbacks of the latter.


What puzzles me more is the loop. It is documented as avoiding the 
uneven distributions, but at first glance the modulo operation bothers 
me. As documentation explicitly states it is designed for this, it is 
most probably true, I simply don't understand how yet.


So our current implementation is slow, then go ahead and change it to 
the one you showed me. I would simply suggest to get rid of the ((n & 
-n) == n) test. I'll try to understand the condition in the while loop 
to understand how it rejects uneven distributions, just out of curiosity 
for myself.


Luc



Luc




Phil


Mersenne twister and Well should be fast for generating large sets, but
most importantly they have very good and *proven* properties
(equidistribution
on large dimensions, null correlation, maximal period ...). These
properties
are essential for example in Monte-Carlo simulations with lots of
variables that
must be independent or have controlled correlations.

Luc


The Harmony impl is almost identical to
what is documented in the JDK javadoc.

Phil

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.

Re: [math] Improving tests and performance of RandomGenerator implementations

2011-08-01 Thread Phil Steitz
On 8/1/11 1:40 PM, Luc Maisonobe wrote:
> Hi Phil,
>
> Le 01/08/2011 20:39, Phil Steitz a écrit :
>> On 8/1/11 1:31 AM, luc.maison...@free.fr wrote:
>>> Hi Phil,
>>>
>>> - Mail original -
 In my own applications, I noticed what appears to be poor
 performance in the nextInt(int) method of the Mersenne twister,
 which I was using to *improve* speed.  I think that for small
 n, the
 default implementation in BistreamGenerator may be running too
 many
 iterations.
>>> Mersenne twister uses a quite large pool. It creates
>>> pseudo-random bits
>>> by twisting it and creates large bunches at a time (624 words at
>>> a time).
>>> Hence when you ask for large sets, you should have several calls
>>> that
>>> return fast, and one call that takes a longer time to generate
>>> another
>>> large pool.
>>>
>>> So good performances are obtained for generating large sets, not
>>> small sets.
>>>
>>> Well generators should be faster and are preferred over Mersenne
>>> twister now,
>>> which is now an old generator. Well generators also have large
>>> pools, but they
>>> don't generate bits in large batches in advance, they do
>>> generate a few words
>>> at a time.
>>
>> Yeah, I know.  Both are faster than the JDK, though, even for just
>> 32-bit chunks in my tests at least.
>>
>> One thing I have been thinking about is exposing nextInt[],
>> nextDouble[], nextGaussian[] etc methods that take advantage of the
>> pools.  So you basically generate a large block of bits use this to
>> fill the output arrays.
>
> Seems a very good idea. Most of the time, people generate only one
> kind of numbers several times, so it really does make sense.

What I am not certain of and maybe you can clarify is beyond
optimizing the chunking, does this really buy you anything in the
common use case where the same generator instance is used repeatedly
to fill an array.  Won't the generator leverage the pool in this
case anyway?  The additional info on how many bits are going to be
required should in theory be useful, but I am not sure how much it
will get you.

>
>>>
 I am still figuring out how the code works, but I
 thought it would be good to run some benchmarks - using Gilles'
 new
 stuff - against the Harmony implementation in java.util.Random of
 this method.  That led me to notice that there are no unit
 tests for
 BitstreamGenerator.  I propose that we add
   0) RandomGeneratorAbstractTest with an abstract makeGenerator
 method including fixed seed tests for all RandomGenerator methods
   1) BitstreamGeneratorTest extending RandomGeneratorAbstractTest
 implementing makeGenerator with a BitStreamGenerator that uses the
 JDK generator for next(int)
   2) Make the test classes for Mersenne and Weil generators extend
 RandomGeneratorAbstractTest, moving redundant tests up into the
 base
 class

 Sound reasonable?
>>> +1
>>>
 Also, any recollection why we are using a
 different implementation in BitStreamGenerator for next(int) than
 Harmony and the JDK use?
>>> I don't understand what you mean. next(int) is used to generate
>>> the raw
>>> bits and is the heart of each generator. Each generator has its own
>>> implementation. Replacing next(int) by the JDK generation would
>>> imply
>>> dropping completely Mersenne twister and Well generators.
>>
>> I am sorry.  I meant nextInt(int).  It is that code that seems to be
>> slow in BitStreamGenerator and different from the JDK and Harmony.
>
> Could you point me at some code ? There are many pitfalls in
> netInt(int) if one wants to make sure the generator is uniform,
> which explain the strange implementation, with the mask
> computation and the loop. By the way, even this implementation
> would benefit from your proposed array generation, as the mask
> could be computed only once.

See the JDK javadoc [1], which I assume matches the internal
implementation and the Harmony code [2].  I know the pitfalls you
are referring to, since I played a good bit with my own code to just
translate bits when I needed to optimize performance in one of my
own apps.  I don't quite get the code we have now.  I am a little
baffled by the shifts to create the mask.  I agree that for this
case, you could definitely get a boost if you know n in advance and
are filling an array.

Thanks for looking at this.

Phil

[1]
http://download.oracle.com/javase/6/docs/api/java/util/Random.html#nextInt%28int%29
[2] http://s.apache.org/ZWt
>
> Luc
>
>
>>
>> Phil
>>>
>>> Mersenne twister and Well should be fast for generating large
>>> sets, but
>>> most importantly they have very good and *proven* properties
>>> (equidistribution
>>> on large dimensions, null correlation, maximal period ...).
>>> These properties
>>> are essential for example in Monte-Carlo simulations with lots
>>> of variables that
>>> must be independent or have controlled correlations.
>>>
>>> Luc
>>>
 The Harmony impl is almost ident

Re: [math] Improving tests and performance of RandomGenerator implementations

2011-08-01 Thread Luc Maisonobe

Hi Phil,

Le 01/08/2011 20:39, Phil Steitz a écrit :

On 8/1/11 1:31 AM, luc.maison...@free.fr wrote:

Hi Phil,

- Mail original -

In my own applications, I noticed what appears to be poor
performance in the nextInt(int) method of the Mersenne twister,
which I was using to *improve* speed.  I think that for small n, the
default implementation in BistreamGenerator may be running too many
iterations.

Mersenne twister uses a quite large pool. It creates pseudo-random bits
by twisting it and creates large bunches at a time (624 words at a time).
Hence when you ask for large sets, you should have several calls that
return fast, and one call that takes a longer time to generate another
large pool.

So good performances are obtained for generating large sets, not small sets.

Well generators should be faster and are preferred over Mersenne twister now,
which is now an old generator. Well generators also have large pools, but they
don't generate bits in large batches in advance, they do generate a few words
at a time.


Yeah, I know.  Both are faster than the JDK, though, even for just
32-bit chunks in my tests at least.

One thing I have been thinking about is exposing nextInt[],
nextDouble[], nextGaussian[] etc methods that take advantage of the
pools.  So you basically generate a large block of bits use this to
fill the output arrays.


Seems a very good idea. Most of the time, people generate only one kind 
of numbers several times, so it really does make sense.





I am still figuring out how the code works, but I
thought it would be good to run some benchmarks - using Gilles' new
stuff - against the Harmony implementation in java.util.Random of
this method.  That led me to notice that there are no unit tests for
BitstreamGenerator.  I propose that we add
  0) RandomGeneratorAbstractTest with an abstract makeGenerator
method including fixed seed tests for all RandomGenerator methods
  1) BitstreamGeneratorTest extending RandomGeneratorAbstractTest
implementing makeGenerator with a BitStreamGenerator that uses the
JDK generator for next(int)
  2) Make the test classes for Mersenne and Weil generators extend
RandomGeneratorAbstractTest, moving redundant tests up into the base
class

Sound reasonable?

+1


Also, any recollection why we are using a
different implementation in BitStreamGenerator for next(int) than
Harmony and the JDK use?

I don't understand what you mean. next(int) is used to generate the raw
bits and is the heart of each generator. Each generator has its own
implementation. Replacing next(int) by the JDK generation would imply
dropping completely Mersenne twister and Well generators.


I am sorry.  I meant nextInt(int).  It is that code that seems to be
slow in BitStreamGenerator and different from the JDK and Harmony.


Could you point me at some code ? There are many pitfalls in netInt(int) 
if one wants to make sure the generator is uniform, which explain the 
strange implementation, with the mask computation and the loop. By the 
way, even this implementation would benefit from your proposed array 
generation, as the mask could be computed only once.


Luc




Phil


Mersenne twister and Well should be fast for generating large sets, but
most importantly they have very good and *proven* properties (equidistribution
on large dimensions, null correlation, maximal period ...). These properties
are essential for example in Monte-Carlo simulations with lots of variables that
must be independent or have controlled correlations.

Luc


The Harmony impl is almost identical to
what is documented in the JDK javadoc.

Phil

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org





-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [math] Improving tests and performance of RandomGenerator implementations

2011-08-01 Thread Phil Steitz
On 8/1/11 1:31 AM, luc.maison...@free.fr wrote:
> Hi Phil,
>
> - Mail original -
>> In my own applications, I noticed what appears to be poor
>> performance in the nextInt(int) method of the Mersenne twister,
>> which I was using to *improve* speed.  I think that for small n, the
>> default implementation in BistreamGenerator may be running too many
>> iterations.
> Mersenne twister uses a quite large pool. It creates pseudo-random bits
> by twisting it and creates large bunches at a time (624 words at a time).
> Hence when you ask for large sets, you should have several calls that
> return fast, and one call that takes a longer time to generate another
> large pool.
>
> So good performances are obtained for generating large sets, not small sets.
>
> Well generators should be faster and are preferred over Mersenne twister now,
> which is now an old generator. Well generators also have large pools, but they
> don't generate bits in large batches in advance, they do generate a few words
> at a time.

Yeah, I know.  Both are faster than the JDK, though, even for just
32-bit chunks in my tests at least.

One thing I have been thinking about is exposing nextInt[],
nextDouble[], nextGaussian[] etc methods that take advantage of the
pools.  So you basically generate a large block of bits use this to
fill the output arrays.
>
>> I am still figuring out how the code works, but I
>> thought it would be good to run some benchmarks - using Gilles' new
>> stuff - against the Harmony implementation in java.util.Random of
>> this method.  That led me to notice that there are no unit tests for
>> BitstreamGenerator.  I propose that we add
>>  0) RandomGeneratorAbstractTest with an abstract makeGenerator
>> method including fixed seed tests for all RandomGenerator methods
>>  1) BitstreamGeneratorTest extending RandomGeneratorAbstractTest
>> implementing makeGenerator with a BitStreamGenerator that uses the
>> JDK generator for next(int)
>>  2) Make the test classes for Mersenne and Weil generators extend
>> RandomGeneratorAbstractTest, moving redundant tests up into the base
>> class
>>
>> Sound reasonable?
> +1
>
>> Also, any recollection why we are using a
>> different implementation in BitStreamGenerator for next(int) than
>> Harmony and the JDK use?
> I don't understand what you mean. next(int) is used to generate the raw
> bits and is the heart of each generator. Each generator has its own
> implementation. Replacing next(int) by the JDK generation would imply
> dropping completely Mersenne twister and Well generators.

I am sorry.  I meant nextInt(int).  It is that code that seems to be
slow in BitStreamGenerator and different from the JDK and Harmony.

Phil
>
> Mersenne twister and Well should be fast for generating large sets, but
> most importantly they have very good and *proven* properties (equidistribution
> on large dimensions, null correlation, maximal period ...). These properties
> are essential for example in Monte-Carlo simulations with lots of variables 
> that
> must be independent or have controlled correlations.
>
> Luc
>
>> The Harmony impl is almost identical to
>> what is documented in the JDK javadoc.
>>
>> Phil
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>> For additional commands, e-mail: dev-h...@commons.apache.org
>>
>>
> -
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>


-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [math] Improving tests and performance of RandomGenerator implementations

2011-08-01 Thread luc . maisonobe
Hi Phil,

- Mail original -
> In my own applications, I noticed what appears to be poor
> performance in the nextInt(int) method of the Mersenne twister,
> which I was using to *improve* speed.  I think that for small n, the
> default implementation in BistreamGenerator may be running too many
> iterations.

Mersenne twister uses a quite large pool. It creates pseudo-random bits
by twisting it and creates large bunches at a time (624 words at a time).
Hence when you ask for large sets, you should have several calls that
return fast, and one call that takes a longer time to generate another
large pool.

So good performances are obtained for generating large sets, not small sets.

Well generators should be faster and are preferred over Mersenne twister now,
which is now an old generator. Well generators also have large pools, but they
don't generate bits in large batches in advance, they do generate a few words
at a time.

> I am still figuring out how the code works, but I
> thought it would be good to run some benchmarks - using Gilles' new
> stuff - against the Harmony implementation in java.util.Random of
> this method.  That led me to notice that there are no unit tests for
> BitstreamGenerator.  I propose that we add
>  0) RandomGeneratorAbstractTest with an abstract makeGenerator
> method including fixed seed tests for all RandomGenerator methods
>  1) BitstreamGeneratorTest extending RandomGeneratorAbstractTest
> implementing makeGenerator with a BitStreamGenerator that uses the
> JDK generator for next(int)
>  2) Make the test classes for Mersenne and Weil generators extend
> RandomGeneratorAbstractTest, moving redundant tests up into the base
> class
> 
> Sound reasonable?

+1

> Also, any recollection why we are using a
> different implementation in BitStreamGenerator for next(int) than
> Harmony and the JDK use?

I don't understand what you mean. next(int) is used to generate the raw
bits and is the heart of each generator. Each generator has its own
implementation. Replacing next(int) by the JDK generation would imply
dropping completely Mersenne twister and Well generators.

Mersenne twister and Well should be fast for generating large sets, but
most importantly they have very good and *proven* properties (equidistribution
on large dimensions, null correlation, maximal period ...). These properties
are essential for example in Monte-Carlo simulations with lots of variables that
must be independent or have controlled correlations.

Luc

> The Harmony impl is almost identical to
> what is documented in the JDK javadoc.
> 
> Phil
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
> 
> 

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



[math] Improving tests and performance of RandomGenerator implementations

2011-07-31 Thread Phil Steitz
In my own applications, I noticed what appears to be poor
performance in the nextInt(int) method of the Mersenne twister,
which I was using to *improve* speed.  I think that for small n, the
default implementation in BistreamGenerator may be running too many
iterations.  I am still figuring out how the code works, but I
thought it would be good to run some benchmarks - using Gilles' new
stuff - against the Harmony implementation in java.util.Random of
this method.  That led me to notice that there are no unit tests for
BitstreamGenerator.  I propose that we add
 0) RandomGeneratorAbstractTest with an abstract makeGenerator
method including fixed seed tests for all RandomGenerator methods
 1) BitstreamGeneratorTest extending RandomGeneratorAbstractTest
implementing makeGenerator with a BitStreamGenerator that uses the
JDK generator for next(int)
 2) Make the test classes for Mersenne and Weil generators extend
RandomGeneratorAbstractTest, moving redundant tests up into the base
class

Sound reasonable?  Also, any recollection why we are using a
different implementation in BitStreamGenerator for next(int) than
Harmony and the JDK use?  The Harmony impl is almost identical to
what is documented in the JDK javadoc.

Phil

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org