On 9/20/10 6:30 AM, luc.maison...@free.fr wrote:
Hi all,
As those subscribed to the commit list may have noticed, I have
added several new pseudo-random number generators to
commons-math.
I should not have add such a feature without discussing on the
list before. It was a wrong move and I apologize about it. So I
would like to start a discussion about this now.
Chalk it up to network latency ;)
Up to 2.1, we have few pseudo random number generators. We have
an interface RandomGenerator implemented by three classes: -
JDKRandomGenerator that extends the JDK provided generator -
AbstractRandomGenerator as a helper for users generators -
BitStreamGenerator which in turn is extended only by
MersenneTwister
The JDK provided generator is a simple one that can be used only
for very simple needs. The Mersenne Twister is on the other hand
a fast generator with good properties well suited for Monte-Carlo
simulation. It is equidistributed for generating vectors up to
dimension 623 and has a huge period: 2^19937 - 1.
Since Mersenne-Twister inception in 1997, some new generators
have been created, retaining the good properties of Mersenne
twister but removing some of its (few) drawbacks. The main one is
that if initialized with a bits pool containing lots of zeroes,
the pool will take a very long time time to stabilize with a
roughly balanced number of zeros and ones.
I would like to add such generators (well, I already did but can
withdraw my commit). The ones I want to add are the WELL
generators (Well Equidistributed Long period Linear) created by
François Panneton, Pierre L'Ecuyer and Makoto Matsumoto. They are
described in their 2006 paper: Improved * Long-Period Generators
Based on Linear Recurrences Modulo 2, ransactions on Mathematical
Software, 32, 1 (2006) which is available
at<http://www.iro.umontreal.ca/~lecuyer/myftp/papers/wellrng.pdf>.
The paper describes several generators from one family. The most
interesting ones are WELL1024 (a small one), WELL19937 and
WELL44497 (two large ones with huge periods that are both
Mersenne primes). The two large ones exist in a basic version
that is not Maximally Equidistributed (i.e. equidistributed in
all possible dimensions and number of bits up to some threshold)
and also in an extended version that add some tempering to the
output to get an ME generator.
The paper does not put any restriction on the algorithm. The
reference implementation in C on the other hand is limited to
non-commercial use only. In my implementation, I did not refer to
this C implementation at all. I only compiled it on my personal
computer to generate reference values and copied the values in
the test cases. So this implementation is completely independent
(and in fact the code is rather different since I used an
abstract class for the global algorithm and one derived class for
each specific generator. I rely on the JVM optimizer to do all
the inlining and constants simplification that they did manually
in the reference implementation.
Sounds OK, but we should verify somehow that the algorithm itself is
not patented. Does the C source say anything about that? Might be
good to ask about this on legal-discuss or ask one of the authors.
I know this could apply to any of the algorithms we implement, but
the ones that are >100 yrs old are a little safer ;)
Another difference with the reference implementation is that our
BitStreamGenerator abstract class requires the generator to
provide only bits blocks and not double, and it computes the
double by itself. This computation uses the full IEEE754 width
for the doubles, i.e. it uses 53 bits. The generators are
guaranteed to be equidistributed on large vectors of 32 bits (up
to dimension 1390 for the 44497 version, since 1390 * 32<
44497). I guess this should work well also using chunks of 53
bits (but of course with smaller dimensions), but it is not
mathematically proven.
So what do other think about this ? Should this really be
included (in which case I will open a JIRA issue and resolve it
immediately) or should it be removed from commons-math ? Is the
independent implementation and the off-line reference data
generation sufficient to say this code is unencumbered ? Is the
53 bits use for double generation a good thing or should we
reduce it to 32 bits to stay within the proven behavior ?
+1 for including assuming all is OK with IP
Phil
Once again, sorry for this too late discussion, I really should
have started it before. Luc
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org