Re: [Python-Dev] random number generator state

2009-08-16 Thread Scott David Daniels

Raymond Hettinger wrote:

[Scott David Daniels]

I find I have a need in randomized testing for a shorter version
of getstate, even if it _is_ slower to restore.  [blah about big state]


Sounds like you could easily wrap the generator to get this.
It would slow you down but would give the information you want.

Well, I was thinking that this might be generally useful for randomized
testing.


I think it would be a mistake to complexify the API to accomodate
short states -- I'm not even sure than they are generally useful
(recording my initial seed and how many cycles I've run through
is only helpful for sequences short enough that I'm willing to rerun
them).

Right, that was what I was asking about.  The complexity of the change
grew on me; I hadn't realized at the outset it would be more than adding
a counter internally.  Consider me officially dissuaded.

I'm curious what your use case is.  Why not just record the the sequence 
as generated -- I don't see any analytic value to
just knowing the initial seed and cycle count. 

I'm building data structures controlled by an rng, and then performing
sequences of (again randomly controlled) operations on those data
structures, check all invariants at each step.  I then lather, rinse,
repeat recording the start of each failing experiment.  In the morning I
come in and look for commonality in the cases I see.  Having the short
state means I  means I can easily rebuild the data structure and command
list to see what is going on.  I prune commands, simplify the tree, and
thus isolate the problem I found.

I did enough experimenting to see that if I simply provide access to run
N cycles of the block, I can actually do 2**32 cycles in feasible time,
so I have a pair of counters, and the code should take long enough for
eternity to show up before the wrap.

My short state is:
seed, block_index, cycles_low, cycles_high, floating

(block_index + 625 * (cycles_low + (cycles_high  32)) is the position,
and could be done as such; the pieces reflect the least-expensive cost
in performance to the rng. floating is simply the same final floating
piece that the state keeps now.


Ability to print out a short state implies that you are using only a
small subset of possible states (i.e. the ones you can get to with
a short seed). 

Well, as you see above, I do capture the seed.  I realize that the time-
constructed seeds are distinct from identically provided values as small
ints, and I also mark when the rng gets called by set_state to indicate
that I then know nothing about the seed.
 mt[0] = 0x8000UL; /* MSB is 1; assuring non-zero initial array */
 but probably should be:
 mt[0] |= 0x8000UL; /* MSB is 1; assuring non-zero initial array*/
 Please file a bug report for this and assign to me  Also, our
 tests for MT exactly reproduce their published test sequence.

I've been assured it is not a bug, and I filed no report since I had 
just arrived at the point of suspicion.


To summarize, I am officially dissuaded, and will post a recipe if I
get something nice working.

--Scott David Daniels
scott.dani...@acm.org

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] random number generator state

2009-08-16 Thread Greg Ewing

Scott David Daniels wrote:


No, I don't really need MT.  The others would be fine.
I'd love further details.


The one I've been working with is due to Pierre L'Ecuyer [1]
and is known as MRG32k3a. It's a combined multiple recursive
linear congruential generator with 6 words of state. The
formulas are

r1[i] = (a12 * r1[i-2] + a13 * r1[i-3]) % m1
r2[i] = (a21 * r2[i-1] + a23 * r2[i-3]) % m2
r[i] = (r1[i] - r2[i]) * m1

where

m1 = 2**32 - 209
m2 = 2**32 - 22835

a12 = 1403580
a13 = -810728
a21 = 527612
a23 = -1370589

If you consider the state to be made up of two 3-word
state vectors, then there are two 3x3 matrices which
map a given state onto the next state. So to jump
ahead n steps in the sequence, you raise these matrices
to the power of n.

I've attached some code implementing this generator
together with the jumping-ahead. (Sorry it's in C++,
I hadn't discovered Python when I wrote it.)

[1] Pierre L'Ecuyer, Good Parameters and Implementations for
Combined Multiple Recursive Random Number Generators,
Operations Research v47 no1 Jan-Feb 1999
http://www.iro.umontreal.ca/~lecuyer/myftp/papers/combmrg2.ps

--
Greg
/*
 *   cmr_random_generator.C
 *   ==
 *
 *   Combined Multiple Recursive random number generator.
 *
 *   This is an implementation of Pierre L'Ecuyer's
 *   MRG32k3a generator, described in:
 *
 * Pierre L'Ecuyer, Good Parameters and Implementations for
 * Combined Multiple Recursive Random Number Generators,
 * Operations Research v47 no1 Jan-Feb 1999 
 */

#include cmr_random_generator.H

static const double
norm = 2.328306549295728e-10,
m1   = 4294967087.0,
m2   = 429493.0,
a12  =1403580.0,
a13  =-810728.0,
a21  = 527612.0,
a23  =   -1370589.0;

static double a[2][3][3] = {
  {
{0.0, 1.0, 0.0},
{0.0, 0.0, 1.0},
{a13, a12, 0.0}
  },
  {
{0.0, 1.0, 0.0},
{0.0, 0.0, 1.0},
{a23, 0.0, a21}
  }
};

static double m[2] = {
  m1,
  m2
};

static double init_s[2][3] = {
  {1.0, 1.0, 1.0},
  {1.0, 1.0, 1.0}
};

inline static double mod(double x, double m) {
  long k = (long)(x / m);
  x -= k * m;
  if (x  0.0)
x += m;
  return x;
}

/*
 *   Initialisation
 */

CMRRandomGenerator::CMRRandomGenerator() {
  for (int i = 0; i = 1; i++)
for (int j = 0; j = 2; j++)
  s[i][j] = init_s[i][j];
}

/*
 *   Advance CMRG one step and return next number
 */

double CMRRandomGenerator::Next() {
  double p1 = mod(a12 * s[0][1] + a13 * s[0][0], m1);
  s[0][0] = s[0][1];
  s[0][1] = s[0][2];
  s[0][2] = p1;
  double p2 = mod(a21 * s[1][2] + a23 * s[1][0], m2);
  s[1][0] = s[1][1];
  s[1][1] = s[1][2];
  s[1][2] = p2;
  double p = p1 - p2;
  if (p  0.0)
p += m1;
  return (p + 1) * norm;
}

typedef unsigned long long Int64;
typedef Int64 CMRG_Vector[3];
typedef Int64 CMRG_Matrix[3][3];

static Int64 ftoi(double x, double m) {
  if (x = 0.0)
return Int64(x);
  else
return Int64((long double)x + (long double)m);
}

static double itof(Int64 i, Int64 m) {
  return i;
}

static void v_ftoi(double u[], CMRG_Vector v, double m) {
  for (int i = 0; i = 2; i++)
v[i] = ftoi(u[i], m);
}

static void v_itof(CMRG_Vector u, double v[], Int64 m) {
  for (int i = 0; i = 2; i++)
v[i] = itof(u[i], m);
}

static void v_copy(CMRG_Vector u, CMRG_Vector v) {
  for (int i = 0; i = 2; i++)
v[i] = u[i];
}

static void m_ftoi(double a[][3], CMRG_Matrix b, double m) {
  for (int i = 0; i = 2; i++)
for (int j = 0; j = 2; j++)
  b[i][j] = ftoi(a[i][j], m);
}

static void m_copy(CMRG_Matrix a, CMRG_Matrix b) {
  for (int i = 0; i = 2; i++)
for (int j = 0; j = 2; j++)
  b[i][j] = a[i][j];
}

static void mv_mul(CMRG_Matrix a, CMRG_Vector u, CMRG_Vector v, Int64 m) {
  CMRG_Vector w;
  int i, j;
  for (i = 0; i = 2; i++) {
w[i] = 0;
for (j = 0; j = 2; j++)
  w[i] = (a[i][j] * u[j] + w[i]) % m;
  }
  v_copy(w, v);
}

static void mm_mul(CMRG_Matrix a, CMRG_Matrix b, CMRG_Matrix c, Int64 m) {
  CMRG_Matrix d;
  int i, j, k;
  for (i = 0; i = 2; i++) {
for (j = 0; j = 2; j++) {
  d[i][j] = 0;
  for (k = 0; k = 2; k++)
d[i][j] = (a[i][k] * b[k][j] + d[i][j]) % m;
}
  }
  m_copy(d, c);
}

/*
 *   Advance the CMRG by n*2^e steps
 */

void CMRRandomGenerator::Advance(unsigned long n, unsigned int e) {
  CMRG_Matrix B[2];
  CMRG_Vector S[2];
  Int64 M[2];
  int i;
  for (i = 0; i = 1; i++) {
m_ftoi(a[i], B[i], m[i]);
v_ftoi(s[i], S[i], m[i]);
M[i] = Int64(m[i]);
  }
  while (e--) {
for (i = 0; i = 1; i++)
  mm_mul(B[i], B[i], B[i], M[i]);
  }
  while (n) {
if (n  1)
  for (i = 0; i = 1; i++)
mv_mul(B[i], S[i], S[i], M[i]);
n = 1;
if (n)
  for (i = 0; i = 1; i++)
mm_mul(B[i], B[i], B[i], M[i]);
  }
  for (i = 0; i = 1; i++)
v_itof(S[i], s[i], M[i]);
}
/*
 *   cmr_random_generator.H
 *   ==
 *
 *   Combined Multiple Recursive random number generator.
 */

#ifndef 

Re: [Python-Dev] random number generator state

2009-08-15 Thread Raymond Hettinger


[Scott David Daniels]

I find I have a need in randomized testing for a shorter version
of getstate, even if it _is_ slower to restore.  When running
exhaustive tests, a failure report should show the start state
of the generator.  Unfortunately, our current state includes a
625-element array.  I want a state that can be read off a report
and typed in to reproduce the state.  Something a bit like the
initial seed, a count of cycle calls, and a few other things.


Sounds like you could easily wrap the generator to get this.
It would slow you down but would give the information you want.

I think it would be a mistake to complexify the API to accomodate
short states -- I'm not even sure than they are generally useful
(recording my initial seed and how many cycles I've run through
is only helpful for sequences short enough that I'm willing to rerun
them).

I'm curious what your use case is.  Why not just record the 
the sequence as generated -- I don't see any analytic value to
just knowing the initial seed and cycle count.  


Ability to print out a short state implies that you are using only a
small subset of possible states (i.e. the ones you can get to with
a short seed).  A short state print out isn't even possible if you actually
have a random initial state (every state having an equal chance of
being the starting point).




 In trying to
get this to work, I found what might be a bug:
code says
  mt[0] = 0x8000UL; /* MSB is 1; assuring non-zero initial array */
but probably should be:
  mt[0] |= 0x8000UL; /* MSB is 1; assuring non-zero initial array */


Please file a bug report for this and assign to me.  I put in the existing
MT code and took it directly from the author's published (and widely
tested code).  Also, our tests for MT exactly reproduce their published test
sequence.  But, if there is an error, I would be happy to fix it.




In checking into that issue, I went to the original Mersenne-Twister
code, and I see the original authors are pursuing a newer generator,
dSFMT.


The MT itself has the advantage of having been widely exercised and
tested.  The newer generator may have more states but has not been
as extensively tested.



I now have a dilemma.  Should I continue the work on the original M-T
code (which is now seeming problematic for compatibility) or simply make
a new generator with similar calls using dSFMT and put the new feature
in that where there is no compatibility problem.  Which would be more
useful for the Python community?


It's not hard to subclass Random and add different generators.  Why not
publish some code on ASPN and see how it gets received.  I've put a
recipe there for a long period generator, 
http://code.activestate.com/recipes/576707/ ,
but there doesn't seem to have been any real interest in generators with
longer periods than MT. 



Raymond


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] random number generator state

2009-08-15 Thread Mark Dickinson
On Sat, Aug 15, 2009 at 8:54 PM, Scott David
Danielsscott.dani...@acm.org wrote:
 [...] input to .setstate: old, new-short, and new-long.  In trying to
 get this to work, I found what might be a bug:
 code says
  mt[0] = 0x8000UL; /* MSB is 1; assuring non-zero initial array */
 but probably should be:
  mt[0] |= 0x8000UL; /* MSB is 1; assuring non-zero initial array */

I'm 92.3% sure that this isn't a bug.  For one thing, that line comes
directly from the authors' code[1], so if it's a bug then it's a bug in
the original code, dating from 2002;  this seems unlikely, given how
widely used and (presumably) well-scrutinized MT is.

For a more technical justification, the Mersenne Twister is based
on a linear transformation of a 19937-dimensional vector space
over F2, so its state naturally consists of 19937 bits of information,
which is 623 words plus one additional bit.  In this implementation,
that extra bit is the top bit of the first word;  the other 31 bits of that
first word shouldn't really be regarded as part of the state proper.
If you examine the genrand_int32 function in _randommodule.c,
you'll see that the low 31 bits of mt[0] play no role in updating the
state;  i.e., their value doesn't affect the new state.  So using
mt[0] |= 0x8000UL instead of mt[0] = 0x8000UL during
initialization should make no difference to the resulting stream of
random numbers (with the possible exception of the first random
number generated).

[1] http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.c

Mark
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] random number generator state

2009-08-15 Thread Greg Ewing

Scott David Daniels wrote:

I find I have a need in randomized testing for a shorter version
of getstate, even if it _is_ slower to restore.  When running
exhaustive tests, a failure report should show the start state
of the generator.  Unfortunately, our current state includes a
625-element array.


Do you need to use the Mersenne Twister in particular
for this? There are other kinds of generator with very
long cycles and good statistical properties, that can
easily be restored to any state in constant time given
an initial state and a count.

Let me know if you're interested and I can give you
further details.

--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com