Re: [PATCH, fortran/52879] RANDOM_SEED revisited

2014-02-10 Thread Steve Kargl
On Mon, Feb 10, 2014 at 05:23:39PM +0200, Janne Blomqvist wrote:
> On Sun, Feb 9, 2014 at 2:40 AM, Steve Kargl
>  wrote:
> > All,
> >
> > Here is a potentially contentious patch for PR fortran/52879.
> >
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52879
> >
> > As demonstrated in that PR, it is possible to provide RANDOM_SEED
> > with sets of seeds that result in a very poor sequences of PRN.
> > Gfortran's RANDOM_NUMBER uses 3 independent KISS PRNG to generate
> > the bits of the significands of the real PRN.  Each KISS PRNG
> > requires 4 seeds.  This patch removes the need for a user to
> > specify 12 seeds.  It uses a Lehmer linear congruent generator
> > to generate the 12 KISS seeds.  This LCG requires a single seed.
> > I have selected to have seed(1)=0 correspond to the current
> > default set of KISS seeds.  Internally, the LCG declares the
> > seed as a GFC_UINTEGER_8 (ie., uint64_t) type, so for gfortran's
> > default integer kind one has 2**31-1 possible seeds and if one
> > use -fdefault-integer-8 then one has 2**32 possible seeds.
> 
> I like this approach, as it would make the seeding a bit more
> fool-proof and easier to use. Per se, loss of the ability to retrieve
> or update parts of the internal state of the RNG seems not terribly
> relevant, after all users who might be interested in that are most
> likely not going to be satisfied with the black box RNG provided by
> the language anyway.
> 
> That being said, I wonder whether this conforms to the standard. From N1830:
> 
> "The pseudorandom number generator used by RANDOM NUMBER maintains a
> seed that is updated during the execution of RANDOM NUMBER and that
> may be specified or returned by RANDOM SEED."
> 
> and
> 
> "
> For example, after execution of the statements
> 
>  CALL RANDOM_SEED (PUT=SEED1)
>  CALL RANDOM_SEED (GET=SEED2)
>  CALL RANDOM_NUMBER (X1)
>  CALL RANDOM_SEED (PUT=SEED2)
>  CALL RANDOM_NUMBER (X2)
> 
> X2 equals X1.
> "
> 
> That would imply that GET= and PUT= save/restore the complete internal
> state of the PRNG, which seems incompatible with using the simple LCG
> to generate the seed (if I understood this correctly, I agree with
> Nick that the interface is badly designed, but what can you do..).

I suppose at one time I knew about the above requirement.  The
simple approach I proposed won't work as is.  A possible work-around
(and it is ugly) would be to keep an internal count of the number
of times that a random number is drawn.  Call this count CNT.  Then,
on reseeding, random_seed would burn CNT draws and reset CNT to zero.
In thinking about it, we could make seed(1)= and seed(2)=CNT.
This would restrict CNT to 2**31-1 before rolling over.  If CNT is
internal to random_seed(), it can be a uint64_t giving 2**64 before
rolling over.  This certainly has some performance impact on any code
that calls random_seed numerous times.

> Or maybe it would be possible to get around this requirement by
> requiring the 12 seeds, but then calculate some kind of entropy value
> of them, and if the entropy is too low (say, all values are the same,
> or only the first value is non-zero) then use the Lehmer LCG,
> otherwise set the state directly? Sounds brittle, though..

Interesting idea.  I can't remember.  Is it possible to issue
a runtime warning from within libgfortran?  Is so, we could do

   seed2 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
   call random_seed(put=seed2)

Runtime Warning: poor quality seeds detected in random_seed().
Seeds reset to PUT=[].

I'll think about a possible algorithm.

-- 
Steve


Re: [PATCH, fortran/52879] RANDOM_SEED revisited

2014-02-10 Thread Janne Blomqvist
On Sun, Feb 9, 2014 at 2:40 AM, Steve Kargl
 wrote:
> All,
>
> Here is a potentially contentious patch for PR fortran/52879.
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52879
>
> As demonstrated in that PR, it is possible to provide RANDOM_SEED
> with sets of seeds that result in a very poor sequences of PRN.
> Gfortran's RANDOM_NUMBER uses 3 independent KISS PRNG to generate
> the bits of the significands of the real PRN.  Each KISS PRNG
> requires 4 seeds.  This patch removes the need for a user to
> specify 12 seeds.  It uses a Lehmer linear congruent generator
> to generate the 12 KISS seeds.  This LCG requires a single seed.
> I have selected to have seed(1)=0 correspond to the current
> default set of KISS seeds.  Internally, the LCG declares the
> seed as a GFC_UINTEGER_8 (ie., uint64_t) type, so for gfortran's
> default integer kind one has 2**31-1 possible seeds and if one
> use -fdefault-integer-8 then one has 2**32 possible seeds.

I like this approach, as it would make the seeding a bit more
fool-proof and easier to use. Per se, loss of the ability to retrieve
or update parts of the internal state of the RNG seems not terribly
relevant, after all users who might be interested in that are most
likely not going to be satisfied with the black box RNG provided by
the language anyway.

That being said, I wonder whether this conforms to the standard. From N1830:

"The pseudorandom number generator used by RANDOM NUMBER maintains a
seed that is updated during the execution of RANDOM NUMBER and that
may be specified or returned by RANDOM SEED."

and

"
For example, after execution of the statements

 CALL RANDOM_SEED (PUT=SEED1)
 CALL RANDOM_SEED (GET=SEED2)
 CALL RANDOM_NUMBER (X1)
 CALL RANDOM_SEED (PUT=SEED2)
 CALL RANDOM_NUMBER (X2)

X2 equals X1.
"

That would imply that GET= and PUT= save/restore the complete internal
state of the PRNG, which seems incompatible with using the simple LCG
to generate the seed (if I understood this correctly, I agree with
Nick that the interface is badly designed, but what can you do..).

Or maybe it would be possible to get around this requirement by
requiring the 12 seeds, but then calculate some kind of entropy value
of them, and if the entropy is too low (say, all values are the same,
or only the first value is non-zero) then use the Lehmer LCG,
otherwise set the state directly? Sounds brittle, though..

> It is possible to extend the patch to allow a skip count such that
> for example seed(2)=10 would use every tenth value in the LCG
> sequence. I haven't pursued this as it appears to be an unneeded
> complication.

Yes, this seems unnecessary.


-- 
Janne Blomqvist


Re: [PATCH, fortran/52879] RANDOM_SEED revisited

2014-02-09 Thread Steve Kargl
On Sun, Feb 09, 2014 at 03:31:09PM +0100, Dominique Dhumieres wrote:
> Steve,
> 
> First, it is needed to remove the line
> 
>   mpz_t put_size, get_size;
> 
> in gcc/fortran/check.c, otherwise compiling the file with -Werror
> (default) fails.

OK.

> Second, the patch breaks the interface of RANDOM_SEED(GET/PUT...)
> as shown for gfortran.dg/random_seed_1.f90 and the original test
> in pr52879. This likely makes the patch stage 1 material.

Technically, it doesn't change the interface as SIZE should
be used to determine the shape of GET/PUT.  random_seed_1.f90
was specifically crafted with the assumption that SIZE=12.
But, yes, I know gfortran users will have hard coded 12 into
their codes.

As far as stage 1, it was low hanging fruit and I had a little time
to look at a gfortran PR.

> AFAIU my tests in pr52879, comment 1, the generated sequence for real
> does not depend on the four first integers. From a user point of view,
> this surprising (may be undocumented, I did not check).

I could not remember what your comment 1 was trying to convey.  But.
yes the 12 seeds were originally split into 3 groups of 4.  The first
set went to the first 32-bits, the second set was used for the next
32-bits, and the last 4 give the next 49 bits. Then, ...

> So I think the seed generation should keep the present properties
> and only add a way to randomize the 12 integers.

FX added the scramble_seed() and unscramble_seed() functions to 
try to randomize the seeds.  But, as shown in comment 0, it is
possible to give random_seed() a set of seeds that defeats FX's
scrambling.  The scrambling still yields the grouping as described 
above.  This gives the nice property of

program rnd
   real(4)  a04
   real(8)  a08
   real(10) a10
   real(16) a16
   call random_seed() ; call random_number(a04) ; print *, a04
   call random_seed() ; call random_number(a08) ; print *, a08
   call random_seed() ; call random_number(a10) ; print *, a10
   call random_seed() ; call random_number(a16) ; print *, a16
end program rnd
% gfc4x -o z -O rnd1.f90 && ./z
  0.997559547
  0.99755959009261719 
  0.997559590092617187729  
  0.997559590092617200441811501464004185  

To go beyond the scrambling that FX gave us, I think would require
a caching the user defined seeds (so GET will recover PUT) and
much more complicated mashing up of 12 numbers.

-- 
Steve


Re: [PATCH, fortran/52879] RANDOM_SEED revisited

2014-02-09 Thread Dominique Dhumieres
Steve,

First, it is needed to remove the line

  mpz_t put_size, get_size;

in gcc/fortran/check.c, otherwise compiling the file with -Werror
(default) fails.

Second, the patch breaks the interface of RANDOM_SEED(GET/PUT...)
as shown for gfortran.dg/random_seed_1.f90 and the original test
in pr52879. This likely makes the patch stage 1 material.

AFAIU my tests in pr52879, comment 1, the generated sequence for real
does not depend on the four first integers. From a user point of view,
this surprising (may be undocumented, I did not check).

So I think the seed generation should keep the present properties
and only add a way to randomize the 12 integers.

Thanks for working on it.

Dominique


[PATCH, fortran/52879] RANDOM_SEED revisited

2014-02-08 Thread Steve Kargl
All,

Here is a potentially contentious patch for PR fortran/52879.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52879

As demonstrated in that PR, it is possible to provide RANDOM_SEED
with sets of seeds that result in a very poor sequences of PRN.
Gfortran's RANDOM_NUMBER uses 3 independent KISS PRNG to generate
the bits of the significands of the real PRN.  Each KISS PRNG
requires 4 seeds.  This patch removes the need for a user to
specify 12 seeds.  It uses a Lehmer linear congruent generator
to generate the 12 KISS seeds.  This LCG requires a single seed.
I have selected to have seed(1)=0 correspond to the current
default set of KISS seeds.  Internally, the LCG declares the 
seed as a GFC_UINTEGER_8 (ie., uint64_t) type, so for gfortran's
default integer kind one has 2**31-1 possible seeds and if one
use -fdefault-integer-8 then one has 2**32 possible seeds.

It is possible to extend the patch to allow a skip count such that
for example seed(2)=10 would use every tenth value in the LCG
sequence. I haven't pursued this as it appears to be an unneeded
complication.

The only testsuite failure is for gfortran.dg/random_seed_1.f90,
which is a test designed to specifically test for the 12 KISS
seeds.  This test will be removed if the patch is OK.


-- 
Steve


Index: gcc/fortran/check.c
===
--- gcc/fortran/check.c (revision 207633)
+++ gcc/fortran/check.c (working copy)
@@ -4925,16 +4925,9 @@ gfc_check_random_number (gfc_expr *harve
 bool
 gfc_check_random_seed (gfc_expr *size, gfc_expr *put, gfc_expr *get)
 {
-  unsigned int nargs = 0, kiss_size;
+  unsigned int nargs = 0;
   locus *where = NULL;
   mpz_t put_size, get_size;
-  bool have_gfc_real_16; /* Try and mimic HAVE_GFC_REAL_16 in libgfortran.  */
-
-  have_gfc_real_16 = gfc_validate_kind (BT_REAL, 16, true) != -1;
-
-  /* Keep the number of bytes in sync with kiss_size in
- libgfortran/intrinsics/random.c.  */
-  kiss_size = (have_gfc_real_16 ? 48 : 32) / gfc_default_integer_kind;
 
   if (size != NULL)
 {
@@ -4975,13 +4968,6 @@ gfc_check_random_seed (gfc_expr *size, g
 
   if (!kind_value_check (put, 1, gfc_default_integer_kind))
return false;
-
-  if (gfc_array_size (put, &put_size)
- && mpz_get_ui (put_size) < kiss_size)
-   gfc_error ("Size of '%s' argument of '%s' intrinsic at %L "
-  "too small (%i/%i)",
-  gfc_current_intrinsic_arg[1]->name, gfc_current_intrinsic,
-  where, (int) mpz_get_ui (put_size), kiss_size);
 }
 
   if (get != NULL)
@@ -5007,13 +4993,6 @@ gfc_check_random_seed (gfc_expr *size, g
 
   if (!kind_value_check (get, 2, gfc_default_integer_kind))
return false;
-
-   if (gfc_array_size (get, &get_size)
- && mpz_get_ui (get_size) < kiss_size)
-   gfc_error ("Size of '%s' argument of '%s' intrinsic at %L "
-  "too small (%i/%i)",
-  gfc_current_intrinsic_arg[2]->name, gfc_current_intrinsic,
-  where, (int) mpz_get_ui (get_size), kiss_size);
 }
 
   /* RANDOM_SEED may not have more than one non-optional argument.  */
Index: libgfortran/intrinsics/random.c
===
--- libgfortran/intrinsics/random.c (revision 207633)
+++ libgfortran/intrinsics/random.c (working copy)
@@ -224,9 +224,20 @@ KISS algorithm.  */
z=0,c=0 and z=2^32-1,c=698769068
should be avoided.  */
 
-/* Any modifications to the seeds that change kiss_size below need to be
-   reflected in check.c (gfc_check_random_seed) to enable correct
-   compile-time checking of PUT size for the RANDOM_SEED intrinsic.  */
+/* The 3 KISS generators require 3 sets for 4 seeds.  It is possible for a
+   a user to specify a set of seeds that is inadequate for reseeding the
+   generators.  For example, change only the value of one of the 12 seeds
+   may not reset the PRNG sequences.  To work-around this issue, use a 
+   simple Lehmer linear congruential generator that requires only a single
+   seed to generate the 12 KISS seeds.  */
+
+static const GFC_INTEGER_4 lehmer_lcg_size = 1;
+static GFC_UINTEGER_8 lehmer_lcg_seed = 0;
+static GFC_UINTEGER_4
+lehmer_lcg(GFC_UINTEGER_4 a)
+{
+return ((GFC_UINTEGER_8)a * 279470273UL) % 4294967291UL;
+}
 
 #define KISS_DEFAULT_SEED_1 123456789, 362436069, 521288629, 316191069
 #define KISS_DEFAULT_SEED_2 987654321, 458629013, 582859209, 438195021
@@ -636,28 +647,6 @@ arandom_r16 (gfc_array_r16 *x)
 #endif
 
 
-
-static void
-scramble_seed (unsigned char *dest, unsigned char *src, int size)
-{
-  int i;
-
-  for (i = 0; i < size; i++)
-dest[(i % 2) * (size / 2) + i / 2] = src[i];
-}
-
-
-static void
-unscramble_seed (unsigned char *dest, unsigned char *src, int size)
-{
-  int i;
-
-  for (i = 0; i < size; i++)
-dest[i] = src[(i % 2) * (size / 2) + i / 2];
-}
-
-
-
 /* random_seed is used to seed the PRNG with either a default