Re: [PATCH, fortran/52879] RANDOM_SEED revisited
On Mon, Feb 10, 2014 at 05:23:39PM +0200, Janne Blomqvist wrote: > On Sun, Feb 9, 2014 at 2:40 AM, Steve Kargl > wrote: > > All, > > > > Here is a potentially contentious patch for PR fortran/52879. > > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52879 > > > > As demonstrated in that PR, it is possible to provide RANDOM_SEED > > with sets of seeds that result in a very poor sequences of PRN. > > Gfortran's RANDOM_NUMBER uses 3 independent KISS PRNG to generate > > the bits of the significands of the real PRN. Each KISS PRNG > > requires 4 seeds. This patch removes the need for a user to > > specify 12 seeds. It uses a Lehmer linear congruent generator > > to generate the 12 KISS seeds. This LCG requires a single seed. > > I have selected to have seed(1)=0 correspond to the current > > default set of KISS seeds. Internally, the LCG declares the > > seed as a GFC_UINTEGER_8 (ie., uint64_t) type, so for gfortran's > > default integer kind one has 2**31-1 possible seeds and if one > > use -fdefault-integer-8 then one has 2**32 possible seeds. > > I like this approach, as it would make the seeding a bit more > fool-proof and easier to use. Per se, loss of the ability to retrieve > or update parts of the internal state of the RNG seems not terribly > relevant, after all users who might be interested in that are most > likely not going to be satisfied with the black box RNG provided by > the language anyway. > > That being said, I wonder whether this conforms to the standard. From N1830: > > "The pseudorandom number generator used by RANDOM NUMBER maintains a > seed that is updated during the execution of RANDOM NUMBER and that > may be specified or returned by RANDOM SEED." > > and > > " > For example, after execution of the statements > > CALL RANDOM_SEED (PUT=SEED1) > CALL RANDOM_SEED (GET=SEED2) > CALL RANDOM_NUMBER (X1) > CALL RANDOM_SEED (PUT=SEED2) > CALL RANDOM_NUMBER (X2) > > X2 equals X1. > " > > That would imply that GET= and PUT= save/restore the complete internal > state of the PRNG, which seems incompatible with using the simple LCG > to generate the seed (if I understood this correctly, I agree with > Nick that the interface is badly designed, but what can you do..). I suppose at one time I knew about the above requirement. The simple approach I proposed won't work as is. A possible work-around (and it is ugly) would be to keep an internal count of the number of times that a random number is drawn. Call this count CNT. Then, on reseeding, random_seed would burn CNT draws and reset CNT to zero. In thinking about it, we could make seed(1)= and seed(2)=CNT. This would restrict CNT to 2**31-1 before rolling over. If CNT is internal to random_seed(), it can be a uint64_t giving 2**64 before rolling over. This certainly has some performance impact on any code that calls random_seed numerous times. > Or maybe it would be possible to get around this requirement by > requiring the 12 seeds, but then calculate some kind of entropy value > of them, and if the entropy is too low (say, all values are the same, > or only the first value is non-zero) then use the Lehmer LCG, > otherwise set the state directly? Sounds brittle, though.. Interesting idea. I can't remember. Is it possible to issue a runtime warning from within libgfortran? Is so, we could do seed2 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] call random_seed(put=seed2) Runtime Warning: poor quality seeds detected in random_seed(). Seeds reset to PUT=[]. I'll think about a possible algorithm. -- Steve
Re: [PATCH, fortran/52879] RANDOM_SEED revisited
On Sun, Feb 9, 2014 at 2:40 AM, Steve Kargl wrote: > All, > > Here is a potentially contentious patch for PR fortran/52879. > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52879 > > As demonstrated in that PR, it is possible to provide RANDOM_SEED > with sets of seeds that result in a very poor sequences of PRN. > Gfortran's RANDOM_NUMBER uses 3 independent KISS PRNG to generate > the bits of the significands of the real PRN. Each KISS PRNG > requires 4 seeds. This patch removes the need for a user to > specify 12 seeds. It uses a Lehmer linear congruent generator > to generate the 12 KISS seeds. This LCG requires a single seed. > I have selected to have seed(1)=0 correspond to the current > default set of KISS seeds. Internally, the LCG declares the > seed as a GFC_UINTEGER_8 (ie., uint64_t) type, so for gfortran's > default integer kind one has 2**31-1 possible seeds and if one > use -fdefault-integer-8 then one has 2**32 possible seeds. I like this approach, as it would make the seeding a bit more fool-proof and easier to use. Per se, loss of the ability to retrieve or update parts of the internal state of the RNG seems not terribly relevant, after all users who might be interested in that are most likely not going to be satisfied with the black box RNG provided by the language anyway. That being said, I wonder whether this conforms to the standard. From N1830: "The pseudorandom number generator used by RANDOM NUMBER maintains a seed that is updated during the execution of RANDOM NUMBER and that may be specified or returned by RANDOM SEED." and " For example, after execution of the statements CALL RANDOM_SEED (PUT=SEED1) CALL RANDOM_SEED (GET=SEED2) CALL RANDOM_NUMBER (X1) CALL RANDOM_SEED (PUT=SEED2) CALL RANDOM_NUMBER (X2) X2 equals X1. " That would imply that GET= and PUT= save/restore the complete internal state of the PRNG, which seems incompatible with using the simple LCG to generate the seed (if I understood this correctly, I agree with Nick that the interface is badly designed, but what can you do..). Or maybe it would be possible to get around this requirement by requiring the 12 seeds, but then calculate some kind of entropy value of them, and if the entropy is too low (say, all values are the same, or only the first value is non-zero) then use the Lehmer LCG, otherwise set the state directly? Sounds brittle, though.. > It is possible to extend the patch to allow a skip count such that > for example seed(2)=10 would use every tenth value in the LCG > sequence. I haven't pursued this as it appears to be an unneeded > complication. Yes, this seems unnecessary. -- Janne Blomqvist
Re: [PATCH, fortran/52879] RANDOM_SEED revisited
On Sun, Feb 09, 2014 at 03:31:09PM +0100, Dominique Dhumieres wrote: > Steve, > > First, it is needed to remove the line > > mpz_t put_size, get_size; > > in gcc/fortran/check.c, otherwise compiling the file with -Werror > (default) fails. OK. > Second, the patch breaks the interface of RANDOM_SEED(GET/PUT...) > as shown for gfortran.dg/random_seed_1.f90 and the original test > in pr52879. This likely makes the patch stage 1 material. Technically, it doesn't change the interface as SIZE should be used to determine the shape of GET/PUT. random_seed_1.f90 was specifically crafted with the assumption that SIZE=12. But, yes, I know gfortran users will have hard coded 12 into their codes. As far as stage 1, it was low hanging fruit and I had a little time to look at a gfortran PR. > AFAIU my tests in pr52879, comment 1, the generated sequence for real > does not depend on the four first integers. From a user point of view, > this surprising (may be undocumented, I did not check). I could not remember what your comment 1 was trying to convey. But. yes the 12 seeds were originally split into 3 groups of 4. The first set went to the first 32-bits, the second set was used for the next 32-bits, and the last 4 give the next 49 bits. Then, ... > So I think the seed generation should keep the present properties > and only add a way to randomize the 12 integers. FX added the scramble_seed() and unscramble_seed() functions to try to randomize the seeds. But, as shown in comment 0, it is possible to give random_seed() a set of seeds that defeats FX's scrambling. The scrambling still yields the grouping as described above. This gives the nice property of program rnd real(4) a04 real(8) a08 real(10) a10 real(16) a16 call random_seed() ; call random_number(a04) ; print *, a04 call random_seed() ; call random_number(a08) ; print *, a08 call random_seed() ; call random_number(a10) ; print *, a10 call random_seed() ; call random_number(a16) ; print *, a16 end program rnd % gfc4x -o z -O rnd1.f90 && ./z 0.997559547 0.99755959009261719 0.997559590092617187729 0.997559590092617200441811501464004185 To go beyond the scrambling that FX gave us, I think would require a caching the user defined seeds (so GET will recover PUT) and much more complicated mashing up of 12 numbers. -- Steve
Re: [PATCH, fortran/52879] RANDOM_SEED revisited
Steve, First, it is needed to remove the line mpz_t put_size, get_size; in gcc/fortran/check.c, otherwise compiling the file with -Werror (default) fails. Second, the patch breaks the interface of RANDOM_SEED(GET/PUT...) as shown for gfortran.dg/random_seed_1.f90 and the original test in pr52879. This likely makes the patch stage 1 material. AFAIU my tests in pr52879, comment 1, the generated sequence for real does not depend on the four first integers. From a user point of view, this surprising (may be undocumented, I did not check). So I think the seed generation should keep the present properties and only add a way to randomize the 12 integers. Thanks for working on it. Dominique
[PATCH, fortran/52879] RANDOM_SEED revisited
All, Here is a potentially contentious patch for PR fortran/52879. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52879 As demonstrated in that PR, it is possible to provide RANDOM_SEED with sets of seeds that result in a very poor sequences of PRN. Gfortran's RANDOM_NUMBER uses 3 independent KISS PRNG to generate the bits of the significands of the real PRN. Each KISS PRNG requires 4 seeds. This patch removes the need for a user to specify 12 seeds. It uses a Lehmer linear congruent generator to generate the 12 KISS seeds. This LCG requires a single seed. I have selected to have seed(1)=0 correspond to the current default set of KISS seeds. Internally, the LCG declares the seed as a GFC_UINTEGER_8 (ie., uint64_t) type, so for gfortran's default integer kind one has 2**31-1 possible seeds and if one use -fdefault-integer-8 then one has 2**32 possible seeds. It is possible to extend the patch to allow a skip count such that for example seed(2)=10 would use every tenth value in the LCG sequence. I haven't pursued this as it appears to be an unneeded complication. The only testsuite failure is for gfortran.dg/random_seed_1.f90, which is a test designed to specifically test for the 12 KISS seeds. This test will be removed if the patch is OK. -- Steve Index: gcc/fortran/check.c === --- gcc/fortran/check.c (revision 207633) +++ gcc/fortran/check.c (working copy) @@ -4925,16 +4925,9 @@ gfc_check_random_number (gfc_expr *harve bool gfc_check_random_seed (gfc_expr *size, gfc_expr *put, gfc_expr *get) { - unsigned int nargs = 0, kiss_size; + unsigned int nargs = 0; locus *where = NULL; mpz_t put_size, get_size; - bool have_gfc_real_16; /* Try and mimic HAVE_GFC_REAL_16 in libgfortran. */ - - have_gfc_real_16 = gfc_validate_kind (BT_REAL, 16, true) != -1; - - /* Keep the number of bytes in sync with kiss_size in - libgfortran/intrinsics/random.c. */ - kiss_size = (have_gfc_real_16 ? 48 : 32) / gfc_default_integer_kind; if (size != NULL) { @@ -4975,13 +4968,6 @@ gfc_check_random_seed (gfc_expr *size, g if (!kind_value_check (put, 1, gfc_default_integer_kind)) return false; - - if (gfc_array_size (put, &put_size) - && mpz_get_ui (put_size) < kiss_size) - gfc_error ("Size of '%s' argument of '%s' intrinsic at %L " - "too small (%i/%i)", - gfc_current_intrinsic_arg[1]->name, gfc_current_intrinsic, - where, (int) mpz_get_ui (put_size), kiss_size); } if (get != NULL) @@ -5007,13 +4993,6 @@ gfc_check_random_seed (gfc_expr *size, g if (!kind_value_check (get, 2, gfc_default_integer_kind)) return false; - - if (gfc_array_size (get, &get_size) - && mpz_get_ui (get_size) < kiss_size) - gfc_error ("Size of '%s' argument of '%s' intrinsic at %L " - "too small (%i/%i)", - gfc_current_intrinsic_arg[2]->name, gfc_current_intrinsic, - where, (int) mpz_get_ui (get_size), kiss_size); } /* RANDOM_SEED may not have more than one non-optional argument. */ Index: libgfortran/intrinsics/random.c === --- libgfortran/intrinsics/random.c (revision 207633) +++ libgfortran/intrinsics/random.c (working copy) @@ -224,9 +224,20 @@ KISS algorithm. */ z=0,c=0 and z=2^32-1,c=698769068 should be avoided. */ -/* Any modifications to the seeds that change kiss_size below need to be - reflected in check.c (gfc_check_random_seed) to enable correct - compile-time checking of PUT size for the RANDOM_SEED intrinsic. */ +/* The 3 KISS generators require 3 sets for 4 seeds. It is possible for a + a user to specify a set of seeds that is inadequate for reseeding the + generators. For example, change only the value of one of the 12 seeds + may not reset the PRNG sequences. To work-around this issue, use a + simple Lehmer linear congruential generator that requires only a single + seed to generate the 12 KISS seeds. */ + +static const GFC_INTEGER_4 lehmer_lcg_size = 1; +static GFC_UINTEGER_8 lehmer_lcg_seed = 0; +static GFC_UINTEGER_4 +lehmer_lcg(GFC_UINTEGER_4 a) +{ +return ((GFC_UINTEGER_8)a * 279470273UL) % 4294967291UL; +} #define KISS_DEFAULT_SEED_1 123456789, 362436069, 521288629, 316191069 #define KISS_DEFAULT_SEED_2 987654321, 458629013, 582859209, 438195021 @@ -636,28 +647,6 @@ arandom_r16 (gfc_array_r16 *x) #endif - -static void -scramble_seed (unsigned char *dest, unsigned char *src, int size) -{ - int i; - - for (i = 0; i < size; i++) -dest[(i % 2) * (size / 2) + i / 2] = src[i]; -} - - -static void -unscramble_seed (unsigned char *dest, unsigned char *src, int size) -{ - int i; - - for (i = 0; i < size; i++) -dest[i] = src[(i % 2) * (size / 2) + i / 2]; -} - - - /* random_seed is used to seed the PRNG with either a default