[ https://issues.apache.org/jira/browse/RNG-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex Herbert updated RNG-173: ----------------------------- Component/s: core > BaseProvider state filling procedure can be improved > ---------------------------------------------------- > > Key: RNG-173 > URL: https://issues.apache.org/jira/browse/RNG-173 > Project: Commons RNG > Issue Type: Improvement > Components: core > Reporter: Alex Herbert > Priority: Trivial > Fix For: 1.5 > > Time Spent: 20m > Remaining Estimate: 0h > > The BaseProvider has a method to fill in remaining state if the input seed is > too short. The fill uses existing seed values to fill the remaining. > The next state is created using: > {code:java} > long n = state[i - seed.length]; > state[i] = 1812433253L * (n ^ (n >> 30)) + i{code} > If the existing state is zero then the new state is i. When the input seed > has no length then the filled state is a natural sequence. > Here is a state of 10 filled from empty seeds of length 0 to 5: > {noformat} > 0: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] > 1: [0, 1, 1812433255, 3284914298392595265, 6102061520201954364, > -3308799481182342998, -3869692221293809580, -7101959917617921332, > 7986832403292652032, 8936067391732911773] > 2: [0, 0, 2, 3, 3624866510, 5437299764, 6569828598597623783, > -8592001180344199076, 1136775338421644002, 8717367692712810396] > 3: [0, 0, 0, 3, 4, 5, 5437299765, 7249733019, 9062166273, > -8592001182156632327] > 4: [0, 0, 0, 0, 4, 5, 6, 7, 7249733020, 9062166274] > 5: [0, 0, 0, 0, 0, 5, 6, 7, 8, 9] > {noformat} > When the seed is zero length or close to half the length of the desired state > and all zeros then the output state has a low number of non-zero bits. > Note: > This has little impact when using the Commons RNG simple module to create a > generator. The seed is produced to the correct length using a high quality > random source. > A second issue is that the method to fill the state is an instance method. > Since it uses no state it could be a static method. I would suggest a method > to convert a seed to the correct length: > {code:java} > protected static long[] ensureSeedLength(long[] seed, int length); {code} > This would allow classes that implement the following pattern: > {code:java} > MyRNG(long[] seed) { > if (seed.length < SEED_SIZE) { > final long[] state = new long[SEED_SIZE]; > fillState(state, seed); > setState(state); > } else { > setState(seed); > } > } {code} > To simplify to: > {code:java} > MyRNG(long[] seed) { > setState(ensureSeedLength(seed, SEED_SIZE)); > }{code} > h2. Compatibility > The user guide states: > {noformat} > upon initialization, the underlying generation algorithm > - may not use all the information contents of the seed, > - may use a procedure (using the given seed as input) for further filling its > internal state (in order to avoid a too uniform initial state). > In both cases, the behavior is not standard but should not change between > releases of the library (bugs notwithstanding).{noformat} > Since behaviour *should not change* it would rule out changes for existing > classes. New classes could use the new static version to fill state. > I would suggest providing a new method to ensure the input seed is a minimum > length. If the method seeds a SplitMix64 style generator with the first value > of the input seed (or zero if the seed length is zero) then the filled state > will be high quality. This type of generator only outputs zero once during > the period and so any seed length can be ensured to be non zero when it has > been expanded. An input seed of entirely zero values would be passed through > unchanged. This is the default *user beware* behaviour for full length zero > seeds. > A 32-bit variant can be created using a similar hashing function that outputs > only a single 0 in the period, for example MurmurHash3's 32-bit finaliser > function. > An example implementation for long values is: > {code:java} > private static final long GOLDEN_RATIO = 0x9e3779b97f4a7c15L > protected static long[] ensureSeedLength(long[] seed, int length) { > if (seed.length < length) { > final long[] s = Arrays.copyOf(seed, length); > // Fill the rest as if using a SplitMix64 RNG > long x = s[0]; > for (int i = seed.length; i < length; i++) { > s[i] = stafford13(x += GOLDEN_RATIO); > } > return s; > } > return seed; > } > private static long stafford13(long x) { > x = (x ^ (x >>> 30)) * 0xbf58476d1ce4e5b9L; > x = (x ^ (x >>> 27)) * 0x94d049bb133111ebL; > return x ^ (x >>> 31); > } > {code} > A 32-bit mix function for Murmur32 is: > {code:java} > private static int murmur32(int x) { > x = (x ^ (x >>> 16)) * 0x85ebca6b; > x = (x ^ (x >>> 13)) * 0xc2b2ae35; > return x ^ (x >>> 16); > }{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)