This is a starter for a discussion on the split and jump functionality
for a random generator.
Split:
To create a new instance of the generator that is deterministically
based on the state of the current instance but the probability that the
sequence generated by the new instance and the current instance overlap
is negligible.
Jump:
To create a new instance of the generator that is deterministically
based on the state of the current instance but advanced a set number of
iterations.
Split:
This is an alternative to simply creating a new instance of the same
generator using a different seed, or using the generator to create a
seed then creating a new instance. In the later case this would advance
the current generator but a split does not have to (more on this later).
Assuming the state of a generator is a set of bits with no further
requirements then a split can be performed by a bit mixing algorithm. A
commonly used algorithm is the finalisation step of the MurmurHash3 of
Austin Appleby [1] which has variants to mix int and long values from
input int or long.
Most of the algorithms in Commons RNG should have a state applicable to
simple mixing. Notable exceptions are those that implement a controlled
seeding procedure: MersenneTwister; MersenneTwister64; ISAAC; and
TwoCmres. These have a state that is more controlled that just random
bits. In most cases the split could be implemented by mixing the state
to create a seed for a new instance, then running through the
initialisation procedure. There may not be much value in this (over just
forcing a user to create a new seeded instance) although making all the
generators Splittable makes an easier to use library.
A notable exception is the SplitMix64 algorithm. This implements
generation of long values identically to Java 8's SplittableRandom. The
sequence is generated using a linear series of long values that is mixed
to output random long values. The linear series is produced using a
default increment that maximises the period of the sequence. A simple
split implementation would just mix the current point in the series.
However Java 8's SplittableRandom also mixes the increment constant
(using a MurmurHash3 mix step). The constant is then forced to be odd
and contain a high number of bit state transitions (i.e. 0s to 1s or 1s
to 0s, the maximum for a long is 32) [2]. So should the Commons RNG
class match the implementation of SplittableRandom? This may have
licensing issues. Note: A side-effect of requiring two numbers for the
split instance is the current instance is advanced.
Given there are no decisions to be made by the user when splitting a
suggestion for an API would be a new interface akin to
RestorableUniformRandomProvider:
package org.apache.commons.rng;
public interface SplittableUniformRandomProvider extends
UniformRandomProvider {
SplittableUniformRandomProvider split();
}
An implementation detail:
Q. Should a split advance the state of the current generator? This
allows successive split() calls from the same instance to create new
variants.
I would vote yes. It will prevent misuse where multiple threads are
created that each have an instance provided by a call to split() on the
same source generator that is not otherwise advanced.
A user can always save the state of any of the generators in the library
and restore it to the state before the call to split if they so desire.
Jump:
This requires a generator that has a predictable state at some point in
the future given the current state. The XorShiRo family have jump
functions computed by the original authors. The jump size in power of 2
is provided. The SplitMix is a simple linear series and so can be jumped
any distance. Jumps functions are available for:
XOR_SHIFT_1024_S & XOR_SHIFT_1024_S_PHI (512)
XO_SHI_RO_128_PLUS & XO_SHI_RO_128_SS (64)
XO_RO_SHI_RO_128_PLUS & XO_RO_SHI_RO_128_SS (64, 96)
XO_SHI_RO_256_PLUS & XO_SHI_RO_256_SS (128, 192)
XO_SHI_RO_512_PLUS & XO_SHI_RO_512_SS (256)
SPLIT_MIX_64 (1 - 63) [3]
*** The list may not be complete as I have not checkout the original
source for all generators. ***
The fact that the jump size can be different requires a more flexible
API. Currently all the jumps are powers of 2. But this may not always be
true. This allows the following specification to indicate the number of
jumps that are supported:
package org.apache.commons.rng;
public interface JumpableUniformRandomProvider extends
UniformRandomProvider {
int getNumberOfJumps();
JumpableUniformRandomProvider jump(int jump);
}
The implementation would then support any jump parameter up to the
supported number of jumps.
The definition of the meaning of the jump parameter is ambiguous. It
should be documented in the implementing class what the 'jump' refers to.
This would allow for example XO_SHI_RO_256_PLUS to return 2 for the two
supported jump sizes and document the actual size of the jumps. The
SplitMix algorithm would return some TBD number [4].
An alternative is to mandate only a single jump, or as shown below a
maximum of two jumps:
package org.apache.commons.rng;
public interface JumpableUniformRandomProvider extends
UniformRandomProvider {
boolean isLongJumpSupported();
JumpableUniformRandomProvider jump();
JumpableUniformRandomProvider longJump();
}
This may not be flexible enough going forward.
Discussion on this is requested.
Alex
[1] https://github.com/aappleby/smhasher
[2] Using Long.bitCount(long ^ (long >>> 1)) to count transitions
[3] The SplitMix64 is a simple linear series and thus can be jumped in
any power of 2 up to the maximum for a long (which causes sequence
wrapping). It can actually be jumped any number of iterations using
BigInteger arithmetic but jumping in powers of 2 can be implemented
using long arithmetic where the rollover bits beyond 64 are naturally
discarded:
long jumps = 1234567;
long increment = 0x9e3779b97f4a7c15L;
state = BigInteger.valueOf(state)
.add(BigInteger.valueOf(increment).multiply(BigInteger.valueOf(jumps)))
.longValue(); // narrowing primitive conversion
int jumpPower = 32;
state = BigInteger.valueOf(state)
.add(BigInteger.valueOf(increment).shiftLeft(jumpPower))
.longValue(); // narrowing primitive conversion
// Same as above by discarding overflow bits
state = state + (increment << jumpPower);
This is based on my understanding of BigInteger and signed/unsigned
arithmetic and should be verified in tests.
[4] Given the period of the SplitMix is 2^64, and the period may be less
than this in practice it may be better to only support jumps of e.g.
2^32 in a manner similar to those provided for the XorShiRo generators
where the state can be advanced a factor of the period, or just not
supports jumps. I can see the utility in jumping more than
Integer.MAX_VALUE (guaranteed unique outputs for the maximum array size)
but less than 2^32 is tending towards not very many random numbers from
the original instance before sequence overlap with the jumped instance.