Hi Alex.

Le lun. 8 avr. 2019 à 14:36, Alex Herbert <alex.d.herb...@gmail.com> a écrit :
>
> This is a starter for a discussion on the split and jump functionality
> for a random generator.
>
> Split:
>
> To create a new instance of the generator that is deterministically
> based on the state of the current instance but the probability that the
> sequence generated by the new instance and the current instance overlap
> is negligible.

I may well be mistaken but I seem to recall that a split is supposed
to create an instance with no overlap for a sequence below a certain
length.

>
> Jump:
>
> To create a new instance of the generator that is deterministically
> based on the state of the current instance but advanced a set number of
> iterations.

IOW, after a jump, the new state is equivalent to the state one would
have gotten to after a specified (large) number of calls to "next()".

>
> Split:
>
> This is an alternative to simply creating a new instance of the same
> generator using a different seed,

It's not my (limited/wrong ?) understanding.  What prevents the new
seed from creating a state that would be reached (by the original
instance) after just a few numbers?

> or using the generator to create a
> seed then creating a new instance.

What is gained from using the current instance, rather than any other
generator to create a new seed (like it's done with the "SeedFactory")?

> In the later case this would advance
> the current generator but a split does not have to (more on this later).
>
> Assuming the state of a generator is a set of bits with no further
> requirements then a split can be performed by a bit mixing algorithm. A
> commonly used algorithm is the finalisation step of the MurmurHash3 of
> Austin Appleby [1] which has variants to mix int and long values from
> input int or long.
>
> Most of the algorithms in Commons RNG should have a state applicable to
> simple mixing.

Subject to my very partial knowledge, I assumed that "jump" was
the engine behind the "split" functionality (whose purpose is to
ensure a specified number of non-overlapping sequences).

> Notable exceptions are those that implement a controlled
> seeding procedure: MersenneTwister; MersenneTwister64; ISAAC; and
> TwoCmres. These have a state that is more controlled that just random
> bits. In most cases the split could be implemented by mixing the state
> to create a seed for a new instance, then running through the
> initialisation procedure. There may not be much value in this (over just
> forcing a user to create a new seeded instance) although making all the
> generators Splittable makes an easier to use library.

I thought that the question was whether a "jump" function always exists.
IOW, for some generators, the only way to get from one state to another
would be by passing through all the intermediate states.

> A notable exception is the SplitMix64 algorithm. This implements
> generation of long values identically to Java 8's SplittableRandom. The
> sequence is generated using a linear series of long values that is mixed
> to output random long values. The linear series is produced using a
> default increment that maximises the period of the sequence. A simple
> split implementation would just mix the current point in the series.
> However Java 8's SplittableRandom also mixes the increment constant
> (using a MurmurHash3 mix step). The constant is then forced to be odd
> and contain a high number of bit state transitions (i.e. 0s to 1s or 1s
> to 0s, the maximum for a long is 32) [2]. So should the Commons RNG
> class match the implementation of SplittableRandom? This may have
> licensing issues. Note: A side-effect of requiring two numbers for the
> split instance is the current instance is advanced.
>
> Given there are no decisions to be made by the user when splitting a
> suggestion for an API would be a new interface akin to
> RestorableUniformRandomProvider:
>
> package org.apache.commons.rng;
>
> public interface SplittableUniformRandomProvider extends
> UniformRandomProvider {
>
>      SplittableUniformRandomProvider split();
>
> }
>
>
> An implementation detail:
>
> Q. Should a split advance the state of the current generator? This
> allows successive split() calls from the same instance to create new
> variants.

I cannot give a pertinent opinion given my questions above...

IIUC what you wrote earlier (i.e. "split" is akin to create a new
instance from a different seed), it does not need to be defined
"internally".  And, actually, calling "RandomSource.create", without
specifying the seed, is indeed doing a "split" (?).]

If I'm correct (about "jump" ensuring non-overlapping sequences),
then "rng.split()" would just be equivalent to "rng.jump()".

> I would vote yes. It will prevent misuse where multiple threads are
> created that each have an instance provided by a call to split() on the
> same source generator that is not otherwise advanced.
>
> A user can always save the state of any of the generators in the library
> and restore it to the state before the call to split if they so desire.
>
>
> Jump:
>
> This requires a generator that has a predictable state at some point in
> the future given the current state. The XorShiRo family have jump
> functions computed by the original authors. The jump size in power of 2
> is provided. The SplitMix is a simple linear series and so can be jumped
> any distance. Jumps functions are available for:
>
> XOR_SHIFT_1024_S & XOR_SHIFT_1024_S_PHI (512)
> XO_SHI_RO_128_PLUS & XO_SHI_RO_128_SS (64)
> XO_RO_SHI_RO_128_PLUS & XO_RO_SHI_RO_128_SS (64, 96)
> XO_SHI_RO_256_PLUS & XO_SHI_RO_256_SS (128, 192)
> XO_SHI_RO_512_PLUS & XO_SHI_RO_512_SS (256)
> SPLIT_MIX_64 (1 - 63) [3]
>
> *** The list may not be complete as I have not checkout the original
> source for all generators. ***
>
> The fact that the jump size can be different requires a more flexible
> API. Currently all the jumps are powers of 2. But this may not always be
> true. This allows the following specification to indicate the number of
> jumps that are supported:

What is meant by supported?
I'd guess the number of jumps before the sequence wraps (?).
If so, does the newly created instance gets its number of supported
jumps reduced by one?

>
> package org.apache.commons.rng;
>
> public interface JumpableUniformRandomProvider extends
> UniformRandomProvider {
>
>      int getNumberOfJumps();
>
>      JumpableUniformRandomProvider jump(int jump);
>
> }

If jump sizes and number of "supported" jumps are implementation-dependent,
it's not clear to me that it should be part of the API here (e.g.
could the caller
obtain an instance for which "getNumberOfJumps()" would return zero?).

>
> The implementation would then support any jump parameter up to the
> supported number of jumps.
>
> The definition of the meaning of the jump parameter is ambiguous. It
> should be documented in the implementing class what the 'jump' refers to.
>
> This would allow for example XO_SHI_RO_256_PLUS to return 2 for the two
> supported jump sizes and document the actual size of the jumps. The
> SplitMix algorithm would return some TBD number [4].
>
> An alternative is to mandate only a single jump, or as shown below a
> maximum of two jumps:
>
> package org.apache.commons.rng;
>
> public interface JumpableUniformRandomProvider extends
> UniformRandomProvider {
>
>      boolean isLongJumpSupported();
>
>      JumpableUniformRandomProvider jump();
>
>      JumpableUniformRandomProvider longJump();
>
> }
>
> This may not be flexible enough going forward.

I'm missing many points, and this one too as a consequence.

My (simplistic?) idea was to have methods added to the
"UniformRandomProvider" interface:

public interface UniformRandomProvider {
    // ...

    boolean isJumpable();
    UniformRandomProvider jump();
}

[Of course, it would be an incompatible change, unless we decide
to only target JDK versions that allow "default" methods.]

With comptability, we could have a "JumpableUniformRandomProvider"
interface with a single "jump" method (functional interface?), and define
"createJumpable" methods in "RandomSource" that would throw if the
"core" implementation does not support "jump".

>
> Discussion on this is requested.

Do any of these questions/comments make sense? :-)

Best,
Gilles

>
>
> Alex
>
> [1] https://github.com/aappleby/smhasher
>
> [2] Using Long.bitCount(long ^ (long >>> 1)) to count transitions
>
> [3] The SplitMix64 is a simple linear series and thus can be jumped in
> any power of 2 up to the maximum for a long (which causes sequence
> wrapping). It can actually be jumped any number of iterations using
> BigInteger arithmetic but jumping in powers of 2 can be implemented
> using long arithmetic where the rollover bits beyond 64 are naturally
> discarded:
>
> long jumps = 1234567;
>
> long increment = 0x9e3779b97f4a7c15L;
>
> state = BigInteger.valueOf(state)
>
>                    
> .add(BigInteger.valueOf(increment).multiply(BigInteger.valueOf(jumps)))
>
>                    .longValue(); // narrowing primitive conversion
>
> int jumpPower = 32;
>
> state = BigInteger.valueOf(state)
>
>                    .add(BigInteger.valueOf(increment).shiftLeft(jumpPower))
>
>                    .longValue(); // narrowing primitive conversion
>
> // Same as above by discarding overflow bits
>
> state = state + (increment << jumpPower);
>
> This is based on my understanding of BigInteger and signed/unsigned
> arithmetic and should be verified in tests.
>
> [4] Given the period of the SplitMix is 2^64, and the period may be less
> than this in practice it may be better to only support jumps of e.g.
> 2^32 in a manner similar to those provided for the XorShiRo generators
> where the state can be advanced a factor of the period, or just not
> supports jumps. I can see the utility in jumping more than
> Integer.MAX_VALUE (guaranteed unique outputs for the maximum array size)
> but less than 2^32 is tending towards not very many random numbers from
> the original instance before sequence overlap with the jumped instance.
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to