On 09/04/2019 17:36, Gilles Sadowski wrote:
Hello.

Le mar. 9 avr. 2019 à 16:42, Alex Herbert <alex.d.herb...@gmail.com> a écrit :
Hi Gilles,

Lots of sensible discussion here. I'll just put all replies at the end.

On 09/04/2019 13:28, Gilles Sadowski wrote:
Hello.

Hi Gilles,

You ask some good questions which I may have been vague about due to
familiarity with the possibilities. I hope to clarify a bit below.

On 08/04/2019 16:05, Gilles Sadowski wrote:
Hi Alex.

Le lun. 8 avr. 2019 à 14:36, Alex Herbert <alex.d.herb...@gmail.com> a écrit :
This is a starter for a discussion on the split and jump functionality
for a random generator.

Split:

To create a new instance of the generator that is deterministically
based on the state of the current instance but the probability that the
sequence generated by the new instance and the current instance overlap
is negligible.
I may well be mistaken but I seem to recall that a split is supposed
to create an instance with no overlap for a sequence below a certain
length.
   From the implementations I have found in the XorShiRo family they have
both a split and a jump.
The C implementations do not contain a "split" function, although the
Java ones do.
Is it only to preserve familiarity with the "SplittableRandom" API?
What are use-cases that would not be (better) covered by jump?

The split basically creates a new random generator. There are no
guarantees about sequence overlap. This is like seeding a new instance.
The difference is that it is deterministic based on the current state
and will return an instance of the same generator, that will be
different, and do it fast. It is very simple to do this. Just scramble
the current state using an algorithm different from how the state is
regularly updated. I see it as a "scrambled copy" type functionality.
Is the only purpose to get a second instance faster than by
going through the "RandomSource" factory?  [If so, how many
creations would be needed in order to see a noticeable difference?]
Or there is a inherent interest of generating a instance using
another's state a seed?  [I'd tend to think there isn't due to the lack
of this functionality in C codes.]

The jump is more constrained. It advances the generator to a point that
would be reached after a large number of calls to next(). Here's how the
documentation from XorShiro256StarStar describes its use (c-code):

/* This is the jump function for the generator. It is equivalent
      to 2^128 calls to next(); it can be used to generate 2^128
      non-overlapping subsequences for parallel computations. */

void jump(void);
Indeed, the "non-overlapping" guarantee is a added feature.

In comparison, "split" looks pretty bland (as just another way
to instantiate a generator).

/* This is the long-jump function for the generator. It is equivalent to
      2^192 calls to next(); it can be used to generate 2^64 starting points,
      from each of which jump() will generate 2^64 non-overlapping
      subsequences for parallel distributed computations. */

void long_jump(void);

So the idea is to seed an experiment once to get a single generator.
Then jump it for each parallel computation. Each computation will then
be guaranteed to run with a different sequence for at least as long as
the jump length.

This results in the following type of code using the API I suggested:

// If jump returns a new instance
JumpableUniformRandomProvider source = ...;
UniformRandomProvider[] rngs = new UniformRandomProvider[128];
for (int i = 0; i < rngs.length; i++) {
       // Advance state
       rngs[i] = source.jump();
       source = rngs[i];
}

In my suggested API the jump returns a new instance. So calling jump
repeatedly on the same generator keeps returning the same fast-forward
state.
The least surprise behaviour.

This could lead to errors if not well documented how to use it.
To be documented, but it would be the same kind of errors as
forgetting to increment a counter.

The alternative is to advance the state of the same instance. So to get
the same effect for parallel computations you must have an ability to
copy a generator (which we do not have in the API).

// If jump updates the current instance
JumpableUniformRandomProvider source = ...;
UniformRandomProvider[] rngs = new UniformRandomProvider[128];
for (int i = 0; i < rngs.length; i++) {
       // Advance state and copy
       rngs[i] = source.jump().copy();
}

We previously discussed copy(). IIRC the conclusion was that copy()
could be misused for parallel computations; it basically allows a
parallel set of work all with a copy of the same generator and so
limited randomness across the set.
Not sure what you mean by "limited randomness".

Leaving it out of the API out forces
the use of new generators for parallel work.

So there are at least three options for jump:

1. jump returns a new instance with an advanced state. Current state is
the same (possible errors when using this repeatedly)

2. jump advances the current state (requires separate copy method to be
of use)

3. jump advances the current state and returns a new copy instance
(combined jump and copy is possibly a bit confusing)

I would opt maybe instead for option 2 over the original option 1. But
then that exposes the 'risks' of misuse of just copy().
IIUC, what seems the most sensible is the option (1):

UniformRandomProvider rng = RandomSource.create(...);

// Create a non-overlapping RNG (e.g. to pass it to another thread.)
UniformRandomProvider another = rng.jump();

[This is supposing that there is a "master" thread for managing
the jumps (so as to have different sequences in all "slave"
threads).]

// If the user wants to use an "advanced" state within the
// same thread (but, what's the use-case?).
rng = rng.jump();

Jump:

To create a new instance of the generator that is deterministically
based on the state of the current instance but advanced a set number of
iterations.
IOW, after a jump, the new state is equivalent to the state one would
have gotten to after a specified (large) number of calls to "next()".
Yes.
Split:

This is an alternative to simply creating a new instance of the same
generator using a different seed,
It's not my (limited/wrong ?) understanding.  What prevents the new
seed from creating a state that would be reached (by the original
instance) after just a few numbers?
Nothing. There are no guarantees with split. However it should be noted
that for the generators with a large state space the probability of
collision will be low. Plus it would be no worse than creating a new
instance anyway, just faster.
[See my issue with "faster", mentioned above.]

or using the generator to create a
seed then creating a new instance.
What is gained from using the current instance, rather than any other
generator to create a new seed (like it's done with the "SeedFactory")?
Nothing. It was an example of an equivalent result to a split: get a new
instance seeded differently.

This circular argument does seem to indicate very little advantage to a
split operation other than speed to get another instance of the same type.
Ah. ;-)

Perhaps a use case is parallel computations where you do not care if the
sequences overlap,
?
IIRC, this is the one of the cased where a user might need the guarantee
that sequences do *not* overlap (e.g. where the overlap would lead to
duplicating computations already done in another thread).

only that they start from different points and have a
unique instance for the thread. This would work for instance where each
parallel task had different input to combine the randomness with anyway.
In effect a situation where a simple copy of the same generator would
also suffice.

In the later case this would advance
the current generator but a split does not have to (more on this later).

Assuming the state of a generator is a set of bits with no further
requirements then a split can be performed by a bit mixing algorithm. A
commonly used algorithm is the finalisation step of the MurmurHash3 of
Austin Appleby [1] which has variants to mix int and long values from
input int or long.

Most of the algorithms in Commons RNG should have a state applicable to
simple mixing.
Subject to my very partial knowledge, I assumed that "jump" was
the engine behind the "split" functionality (whose purpose is to
ensure a specified number of non-overlapping sequences).
No. The two are different. Split is faster and has less restrictions.
Faster in which real-life use-case?

Notable exceptions are those that implement a controlled
seeding procedure: MersenneTwister; MersenneTwister64; ISAAC; and
TwoCmres. These have a state that is more controlled that just random
bits. In most cases the split could be implemented by mixing the state
to create a seed for a new instance, then running through the
initialisation procedure. There may not be much value in this (over just
forcing a user to create a new seeded instance) although making all the
generators Splittable makes an easier to use library.
I thought that the question was whether a "jump" function always exists.
IOW, for some generators, the only way to get from one state to another
would be by passing through all the intermediate states.
There may be no jump (which requires an exact update to the state) but
there can always be a split. So this may be a reason to add both
interfaces to the library. With split implemented by all generators they
can all be used in parallel computations via split() to get a generator
for each thread.
IIUC, "split" is useful when each thread that receives an instance
assumes that it can spawn further threads that will further "split"
that instance.
Given that instantiating a new thread will take much more time
than "split", the speedup (wrt instantiation by "RandomSource")
argument does not seem to hold.

The lack of a jump is evident in the constructor for TwoCmres which
would benefit from a jump function but instead explicitly forwards the
state of each sub-cycle generator by calling it n times.

A notable exception is the SplitMix64 algorithm. This implements
generation of long values identically to Java 8's SplittableRandom. The
sequence is generated using a linear series of long values that is mixed
to output random long values. The linear series is produced using a
default increment that maximises the period of the sequence. A simple
split implementation would just mix the current point in the series.
However Java 8's SplittableRandom also mixes the increment constant
(using a MurmurHash3 mix step). The constant is then forced to be odd
and contain a high number of bit state transitions (i.e. 0s to 1s or 1s
to 0s, the maximum for a long is 32) [2]. So should the Commons RNG
class match the implementation of SplittableRandom? This may have
licensing issues. Note: A side-effect of requiring two numbers for the
split instance is the current instance is advanced.

Given there are no decisions to be made by the user when splitting a
suggestion for an API would be a new interface akin to
RestorableUniformRandomProvider:

package org.apache.commons.rng;

public interface SplittableUniformRandomProvider extends
UniformRandomProvider {

        SplittableUniformRandomProvider split();

}


An implementation detail:

Q. Should a split advance the state of the current generator? This
allows successive split() calls from the same instance to create new
variants.
I cannot give a pertinent opinion given my questions above...

IIUC what you wrote earlier (i.e. "split" is akin to create a new
instance from a different seed), it does not need to be defined
"internally".  And, actually, calling "RandomSource.create", without
specifying the seed, is indeed doing a "split" (?).]

If I'm correct (about "jump" ensuring non-overlapping sequences),
then "rng.split()" would just be equivalent to "rng.jump()".
Given the above split just scrambles the state into a new generator.
This allows:

// If split does not update the current instance
JumpableUniformRandomProvider source = ...;
UniformRandomProvider[] rngs = new UniformRandomProvider[128];
for (int i = 0; i < rngs.length; i++) {
       // Advance state and split
       source.nextLong();
       rngs[i] = source.split();
}

// If split advances the current instance
JumpableUniformRandomProvider source = ...;
UniformRandomProvider[] rngs = new UniformRandomProvider[128];
for (int i = 0; i < rngs.length; i++) {
       // Split advances (even only one step)
       rngs[i] = source.split();
}
And the same, without "split":

RandomSource source = ...;
for (int i = 0; i < rngs.length; i++) {
      rngs[i] = RandomSource.create(source);
}

So perhaps the better implementation is to require that split will alter
the state of the current instance and return a new instance of the same
generator which will have a different output. This allows repeat calls
to split to create new generators.

Note that the split implementation for SplittableRandom will:

- Advance the current instance two calls to next()

- Create a new instance with a random seed and a different (random)
increment

This ensures each new generator has a different output. The Javadoc states:

        * Constructs and returns a new SplittableRandom instance that
        * shares no mutable state with this instance. However, with very
        * high probability, the set of values collectively generated by
        * the two objects has the same statistical properties as if the
        * same quantity of values were generated by a single thread using
        * a single SplittableRandom object.  Either or both of the two
        * objects may be further split using the {@code split()} method,
        * and the same expected statistical properties apply to the
        * entire set of generators constructed by such recursive
        * splitting.

This is a bit more than a split to scramble the state (since the output
sequences will very unlikely overlap) but not quite a jump of a defined
length along the output sequence.
Before adding "split" to API, we must identify how it is supposed
to be used.
If the purpose is to provide alternatives to "SplittableRandom", I'd
rather explore whether/how to create an adapter class around
"RandomSource" like it was done for "Random".
SplittableRandom is final.
I would vote yes. It will prevent misuse where multiple threads are
created that each have an instance provided by a call to split() on the
same source generator that is not otherwise advanced.

A user can always save the state of any of the generators in the library
and restore it to the state before the call to split if they so desire.


Jump:

This requires a generator that has a predictable state at some point in
the future given the current state. The XorShiRo family have jump
functions computed by the original authors. The jump size in power of 2
is provided. The SplitMix is a simple linear series and so can be jumped
any distance. Jumps functions are available for:

XOR_SHIFT_1024_S & XOR_SHIFT_1024_S_PHI (512)
XO_SHI_RO_128_PLUS & XO_SHI_RO_128_SS (64)
XO_RO_SHI_RO_128_PLUS & XO_RO_SHI_RO_128_SS (64, 96)
XO_SHI_RO_256_PLUS & XO_SHI_RO_256_SS (128, 192)
XO_SHI_RO_512_PLUS & XO_SHI_RO_512_SS (256)
SPLIT_MIX_64 (1 - 63) [3]

*** The list may not be complete as I have not checkout the original
source for all generators. ***

The fact that the jump size can be different requires a more flexible
API. Currently all the jumps are powers of 2. But this may not always be
true. This allows the following specification to indicate the number of
jumps that are supported:
What is meant by supported?
I'd guess the number of jumps before the sequence wraps (?).
If so, does the newly created instance gets its number of supported
jumps reduced by one?
I meant 'Supported' as the number of different jump lengths that can be
done. So at least 1 but, as for the XorShiRo generators, this can be 2.

I did not think an alternative API returning how far you can jump was
useful. Just that it be flexible enough that short and long jumps can be
specified.

Tracking the number of jumps left before a collision would be up to the
user.
Then why not stay close to the C and define a "LongJumpable" interface?

package org.apache.commons.rng;

public interface JumpableUniformRandomProvider extends
UniformRandomProvider {

        int getNumberOfJumps();

        JumpableUniformRandomProvider jump(int jump);

}
If jump sizes and number of "supported" jumps are implementation-dependent,
it's not clear to me that it should be part of the API here (e.g.
could the caller
obtain an instance for which "getNumberOfJumps()" would return zero?).
No. If you implement the JumpableUniformRandomProvider then you must be
able to jump. So perhaps just the version with jump(), longJump() and
isLongJumpSupported().
I think that the "number of jumps" is ambiguous, IIUC that you actually
mean "number of jump types" ("short" vs ""long").

The implementation would then support any jump parameter up to the
supported number of jumps.

The definition of the meaning of the jump parameter is ambiguous. It
should be documented in the implementing class what the 'jump' refers to.
The meaning is clear (shortcut to advance to a state very far from the
current one); the documentation should indeed indicate "how far", and
how many jumps before wrapping occurs.

This would allow for example XO_SHI_RO_256_PLUS to return 2 for the two
supported jump sizes and document the actual size of the jumps. The
SplitMix algorithm would return some TBD number [4].

An alternative is to mandate only a single jump, or as shown below a
maximum of two jumps:

package org.apache.commons.rng;

public interface JumpableUniformRandomProvider extends
UniformRandomProvider {

        boolean isLongJumpSupported();

        JumpableUniformRandomProvider jump();

        JumpableUniformRandomProvider longJump();

}

This may not be flexible enough going forward.
I'm missing many points, and this one too as a consequence.

My (simplistic?) idea was to have methods added to the
"UniformRandomProvider" interface:

public interface UniformRandomProvider {
       // ...

       boolean isJumpable();
       UniformRandomProvider jump();
}

[Of course, it would be an incompatible change, unless we decide
to only target JDK versions that allow "default" methods.]

With comptability, we could have a "JumpableUniformRandomProvider"
interface with a single "jump" method (functional interface?), and define
"createJumpable" methods in "RandomSource" that would throw if the
"core" implementation does not support "jump".
A single jump function is better than nothing but missing the fact that
for half of the implementations for which we know a jump they have two
jump lengths. Since we are starting from scratch I think we should make
the API support this flexibility.
Do you foresee the appearance of "long_long_jump" functions in C
implementations that would justify having a one-argument "jump(int)"
method (for which the argument would 1 or 2 currently) in Java?
IMHO, it's far-fetched, given
* the number of Java users still happily using "Random",
* the fraction of them who'd consider a better implementation,
* the fraction of those who'd need more than one RNG instance,
* the fraction who'd use them in a multi-threaded application,
* and with a non-overlapping guarantee (thus requiring "jump")
* and generating a very long sequence from the same RNG (thus
    requiring "longJump"), ...
;-)

Discussion on this is requested.
Do any of these questions/comments make sense? :-)
Always.
Thanks.

To summarize:
Do you think it's possible to discuss "jump" separately from "split"?
IMO, "jump" is a more important feature, and is definitely algorithm-dependent
("core") functionality, while "split" seems a JDK invention (?) which we might
perhaps mimic without adding code to the core implementations (?).
In short I think that the split and jump are aimed at achieving the same
function. To provide thread safe instances that will not overlap during
parallel computations.


A jump provides a guarantee that the sequence will not overlap for a set
number of cycles. The count should be documented in the jump function.

For the jump functionality we have two scenarios that could be supported:

1. A master that jumps and hands a generator to each thread

2. A master that long jumps and hands a generator to sub-master thread.
The sub-master then jumps the generator and hands a generator to each
thread.


So perhaps a Jumpable interface with jump, longJump and
isLongJumpSupported methods, or Jumpable and LongJumpable. I like the
later. I do not envision a three tiered simulation set-up. Two teired is
complex enough. If the interface is changed in the future for more jumps
then we do not have to support it or move to Java 8 and add defaults.


The split does not provide a guarantee that the sequence will not
overlap. In this case then the simple split of mixing up the current
state is just a convenience method that substitutes for creating a new
generator. It need not be added but would be convenient.

However there is a case for split if it returns a new generator with a
sequence that will very likely not overlap because the internal
algorithm of the generator has been changed.
But this is "dangerous"? Isn't this kind of change that can create
correlated generators?  [Cf. "XorShit1024Star" vs its "Phi" variant;
or the small set of "TwoCmres" variants.]

I would say no. The XorShift1024Star applied the variant as the final multiplication factor. If the long arithmetic was not subject to rollover then the sequences would be perfectly correlated:

[x * a] verses [x * b]

In the case of SplitMix the variants are adding to the current state:

state += update

Then the state is passed through a mixing algorithm taken from a hash finalisation step that will equally distribute the bits. So there is a lot more done to the number before the output sequence.


This is how
SplittableRandom works. I've been looking into this so here are my
findings. It creates a new increment for the Weyl sequence [1], no
longer using the hard coded golden ratio increment. The likelihood of
sequence overlap is then 1/[number of possible increments], i.e. hope
you do not create the same increment. Since all increments have to be
odd for a Weyl sequence this is 1/[2^63]. However the SplittableRandom
requires at least 24 bit transitions in the increment. This is modelled
as a Binomial distribution with n=63, p=0.5. The cumulative probability
of less than 24 transitions is 2.25%. However I found a bug/design
choice where the SplittableRandom uses an unsigned shift to count
transitions (>>>) not the signed shift (>>) meaning that the
distribution is not exactly a binomial. However I constructed an
empirical cumulative probability and p(x<24) = 1.62%. The result being
that the collision probability of the increment is 1/[0.9838 * 2^63]
(1.1e-19) from splitting a SplittableRandom. Then there are 2^64
possible start points for the sequence so even if you did create the
same increment it still would only overlap some of the time, depending
on the number of cycles you require from each generator.
Is "split" always based on the "Weyl sequence"?
The split RNG is yes. The Weyl sequence just accumulates a sum using an increment that is required to be odd. The entries of the series are sent through the mix algorithm to generate a uniformly distributed series.
IIUC your first message (using the state to create a new seed), I'd
be wary to provide the API for either changing the state (keeping the
same algo) or changing the algo.
I know. I alluded to that a bit later when I discussed that you could regenerate from a Restorable state any generator with a increment of your choice.

Assuming that for certain generators, changing the algo in that way
is proven to provide an equally good RNG (as is the case for
"SplittableRandom", I guess?), couldn't "Commons RNG" provide
the feature like for "TwoCmres", i.e. pass some argument to the
constructor/factory?

So a SPLIT_MIX_64_K where there is a second argument that is used to create an increment for the Weyl sequence (this is known as integer K on the wikipedia page). It could also be GAMMA (which is what SplittableRandom calls it). The name can be decided if it works.

I can try this out and pass a few versions through BigCrush.

Overall it seems to be that the split() functionality is not an interface to be added to the library. This is something that the JDK has created as a workable alternative to a jump and we could add it to the library just for the SplitMix algorithm if it works.


If you split again then this holds because split uses the same random
method (i.e. no memory of previous splits). The likelihood of a
collision will increase as you continue to split, but will remain low
within a realistic number of splits. The collision is modelled using a
binomial distribution with n=splits and p=(1 - 1.1e-19) as the
probability the split generates a unique generator sequence. To check
the collision probability requires the cumulative probability P(X<n) or
1 - p^n. I can't do this on my calculator (since 1-p ~ 1) but I would
imagine it requires a lot of n before p is noticeably below 1.

Anyway the split function provides an alternative to the jump function
when the jump function is not available. We could support this for
SplitMix and the as yet to be added Middle Square Weyl Sequence (MSWS)
generator which also uses a Weyl sequence. However the reference MSWS
implementation for that does create the Weyl sequence increment in the
constructor and so a split would be no different from a new instance [2].

For all the other generators a split would be just like creating a new
generator. This seems without a use case other than convenience.

So is it worth adding a Split interface if it works for only one (maybe
two) generators? Personally I would say yes because I would use it. The
SplitMix is current the fastest generator in the library and for small
parallel computation work this would be a good choice.

I note that this will alter the internals of the SplitMix requiring a
long to store the state and one for the increment. So although you could
not create a split RNG directly through a constructor (by passing the
Weyl sequence increment) you would be able to create a split RNG using
the Restoreable interface.


Note: We cannot create a container around a RNG for SplittableRandom.
That class is final.
Yes, I actually mentioned it in the userguide. :-}
So, if we cannot pass [RNG] generators as a disguised
"SplittableRandom" to methods that need it, I don't see
the use-case for a "split" method at the interface level
whereas a generator can be created the usual way.
Yes. It seems that way.

But we can do a factory method:

UniformRandomProvider split(UniformRandomProvider rng);

The method would discover the RandomSource from the provider class and
create a new instance. This would not require altering any of the core
implementations. If the class is unknown (i.e. not a concrete class from
the core module) then it will throw an exception.

This would be purely for convenience as it will just call
RandomSource.create().
How is calling
   RandomSource.split(...)
(significantly) more convenient than
   RandomSource.create(...)
?

Whenever possible, it still seems better to not add to "core" whenever
a similar feature (like "An instance that is unlikely to produce the same
sequence") can be obtained otherwise, especially if the change would
make the "core" implementation depart from the reference (C code).

Another option is to add the Splittable... interface and have a wrapper
function to be used when the RandomSource is not itself Splittable:

/**

   * Creates an instance of {@link SplittableUniformRandomProvider} using

   * {@link #create(RandomSource, Object, Object...)} that can generate further

   * randomly seeded instances using the {@link 
SplittableUniformRandomProvider#split} method.

   *

   * @param source RNG type.

   * @param seed Seed value.  It can be {@code null} (in which case a

   * random value will be used).

   * @param data Additional arguments to the implementation's constructor.

   * Please refer to the documentation of each specific implementation.

   * @return the RNG.

   * @throws UnsupportedOperationException if the type of the {@code seed}

   * is invalid.

   * @throws IllegalStateException if data is missing to initialize the

   * generator implemented by the given {@code source}.

   *

   * @see #create(RandomSource)

   */

public static SplittableUniformRandomProvider createSplittable(RandomSource 
source,

                                                                 Object seed,

                                                                 Object ... 
data) {

      final UniformRandomProvider rng = create(source, seed, data);

      if (rng instanceof SplittableUniformRandomProvider) {

          return rng;

      }

      return new SplittableUniformRandomProvider() {

          /** {@inheritDoc} */

          @Override

          public SplittableUniformRandomProvider split() {
              // A default split is unseeded

              return create(source, null, data);

          }

          // ... and the rest

      };

}

Or using discovery from an already created provider:

/**

   * Wraps the given {@code delegate} generator in a new instance that

   * supports the {@link SplittableUniformRandomProvider}. The split method

   * is supported by creating a new instance.

   *

   * @param delegate Generator to which calls will be delegated.

   * @return an instance which is a SplittableUniformRandomProvider.

   * @throws IllegalArgumentException if the RandomSource for the delegate is 
unknown.

   */

public static SplittableUniformRandomProvider splittable(final 
UniformRandomProvider delegate) {

      if (delegate instanceof SplittableUniformRandomProvider) {

          return delegate;

      }

      // The map should be constructed for all classes in the 
RandomSourceInternal enum

      // that do not have arguments. This can be done automatically.

      final RandomSource source = map.get(delegate.getClass());

      if (source == null) {

          throw new IllegalArgumentException("Unknown source for class: " + 
delegate.getClass());

      }

      return new SplittableUniformRandomProvider() {

          /** {@inheritDoc} */

          @Override

          public SplittableUniformRandomProvider split() {
              // You get the idea here but it should be done without recursion

              return splittable(create(source));

          }

          // ... and the rest

      };

}


Then core implementations can be updated to implement Splittable if it
is deemed better than just creating a new instance. Currently this would
only be SplitMix.


I just noted that if a generator does implement split directly then this
would avoid synchronisation in the RandomSource.create method when
building the seed.
I'm curious: What would be the use-case where the performance gain will
be discernible?

I always like performance gains. But if it is used with threads like you mentioned the overhead of starting and working with them (via Executors) is likely to swamp anything.

So do we have a working idea to:

- Add interface 'JumpableUniformRandomProvider'

- Add interface 'LongJumpable... extends JumpableUniformRandomProvider'

- Test if a SplitMix variant with a self generated Weyl sequence can pass tests of uniformity. This would just require a seed of long[2], one for the state and one to use to derive the Weyl sequence increment.


Alex



Regards,
Gilles

Alex


[1] https://en.wikipedia.org/wiki/Weyl_sequence

[2] The Jira ticket RNG-85 had a note about the seeding algorithm for
the generator being GPL. There are alternatives based on the
SplittableRandom seeding method that could be used instead to create the
increment for the Weyl sequence. I've speed tested the provided
algorithm and it is about 85x slower than the one used in
SplittableRandom. Since that algorithm has an issue with the unsigned
shift not being modelled by the Binomial distribution an updated
algorithm could be used that would be novel so avoid the Oracle or GPL
licences.

Best,
Gilles

Alex

[1] https://github.com/aappleby/smhasher

[2] Using Long.bitCount(long ^ (long >>> 1)) to count transitions

[3] The SplitMix64 is a simple linear series and thus can be jumped in
any power of 2 up to the maximum for a long (which causes sequence
wrapping). It can actually be jumped any number of iterations using
BigInteger arithmetic but jumping in powers of 2 can be implemented
using long arithmetic where the rollover bits beyond 64 are naturally
discarded:

long jumps = 1234567;

long increment = 0x9e3779b97f4a7c15L;

state = BigInteger.valueOf(state)

                      
.add(BigInteger.valueOf(increment).multiply(BigInteger.valueOf(jumps)))

                      .longValue(); // narrowing primitive conversion

int jumpPower = 32;

state = BigInteger.valueOf(state)

                      .add(BigInteger.valueOf(increment).shiftLeft(jumpPower))

                      .longValue(); // narrowing primitive conversion

// Same as above by discarding overflow bits

state = state + (increment << jumpPower);

This is based on my understanding of BigInteger and signed/unsigned
arithmetic and should be verified in tests.

[4] Given the period of the SplitMix is 2^64, and the period may be less
than this in practice it may be better to only support jumps of e.g.
2^32 in a manner similar to those provided for the XorShiRo generators
where the state can be advanced a factor of the period, or just not
supports jumps. I can see the utility in jumping more than
Integer.MAX_VALUE (guaranteed unique outputs for the maximum array size)
but less than 2^32 is tending towards not very many random numbers from
the original instance before sequence overlap with the jumped instance.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to