On 10/06/2019 15:31, Gilles Sadowski wrote:
P.S. Thinking of releasing 1.3?
Not yet. I think there are a few outstanding items that work together
for the multi-threaded focus of the new code and the new generators:
Sure but some of them could be postponed, if just to RERO.

- RNG-98: LongJumpable (easy)

- RNG-102: SharedStateSampler (lots of easy work)

- RNG-106: XorShiRo generators require non-zero input seeds

(I'm still thinking about the best way to do this. The Jira ticket
suggests a speed test to at least know the implications of different ideas.)
This is only when using the "SeedFactory" (?).  [Otherwise, it's the
user's responsibility to construct an appropriate seed.]

Couldn't we just check that the output of the internal generator is not
all zero (and call it again if it is)?

Yes. The worst-case scenario is a 1 in 2^64 chance of generating an all-zero seed. All other generators have larger state sizes, so for them it is even less likely. So this would be fine. An alternative would be to set a single bit to non-zero. This throws away 1 bit of randomness from the seed and will always work without any recursion. But it makes the seed worse. The ideas are in the header for this Jira ticket:

https://issues.apache.org/jira/browse/RNG-106
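
To make the two options concrete, here is a minimal sketch. The class and method names are hypothetical and are not taken from Commons RNG or from the ticket; the source of bits is passed in as a plain LongSupplier, and forcing the lowest bit of the first element is an arbitrary choice made only for illustration.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper for illustration only; not part of Commons RNG. */
final class NonZeroSeeds {
    private NonZeroSeeds() {}

    /** Option 1: regenerate the seed until at least one element is non-zero. */
    static long[] retryUntilNonZero(LongSupplier source, int size) {
        checkSize(size);
        final long[] seed = new long[size];
        do {
            for (int i = 0; i < size; i++) {
                seed[i] = source.getAsLong();
            }
        } while (isAllZero(seed));
        return seed;
    }

    /** Option 2: force one bit on. Never repeats, but fixes one bit of the seed. */
    static long[] forceBit(LongSupplier source, int size) {
        checkSize(size);
        final long[] seed = new long[size];
        for (int i = 0; i < size; i++) {
            seed[i] = source.getAsLong();
        }
        seed[0] |= 1L; // arbitrary choice of which bit to set
        return seed;
    }

    static boolean isAllZero(long[] seed) {
        for (final long s : seed) {
            if (s != 0) {
                return false;
            }
        }
        return true;
    }

    private static void checkSize(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive: " + size);
        }
    }
}
```

With a good source the retry loop in option 1 almost never repeats, which is why the speed test mentioned in the ticket matters more for the common path than for the rejection path.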

I'll fix this soon.

The other item I did not mention is the outcome from RNG-104. This seems to indicate that using System.identityHashCode(new Object()) is not as good a mixer as a ThreadLocal random generator, both for speed and also quality. I'm currently testing Well44497b ^ SplitMix in BigCrush but I think this should replace the identity hash code method.

It also shows that using a synchronized block on each call to the generator is slow. Seed arrays can be built 2x faster using 8 calls to the generator per synchronisation when single threaded. When multi-threaded it is much better. I'm still testing to find a good estimate of the optimum block size for all scenarios.
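
As a rough illustration of the block idea (not the actual SeedFactory code), the sketch below takes the lock once per block of 8 values instead of once per value. The class name is hypothetical, java.util.SplittableRandom is only a stand-in for the shared internal generator, and 8 is the figure quoted above, not a tuned optimum.

```java
import java.util.SplittableRandom;

/** Hypothetical sketch; the real seed factory uses a different source generator. */
final class BlockSeedSource {
    /** Calls to the generator per synchronisation (value under test, not an optimum). */
    private static final int BLOCK_SIZE = 8;

    private final Object lock = new Object();
    /** Stand-in for the shared generator guarded by the lock. */
    private final SplittableRandom source;

    BlockSeedSource(long seed) {
        source = new SplittableRandom(seed);
    }

    /** Fills a seed array, entering the synchronized block once per BLOCK_SIZE values. */
    long[] createLongArray(int n) {
        final long[] seed = new long[n];
        for (int i = 0; i < n; i += BLOCK_SIZE) {
            final int end = Math.min(n, i + BLOCK_SIZE);
            synchronized (lock) {
                for (int j = i; j < end; j++) {
                    seed[j] = source.nextLong();
                }
            }
        }
        return seed;
    }
}
```

A larger block reduces lock traffic but holds the lock for longer, so other threads wait longer per acquisition; that trade-off is what the block size testing has to settle.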


There are also outstanding items I've partially looked at:

- RNG-90: Improve nextInt(int) and nextLong(long) for powers of 2

I paused testing this as I moved on to other things. The easy fix is to
copy the JDK SplittableRandom implementation. But it requires a
generator with good quality lower bits. It would have to be worked
around for generators that have low-period lower bits. So this requires
digesting all the results of BigCrush to determine which generators can
use the new method and which should not change. Then there is the
decision on how to do it.
A second-order improvement IMHO.
Which is why I moved on. Note that the new approach is much faster for powers of 2.
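
For reference, a minimal sketch of the two ways to handle a power-of-2 bound. The class and method names are hypothetical and the generator is passed in as a plain IntSupplier. The mask variant follows the pattern used by the JDK SplittableRandom (lower bits for powers of 2, rejection otherwise); the multiply variant takes the upper bits instead and is only an assumption about what a workaround for weak lower bits could look like.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch; not the Commons RNG implementation. */
final class PowerOfTwoBound {
    private PowerOfTwoBound() {}

    /** Mask the lower bits when n is a power of 2; otherwise use modulus with rejection. */
    static int nextInt(IntSupplier rng, int n) {
        checkBound(n);
        final int m = n - 1;
        if ((n & m) == 0) {
            // Power of 2: one call, no rejection, but relies on good lower bits.
            return rng.getAsInt() & m;
        }
        int u = rng.getAsInt() >>> 1;
        int r;
        // Reject values from the incomplete final interval (detected by overflow).
        while (u + m - (r = u % n) < 0) {
            u = rng.getAsInt() >>> 1;
        }
        return r;
    }

    /** Power-of-2 variant using the upper bits, for generators with weak lower bits. */
    static int nextIntUpperBits(IntSupplier rng, int n) {
        checkBound(n);
        if ((n & (n - 1)) != 0) {
            throw new IllegalArgumentException("n must be a power of 2: " + n);
        }
        // Scale a 31-bit value by n and keep the integer part.
        return (int) ((n * (long) (rng.getAsInt() >>> 1)) >>> 31);
    }

    private static void checkBound(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
    }
}
```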

- RNG-95: DiscreteUniformSampler

I have code that computes a discrete uniform sample using multiply and
not the modulus algorithm used in nextInt(int). However I cannot find
anywhere that uses the method so currently I am the author. I cannot
imagine no-one has done this before
Interesting...

but to be on the safe side it may be
better to put it in as an alternative DiscreteSampler, e.g.
FastDiscreteUniformSampler and leave the current DiscreteUniformSampler
to default to using nextInt(int).
I'm wary of this naming after the "FastMath" experience.
Perhaps safer is to put it in a feature branch, until you are sure that
it can replace the current implementation.
I couldn't think of a name describing the method. It is related to the discrete Weyl sequence so perhaps WeylDiscreteUniformSampler. It's a work in progress...

Speed tests show it is faster, and can be over 2-fold faster when the
rejection algorithm in nextInt(int) is at its worst case.
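
The code for RNG-95 is not shown in this thread, so the following is not that method. It is a sketch of one multiply-based bounded sample that I know of, the multiply-and-shift reduction with rejection, to show the general shape of replacing the modulus with a multiply. The class name is hypothetical and the generator is passed in as a plain IntSupplier.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch of a multiply-based bounded sample; not the RNG-95 code. */
final class MultiplyUniform {
    private MultiplyUniform() {}

    /** Returns a value in [0, n) from 32 random bits using multiply and shift. */
    static int nextInt(IntSupplier rng, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        // Treat the 32 random bits as unsigned; the product fits in a signed long.
        long m = (rng.getAsInt() & 0xffffffffL) * n;
        long low = m & 0xffffffffL;
        if (low < n) {
            // Only here is the modulus needed: threshold = 2^32 mod n.
            final long threshold = (1L << 32) % n;
            while (low < threshold) {
                m = (rng.getAsInt() & 0xffffffffL) * n;
                low = m & 0xffffffffL;
            }
        }
        // The upper 32 bits of the product are the sample.
        return (int) (m >>> 32);
    }
}
```

The sample comes from the upper bits of the product, so this form does not depend on the quality of the generator's lower bits in the way a mask or modulus does.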

- RNG-100: GuideTableDiscreteSampler

All done, but it should be rebased and put in a PR.

- RNG-99: AliasMethodDiscreteSampler

Also done, but it is very hard to find a probability distribution where
it is better than GuideTableDiscreteSampler. It could be added as a
reference implementation.
+1

- RNG-XX: Use GuideTableDiscreteSampler behind
DiscreteProbabilityCollectionSampler<T>

It will be faster and remove the binarySearch method from that class.
+1
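
For readers unfamiliar with the guide table method, a minimal sketch follows. It is not the RNG-100 code: the class name is hypothetical, the table has the same length as the number of probabilities, the input is assumed to be non-negative with a positive sum, and the uniform deviate u in [0, 1) is passed in so the lookup is easy to follow.

```java
/** Hypothetical sketch of guide table sampling; not the RNG-100 implementation. */
final class GuideTableSketch {
    /** Cumulative probabilities; the last entry is exactly 1. */
    private final double[] cumulative;
    /** guide[j] is the first index whose cumulative probability exceeds j / length. */
    private final int[] guide;

    GuideTableSketch(double[] probabilities) {
        final int n = probabilities.length;
        if (n == 0) {
            throw new IllegalArgumentException("no probabilities");
        }
        double sum = 0;
        for (final double p : probabilities) {
            sum += p;
        }
        cumulative = new double[n];
        double c = 0;
        for (int i = 0; i < n; i++) {
            c += probabilities[i] / sum;
            cumulative[i] = c;
        }
        cumulative[n - 1] = 1.0; // guard against rounding so lookups terminate

        guide = new int[n];
        int i = 0;
        for (int j = 0; j < n; j++) {
            final double lower = (double) j / n;
            while (cumulative[i] <= lower) {
                i++;
            }
            guide[j] = i;
        }
    }

    /** Maps u in [0, 1) to an index: jump via the guide table, then scan forward. */
    int sample(double u) {
        int i = guide[(int) (u * guide.length)];
        while (u >= cumulative[i]) {
            i++;
        }
        return i;
    }
}
```

The table jump replaces the binary search over the cumulative probabilities, and the forward scan that follows is short on average.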

I also thought we could wait until the end of GSoC so that new
generators could also be included.
I'd rather not wait.
There are many improvements and new features (thanks to you!)
that warrant a release.
OK. I'll get on with fixing the "must haves".

And I also think that releasing his GSoC work would be a nice
achievement for Abhishek (after he has seen how the release
process worked out for 1.3).

Best,
Gilles
