[jira] [Commented] (RNG-168) LXM family of random number generators

Alex Herbert (Jira) Tue, 22 Mar 2022 15:45:04 -0700


    [ 
https://issues.apache.org/jira/browse/RNG-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510923#comment-17510923
 ]


Alex Herbert commented on RNG-168:
----------------------------------

I have created a set of LXM generators using the examples provided in Steele & 
Vigna (2021). These generators all use the same basic steps:
{noformat}
 s = state of LCG
 t = state of XBG

 generate():
    z <- mix(combine(w upper bits of s, w upper bits of t))
    s <- LCG state update
    t <- XBG state update
    return z{noformat}
All the generators use w as 64-bits with the combine operation being an 
addition. The mix function is a variant on the final mixing function used in 
MurmurHash3. This mix function, with different constants, is used in the 
SplitMix64 generator and is provided in Steele and Vigna's paper. One exception 
is a faster mix function that matches that used in Vigna's StarStar generators 
such as XoRoShiRo128StarStar.

I have added the following classes which implement the same LXM generators as 
JDK 17:
||JDK 17||Class||RandomSource||XBG||
|L64X128StarStarRandom|L64X128StarStar|L64_X128_SS|XoRoShiRo128|
|L64X128MixRandom|L64X128Mix|L64_X128_MIX|XoRoShiRo128|
|L64X256MixRandom|L64X256Mix|L64_X256_MIX|XoShiRo256|
|L64X1024MixRandom|L64X1024Mix|L64_X1024_MIX|XoRoShiRo1024|
|L128X128MixRandom|L128X128Mix|L128_X128_MIX|XoRoShiRo128|
|L128X256MixRandom|L128X256Mix|L128_X256_MIX|XoShiRo256|
|L128X1024MixRandom|L128X1024Mix|L128_X1024_MIX|XoRoShiRo1024|

I have used the equivalent class in the JDK 17 these to create reference data 
for the new generators. In doing so I found two bugs in the seeding and 
construction routines of the JDK generators. These have been raised as issues 
([JDK-8283083|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8283083] 
and 
[JDK-8282144|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8282144]).
 Thus the seeds used for the reference generators could not be a full random 
byte[] of the correct length. Six generators can be seeded using a random 
byte[] that uses only positively signed bytes in any position that is not 
modulus 8 == 0. One generator has to be seeded using a single long value. I 
extracted the routine used in JDK 17 to generate the full seed and recorded it 
for the unit tests. This routine initialises the state of the 128-bit LCG to 1 
so the seeding for the reference data could be improved when the bugs are fixed.

One item to note is that the byte[] seed is converted in the JDK generators 
using big-endian format. The bytes create in order: the LCG additive parameter; 
the LCG state; and the XBG state. Seed conversion in the RNG simple module uses 
little-endian format. So it would be impossible to create a generator using the 
same seed the outputs the same sequence without a modification of the byte[] 
seed conversion in RNG simple, i.e.
{code:java}
byte[] seed = SecureRandom.generateSeed(Long.Bytes * 6);
RandomGenerator g = RandomGeneratorFactory.of("L128X128MixRandom").create(seed);
UniformRandomProvider rng = RandomSource.L128_X128_MIX.create(seed);

g.nextLong() != rng.nextLong();
{code}
In addition, since the generators in commons RNG have a base XBG implementation 
already in the library, the seed must be passed up to the super-class during 
construction. It is simplest to leave the initial part of the long[] the same 
and add the extra seed bytes to the end. Thus the order of the seed in Commons 
RNG is: the XBG state; the LCG state; and the LCG additive parameter. This is a 
swap of each part of the seed. Leaving the additive parameter at the end is an 
arbitrary choice but note that it is a requirement for this to be odd. The 
constructor sets the input value to odd. If this is the last component of the 
seed then it can be documented that the seed is used entirely except for the 
final trailing bit.
h2. Quality

Here are results from the various test suites:
{noformat}
RNG             Dieharder       ∩       TestU01 (BigCrush)      ∩       
PractRand       ∩
L64_X128_SS     0,1,0,0,0               0,0,0,0,0                       -,-,-   
         
L64_X128_MIX    0,0,0,0,0               0,0,0,1,2                       -,-,-   
         
L64_X256_MIX    0,0,0,0,0               0,0,0,0,0                       -,-,-   
         
L64_X1024_MIX   0,0,0,0,0               0,0,1,0,1                       -,-,-   
         
L128_X128_MIX   1,0,0,0,0               0,1,2,0,0                       -,-,-   
         
L128_X256_MIX   1,0,0,0,0               0,0,0,2,0                       -,-,-   
         
L128_X1024_MIX  0,0,0,0,0               0,0,0,2,1                       -,-,-   
         
{noformat}
No surprises here. These are high quality generators that pass 4TB of output 
through PractRand and have a few spurious fails in Dieharder and BigCrush.
h2. Performance

Here is a JMH benchmark result for nextLong run on JDK 11:
||Random Source||Method||Score||Baselined||Relative to parent XBG||
| |baselineLong|2.27| | |
|XO_RO_SHI_RO_128_PP|nextLong|4.08|1.81| |
|XO_SHI_RO_256_PP|nextLong|4.68|2.41| |
|XO_RO_SHI_RO_1024_PP|nextLong|5.18|2.91| |
|L64_X128_SS|nextLong|5.07|2.80|1.55|
|L64_X128_MIX|nextLong|5.47|3.20|1.77|
|L64_X256_MIX|nextLong|6.27|4.00|1.66|
|L64_X1024_MIX|nextLong|6.84|4.57|1.57|
|L128_X128_MIX|nextLong|9.42|7.15|3.95|
|L128_X256_MIX|nextLong|10.18|7.91|3.28|
|L128_X1024_MIX|nextLong|11.02|8.75|3.01|

So these generators are, as expected, not as fast as the underlying XBG 
generator. In the case of the 64-bit LCG the performance is about 50-70% 
slower. For the 128-bit LCG the performance is 3-4x slower. The update step for 
the 128-bit LCG requires use of 64-bit math to compose a 128-bit result. The 
most involved part is the computation of a 128-bit multiplication. I have 
written an implementation for an equivalent to:
{code:java}
long Math.multiplyHigh(long, long);
{code}
This method was introduced in JDK 9 and is a hotspot intrinsic candidate from 
JDK 10. The actual computation requires this:
{code:java}
static long unsignedMultiplyHigh(long a, long b) {
    return Math.multiplyHigh(a, b) + ((a >> 63) & b) + ((b >> 63) & a)
}
{code}
However when manually computing the product with 64-bit longs the unsigned 
result can be done in the same number of operations as the signed result. The 
bit shift trick is therefore not required.

Note that {{Math.unsignedMultiplyHigh}} with the same signature was introduced 
in JDK 18 and is a hotspot intrinsic. I have benchmarked the intrinsic versions 
and they are noticeably faster. The update of the 128-bit LCG is still complex 
even with a hardware supported 128-bit multiply:
{code:java}
// The LCG is, in effect, "s = m * s + a" where m = ((1LL << 64) + ML)
final long u = ML * sl;
// High half
sh = ML * sh + LXMSupport.unsignedMultiplyHigh(ML, sl) + sl + ah;
// Low half
sl = u + al;
// Carry propagation
if (Long.compareUnsigned(sl, u) < 0) {
    ++sh;
}
{code}
The Long.compareUnsigned requires a branch condition to perform carry 
propagation. This code is as provided in Steele & Vigna and also in the JDK 17 
source. There may be a bit shift trick to remove this conditional although 
performance may not be altered if this only saves a few CPU cycles.
h2. Jumping

The JDK 17 generators do not support jumping. These generators prefer to do a 
split operation. The LXM family is extremely robust to sequence overlap if the 
additive parameter for the LCG is different. The JDK has a method to create a 
stream of generators. This uses a special spliterator that provides the 
additive parameter for each constructed generator during the stream operations. 
Thus all generators from the stream will have a different additive parameter.

Steele & Vigna suggest methods for jumping by advancing/rewinding the LCG and 
XBG the same number of cycles. In practice this is performed using:
 #  A large jump of the XBG that wraps the state of the LCG.
 #  Leaving the XBG state unchanged and advancing the LCG. This effectively 
rewinds the state of the LXM by the period of the XBG multiplied by the number 
of cycles advanced by the LCG.

In order to test jumping (which is already done by the Commons RNG 
implementation for the XBG), I created a routine to jump the LCG. In most cases 
for a *type-1* jump this is not used as the LCG state is wrapped entirely by 
the jump of the XBG. The only jump that is small is the jump of 2^64 and 2^96 
cycles done by XoRoShiRo128 in combination with a 128-bit LCG. I did create 
this version, benchmarked jumping and as expected it was slow.

A *type-2* jump is easy to implement but the long jump requires advancing the 
LCG a larger number of steps. This can be done using an algorithm to compute 
arbitrary strides to an LCG:
{noformat}
Brown, F.B. (1994) Random number generation with arbitrary strides,
Transactions of the American Nuclear Society 71.{noformat}
The algorithm computes the multiplier and addition to update the LCG in a 
single step. If the jump size and base LCG multiplier are constant, which is 
the case here, then the jump coefficients can be precomputed. By choosing a 
long jump that is half the cycle length of the LCG, the entire lower half of 
the state remains unchanged. Thus it is possible to jump the 128-bit LCG a 
stride of 2^64 cycles faster than moving it 1 cycle.

Here is the output of a new jump benchmark. The benchmark contains only those 
generators with a unique jump function. Some generator variants have exactly 
the same jump and can be omitted:
||Random Source||Method||Score||
|L64_X128_MIX|jump|11.3|
|L64_X256_MIX|jump|13.3|
|L128_X128_MIX|jump|16.8|
|L128_X256_MIX|jump|18.7|
|L64_X1024_MIX|jump|25.3|
|L128_X1024_MIX|jump|31.6|
|XO_RO_SHI_RO_128_PP|jump|177.3|
|XO_RO_SHI_RO_128_PLUS|jump|194.4|
|XO_SHI_RO_128_PLUS|jump|254.9|
|XO_SHI_RO_256_PLUS|jump|448.5|
|XO_SHI_RO_512_PLUS|jump|1651.9|
|XO_RO_SHI_RO_1024_PP|jump|6926.8|
|XOR_SHIFT_1024_S|jump|8224.0|
|L64_X128_MIX|longJump|11.5|
|L128_X128_MIX|longJump|13.8|
|L64_X256_MIX|longJump|14.3|
|L128_X256_MIX|longJump|17.5|
|L64_X1024_MIX|longJump|25.7|
|L128_X1024_MIX|longJump|41.7|

Here we see that jumping the LCG is much faster than jumping the parent XBG. 
The speed difference of the various LXM generators is due to the state size for 
larger generators. The generator must be copied first and then jumped. The jump 
computation is exactly the same. Note also that longJump is faster than jump 
for the smaller LXM generators with a 128-bit LCG due to the simplification of 
the state update step. The L128_X1024_MIX long jump result is an outlier here 
as it is about 30% slower than the corresponding jump.

Since there is no reference for jumping I have created a test using a composite 
LXM generator. This fuses a LCG with an existing XBG generator from the 
library. The jump function in the LCG can support any power of 2. Thus small 
jumps can be verified by iteration. Larger jumps can then be verified by two 
consecutive jumps of the next smaller power. This allows bootstrap testing of 
any power of 2 jump in the range [0, 127]. The test demonstrates that jumping 
the LCG works. There are tests to show arbitrary combinations of jumps and 
state updates put the LXM generator in the same state as the same jumps and 
state updates delivered in a different order, e.g.
{noformat}
jump, 10 cycles, jump, 13 cycles    ==  10 cycles, jump, 13 cycles, 
jump{noformat}
The composite LXM generator then serves as a reference for the jump and long 
jump function in the LXM implementation.
h2. Notes

These generators pass their seed to the parent generator, then use more state. 
If the seed is too small then it must be expanded. To avoid seed expansion in 
the parent generator, then again in the LXM generator I created a static 
function to expand the seed only once. This fills the state to the required 
length using a SplitMix style generator.

This is a departure from the state expansion procedure used in all the other 
generators in the library. It is not strictly necessary. However note that XBG 
generators are sensitive to all zero seeds and a SplitMix generator only 
outputs zero once in the period. A similar SplitMix style method is used to 
expand seeds in the JDK 17 versions from a single input long.

 

This addition does not provide the generator L32X64MixRandom. The lea32 mix 
function is not provided in Steele and Vigna's paper. Thus is is not possible 
to create this generator from publicly available information. This is an 
unfortunate omission from the library as this generator is the default 
generator in JDK 17. It would be more complete to have a back port of the 
generator in Commons RNG for use in JDK 8. It would be a simple addition to put 
this in a branch using the mix function extracted from JDK 17. It requires only 
a single constant that is not public on the JDK 17 package javadoc for the 
generators. I will raise an enhancement request with Oracle to make this mix 
function part of the javadoc for completeness. They may respond positively 
given the package javadoc is so comprehensive the other generators can be 
created just from information in that page since it includes the LCG 
multipliers, XBG update shifts and the state combination steps. It simply omits 
the mix constants from the lea32 and lea64 mix functions. Note I had previously 
raised a javadoc bug in the package info that has now been fixed in more a 
recent JDK.

 

It would be possible to add LXM generators based on XoShiRo512. There would be 
no reference output. The output from a composite LCG+XBG generator could 
provide the data. But this would be testing an implementation using a second 
implementation from the same codebase. It also seems redundant as the 
XoRoShiRo1024 base generator has a larger period and would be essentially the 
same speed. All you gain is a smaller memory footprint than XoRoShiRo1024 and a 
slightly larger period than the XoShiRo256 based generator.

 

All new code in [PR 104|https://github.com/apache/commons-rng/pull/104]

> LXM family of random number generators
> --------------------------------------
>
>                 Key: RNG-168
>                 URL: https://issues.apache.org/jira/browse/RNG-168
>             Project: Commons RNG
>          Issue Type: New Feature
>          Components: core, simple
>    Affects Versions: 1.4
>            Reporter: Alex Herbert
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> JDK 17 updates the random number support and adds a LXM family of random 
> number generators. The generators are described in the following paper:
> {noformat}
> Steele and Vigna (2021)
> LXM: better splittable pseudorandom number generators (and almost as fast). 
> Proceedings of the ACM on Programming Languages, Volume 5, Article 148, pp 
> 1–31
> {noformat}
> [https://doi.org/10.1145/3485525]
> The generators mix the output of a linear conguential generator (LCG; L) and 
> a Xor-based generator (XBG; X). The paper provides code examples for 
> L64X128MixRandom and L128X256MixRandom from JDK 17.
> Details of the combinations of the linear conguential generator (LCG; L) and 
> Xor-based generators (XBG; X) used in JDK 17 are provided in the package 
> javadoc:
> [JDK 17 
> java.util.random|https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/random/package-summary.html]
> The current commons RNG library implements all the XBG generators. Adding 
> support for these generators should add a LCG to combine with the output of 
> the XBG via the specified mix function.
> All constants are supplied in the JDK 17 package javadoc to allow 
> implementation with the exception of the mixing functions lea32 and lea64. 
> The lea64 function is provided in the paper above but the reference origin is 
> personal communication with Doug Lea (2013). Thus the lea32 mix function for 
> 32-bit generators is not published. Obtaining from the JDK source code is 
> possible but would be subject to the Java licence.
> The generators support jump functionality and are suitable for streaming in 
> large numbers for parallel applications (the paper tests up to 2^24 
> generators) by using a different additive constant for the LCG for each 
> generator to ensure no sequence overlap. Creating a new additive constant is 
> significantly faster than performing a jump operation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (RNG-168) LXM family of random number generators

Reply via email to