[ https://issues.apache.org/jira/browse/RNG-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510923#comment-17510923 ]
Alex Herbert commented on RNG-168: ---------------------------------- I have created a set of LXM generators using the examples provided in Steele & Vigna (2021). These generators all use the same basic steps: {noformat} s = state of LCG t = state of XBG generate(): z <- mix(combine(w upper bits of s, w upper bits of t)) s <- LCG state update t <- XBG state update return z{noformat} All the generators use w as 64-bits with the combine operation being an addition. The mix function is a variant on the final mixing function used in MurmurHash3. This mix function, with different constants, is used in the SplitMix64 generator and is provided in Steele and Vigna's paper. One exception is a faster mix function that matches that used in Vigna's StarStar generators such as XoRoShiRo128StarStar. I have added the following classes which implement the same LXM generators as JDK 17: ||JDK 17||Class||RandomSource||XBG|| |L64X128StarStarRandom|L64X128StarStar|L64_X128_SS|XoRoShiRo128| |L64X128MixRandom|L64X128Mix|L64_X128_MIX|XoRoShiRo128| |L64X256MixRandom|L64X256Mix|L64_X256_MIX|XoShiRo256| |L64X1024MixRandom|L64X1024Mix|L64_X1024_MIX|XoRoShiRo1024| |L128X128MixRandom|L128X128Mix|L128_X128_MIX|XoRoShiRo128| |L128X256MixRandom|L128X256Mix|L128_X256_MIX|XoShiRo256| |L128X1024MixRandom|L128X1024Mix|L128_X1024_MIX|XoRoShiRo1024| I have used the equivalent class in the JDK 17 these to create reference data for the new generators. In doing so I found two bugs in the seeding and construction routines of the JDK generators. These have been raised as issues ([JDK-8283083|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8283083] and [JDK-8282144|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8282144]). Thus the seeds used for the reference generators could not be a full random byte[] of the correct length. Six generators can be seeded using a random byte[] that uses only positively signed bytes in any position that is not modulus 8 == 0. One generator has to be seeded using a single long value. I extracted the routine used in JDK 17 to generate the full seed and recorded it for the unit tests. This routine initialises the state of the 128-bit LCG to 1 so the seeding for the reference data could be improved when the bugs are fixed. One item to note is that the byte[] seed is converted in the JDK generators using big-endian format. The bytes create in order: the LCG additive parameter; the LCG state; and the XBG state. Seed conversion in the RNG simple module uses little-endian format. So it would be impossible to create a generator using the same seed the outputs the same sequence without a modification of the byte[] seed conversion in RNG simple, i.e. {code:java} byte[] seed = SecureRandom.generateSeed(Long.Bytes * 6); RandomGenerator g = RandomGeneratorFactory.of("L128X128MixRandom").create(seed); UniformRandomProvider rng = RandomSource.L128_X128_MIX.create(seed); g.nextLong() != rng.nextLong(); {code} In addition, since the generators in commons RNG have a base XBG implementation already in the library, the seed must be passed up to the super-class during construction. It is simplest to leave the initial part of the long[] the same and add the extra seed bytes to the end. Thus the order of the seed in Commons RNG is: the XBG state; the LCG state; and the LCG additive parameter. This is a swap of each part of the seed. Leaving the additive parameter at the end is an arbitrary choice but note that it is a requirement for this to be odd. The constructor sets the input value to odd. If this is the last component of the seed then it can be documented that the seed is used entirely except for the final trailing bit. h2. Quality Here are results from the various test suites: {noformat} RNG Dieharder ∩ TestU01 (BigCrush) ∩ PractRand ∩ L64_X128_SS 0,1,0,0,0 0,0,0,0,0 -,-,- L64_X128_MIX 0,0,0,0,0 0,0,0,1,2 -,-,- L64_X256_MIX 0,0,0,0,0 0,0,0,0,0 -,-,- L64_X1024_MIX 0,0,0,0,0 0,0,1,0,1 -,-,- L128_X128_MIX 1,0,0,0,0 0,1,2,0,0 -,-,- L128_X256_MIX 1,0,0,0,0 0,0,0,2,0 -,-,- L128_X1024_MIX 0,0,0,0,0 0,0,0,2,1 -,-,- {noformat} No surprises here. These are high quality generators that pass 4TB of output through PractRand and have a few spurious fails in Dieharder and BigCrush. h2. Performance Here is a JMH benchmark result for nextLong run on JDK 11: ||Random Source||Method||Score||Baselined||Relative to parent XBG|| | |baselineLong|2.27| | | |XO_RO_SHI_RO_128_PP|nextLong|4.08|1.81| | |XO_SHI_RO_256_PP|nextLong|4.68|2.41| | |XO_RO_SHI_RO_1024_PP|nextLong|5.18|2.91| | |L64_X128_SS|nextLong|5.07|2.80|1.55| |L64_X128_MIX|nextLong|5.47|3.20|1.77| |L64_X256_MIX|nextLong|6.27|4.00|1.66| |L64_X1024_MIX|nextLong|6.84|4.57|1.57| |L128_X128_MIX|nextLong|9.42|7.15|3.95| |L128_X256_MIX|nextLong|10.18|7.91|3.28| |L128_X1024_MIX|nextLong|11.02|8.75|3.01| So these generators are, as expected, not as fast as the underlying XBG generator. In the case of the 64-bit LCG the performance is about 50-70% slower. For the 128-bit LCG the performance is 3-4x slower. The update step for the 128-bit LCG requires use of 64-bit math to compose a 128-bit result. The most involved part is the computation of a 128-bit multiplication. I have written an implementation for an equivalent to: {code:java} long Math.multiplyHigh(long, long); {code} This method was introduced in JDK 9 and is a hotspot intrinsic candidate from JDK 10. The actual computation requires this: {code:java} static long unsignedMultiplyHigh(long a, long b) { return Math.multiplyHigh(a, b) + ((a >> 63) & b) + ((b >> 63) & a) } {code} However when manually computing the product with 64-bit longs the unsigned result can be done in the same number of operations as the signed result. The bit shift trick is therefore not required. Note that {{Math.unsignedMultiplyHigh}} with the same signature was introduced in JDK 18 and is a hotspot intrinsic. I have benchmarked the intrinsic versions and they are noticeably faster. The update of the 128-bit LCG is still complex even with a hardware supported 128-bit multiply: {code:java} // The LCG is, in effect, "s = m * s + a" where m = ((1LL << 64) + ML) final long u = ML * sl; // High half sh = ML * sh + LXMSupport.unsignedMultiplyHigh(ML, sl) + sl + ah; // Low half sl = u + al; // Carry propagation if (Long.compareUnsigned(sl, u) < 0) { ++sh; } {code} The Long.compareUnsigned requires a branch condition to perform carry propagation. This code is as provided in Steele & Vigna and also in the JDK 17 source. There may be a bit shift trick to remove this conditional although performance may not be altered if this only saves a few CPU cycles. h2. Jumping The JDK 17 generators do not support jumping. These generators prefer to do a split operation. The LXM family is extremely robust to sequence overlap if the additive parameter for the LCG is different. The JDK has a method to create a stream of generators. This uses a special spliterator that provides the additive parameter for each constructed generator during the stream operations. Thus all generators from the stream will have a different additive parameter. Steele & Vigna suggest methods for jumping by advancing/rewinding the LCG and XBG the same number of cycles. In practice this is performed using: # A large jump of the XBG that wraps the state of the LCG. # Leaving the XBG state unchanged and advancing the LCG. This effectively rewinds the state of the LXM by the period of the XBG multiplied by the number of cycles advanced by the LCG. In order to test jumping (which is already done by the Commons RNG implementation for the XBG), I created a routine to jump the LCG. In most cases for a *type-1* jump this is not used as the LCG state is wrapped entirely by the jump of the XBG. The only jump that is small is the jump of 2^64 and 2^96 cycles done by XoRoShiRo128 in combination with a 128-bit LCG. I did create this version, benchmarked jumping and as expected it was slow. A *type-2* jump is easy to implement but the long jump requires advancing the LCG a larger number of steps. This can be done using an algorithm to compute arbitrary strides to an LCG: {noformat} Brown, F.B. (1994) Random number generation with arbitrary strides, Transactions of the American Nuclear Society 71.{noformat} The algorithm computes the multiplier and addition to update the LCG in a single step. If the jump size and base LCG multiplier are constant, which is the case here, then the jump coefficients can be precomputed. By choosing a long jump that is half the cycle length of the LCG, the entire lower half of the state remains unchanged. Thus it is possible to jump the 128-bit LCG a stride of 2^64 cycles faster than moving it 1 cycle. Here is the output of a new jump benchmark. The benchmark contains only those generators with a unique jump function. Some generator variants have exactly the same jump and can be omitted: ||Random Source||Method||Score|| |L64_X128_MIX|jump|11.3| |L64_X256_MIX|jump|13.3| |L128_X128_MIX|jump|16.8| |L128_X256_MIX|jump|18.7| |L64_X1024_MIX|jump|25.3| |L128_X1024_MIX|jump|31.6| |XO_RO_SHI_RO_128_PP|jump|177.3| |XO_RO_SHI_RO_128_PLUS|jump|194.4| |XO_SHI_RO_128_PLUS|jump|254.9| |XO_SHI_RO_256_PLUS|jump|448.5| |XO_SHI_RO_512_PLUS|jump|1651.9| |XO_RO_SHI_RO_1024_PP|jump|6926.8| |XOR_SHIFT_1024_S|jump|8224.0| |L64_X128_MIX|longJump|11.5| |L128_X128_MIX|longJump|13.8| |L64_X256_MIX|longJump|14.3| |L128_X256_MIX|longJump|17.5| |L64_X1024_MIX|longJump|25.7| |L128_X1024_MIX|longJump|41.7| Here we see that jumping the LCG is much faster than jumping the parent XBG. The speed difference of the various LXM generators is due to the state size for larger generators. The generator must be copied first and then jumped. The jump computation is exactly the same. Note also that longJump is faster than jump for the smaller LXM generators with a 128-bit LCG due to the simplification of the state update step. The L128_X1024_MIX long jump result is an outlier here as it is about 30% slower than the corresponding jump. Since there is no reference for jumping I have created a test using a composite LXM generator. This fuses a LCG with an existing XBG generator from the library. The jump function in the LCG can support any power of 2. Thus small jumps can be verified by iteration. Larger jumps can then be verified by two consecutive jumps of the next smaller power. This allows bootstrap testing of any power of 2 jump in the range [0, 127]. The test demonstrates that jumping the LCG works. There are tests to show arbitrary combinations of jumps and state updates put the LXM generator in the same state as the same jumps and state updates delivered in a different order, e.g. {noformat} jump, 10 cycles, jump, 13 cycles == 10 cycles, jump, 13 cycles, jump{noformat} The composite LXM generator then serves as a reference for the jump and long jump function in the LXM implementation. h2. Notes These generators pass their seed to the parent generator, then use more state. If the seed is too small then it must be expanded. To avoid seed expansion in the parent generator, then again in the LXM generator I created a static function to expand the seed only once. This fills the state to the required length using a SplitMix style generator. This is a departure from the state expansion procedure used in all the other generators in the library. It is not strictly necessary. However note that XBG generators are sensitive to all zero seeds and a SplitMix generator only outputs zero once in the period. A similar SplitMix style method is used to expand seeds in the JDK 17 versions from a single input long. This addition does not provide the generator L32X64MixRandom. The lea32 mix function is not provided in Steele and Vigna's paper. Thus is is not possible to create this generator from publicly available information. This is an unfortunate omission from the library as this generator is the default generator in JDK 17. It would be more complete to have a back port of the generator in Commons RNG for use in JDK 8. It would be a simple addition to put this in a branch using the mix function extracted from JDK 17. It requires only a single constant that is not public on the JDK 17 package javadoc for the generators. I will raise an enhancement request with Oracle to make this mix function part of the javadoc for completeness. They may respond positively given the package javadoc is so comprehensive the other generators can be created just from information in that page since it includes the LCG multipliers, XBG update shifts and the state combination steps. It simply omits the mix constants from the lea32 and lea64 mix functions. Note I had previously raised a javadoc bug in the package info that has now been fixed in more a recent JDK. It would be possible to add LXM generators based on XoShiRo512. There would be no reference output. The output from a composite LCG+XBG generator could provide the data. But this would be testing an implementation using a second implementation from the same codebase. It also seems redundant as the XoRoShiRo1024 base generator has a larger period and would be essentially the same speed. All you gain is a smaller memory footprint than XoRoShiRo1024 and a slightly larger period than the XoShiRo256 based generator. All new code in [PR 104|https://github.com/apache/commons-rng/pull/104] > LXM family of random number generators > -------------------------------------- > > Key: RNG-168 > URL: https://issues.apache.org/jira/browse/RNG-168 > Project: Commons RNG > Issue Type: New Feature > Components: core, simple > Affects Versions: 1.4 > Reporter: Alex Herbert > Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > JDK 17 updates the random number support and adds a LXM family of random > number generators. The generators are described in the following paper: > {noformat} > Steele and Vigna (2021) > LXM: better splittable pseudorandom number generators (and almost as fast). > Proceedings of the ACM on Programming Languages, Volume 5, Article 148, pp > 1–31 > {noformat} > [https://doi.org/10.1145/3485525] > The generators mix the output of a linear conguential generator (LCG; L) and > a Xor-based generator (XBG; X). The paper provides code examples for > L64X128MixRandom and L128X256MixRandom from JDK 17. > Details of the combinations of the linear conguential generator (LCG; L) and > Xor-based generators (XBG; X) used in JDK 17 are provided in the package > javadoc: > [JDK 17 > java.util.random|https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/random/package-summary.html] > The current commons RNG library implements all the XBG generators. Adding > support for these generators should add a LCG to combine with the output of > the XBG via the specified mix function. > All constants are supplied in the JDK 17 package javadoc to allow > implementation with the exception of the mixing functions lea32 and lea64. > The lea64 function is provided in the paper above but the reference origin is > personal communication with Doug Lea (2013). Thus the lea32 mix function for > 32-bit generators is not published. Obtaining from the JDK source code is > possible but would be subject to the Java licence. > The generators support jump functionality and are suitable for streaming in > large numbers for parallel applications (the paper tests up to 2^24 > generators) by using a different additive constant for the LCG for each > generator to ensure no sequence overlap. Creating a new additive constant is > significantly faster than performing a jump operation. -- This message was sent by Atlassian Jira (v8.20.1#820001)