Alex D Herbert created RNG-113: ---------------------------------- Summary: Speed improvement to PCG 32 Xsh-Rs implementation Key: RNG-113 URL: https://issues.apache.org/jira/browse/RNG-113 Project: Commons RNG Issue Type: Improvement Components: core Affects Versions: 1.3 Reporter: Alex D Herbert
Investigate a possible speed increase for the PCG 32 Xsh-Rs generator for the methods nextLong and nextDouble. The default implementation for nextLong() and nextDouble() for IntProvider is: {code:java} public long nextLong() { return NumberFactory.makeLong(nextInt(), nextInt()); } public double nextDouble() { return NumberFactory.makeDouble(nextInt(), nextInt()); } {code} Each of these methods composes a long from two int values converted to long. The Pcg32 implementation of nextInt() creates the int from a primitive cast conversion of a long. *Thus the current default implementation for nextLong and nextDouble is performing a round trip of a long to an int to a long twice over.* The same logic can be implemented by a special implementation of the methods that work only using long types. {code:java} public long nextLong() { // Get two values from the LCG final long x = state; final long y = bump(state); state = bump(y); // Perform mix function. // For a 32-bit output the x bits should be shifted down (22 + (int) (x >>> 61)). // Leave in the upper bits by shift up 32 - (22 + (int) (x >>> 61)) final long upper = (x ^ (x >>> 22)) << (10 - (int) (x >>> 61)); final long lower = (y ^ (y >>> 22)) >>> (22 + (int) (y >>> 61)); return (upper & 0xffffffff00000000L) | (lower & 0xffffffffL); } /** Upper mask for top 26 bits shifted by 27 bits. */ private static final long MASK1 = ((long) (0xffffffff >>> 6)) << 27; /** Lower mask shifted by 5 for 27 bits. */ private static final long MASK2 = 0xffffffffL >>> 5; public double nextDouble() { // Get two values from the LCG final long x = state; final long y = bump(state); state = bump(y); // Perform mix function. // For a 32-bit output the x bits should be shifted down (22 + (int) (x >>> 61)). // To match nextDouble requires 26-bits from int 1 and 27 bits from int 2. // Int 1 is stored in the upper 32-bits as per nextLong() but shifted down 11 and // then masked to keep the upper 26-bits. Discard an extra 5 from int 2. final long upper = (x ^ (x >>> 22)) >>> (1 + (int) (x >>> 61)); final long lower = (y ^ (y >>> 22)) >>> (27 + (int) (y >>> 61)); return ((upper & MASK1) | (lower & MASK2)) * 0x1.0p-53; } {code} These implementations require that the private {{state}} member and {{bump()}} method of the parent AbstractPcg6432 are made accessible. The change should be tested to determine if there is a performance increase with a custom implementation. -- This message was sent by Atlassian JIRA (v7.6.14#76016)