Alex D Herbert created RNG-113:
----------------------------------

             Summary: Speed improvement to PCG 32 Xsh-Rs implementation
                 Key: RNG-113
                 URL: https://issues.apache.org/jira/browse/RNG-113
             Project: Commons RNG
          Issue Type: Improvement
          Components: core
    Affects Versions: 1.3
            Reporter: Alex D Herbert


Investigate a possible speed increase for the PCG 32 Xsh-Rs generator for the 
methods nextLong and nextDouble.

The default implementation for nextLong() and nextDouble() for IntProvider is:
{code:java}
public long nextLong() {
    return NumberFactory.makeLong(nextInt(), nextInt());
}

public double nextDouble() {
    return NumberFactory.makeDouble(nextInt(), nextInt());
}
{code}
Each of these methods composes a long from two int values converted to long.

The Pcg32 implementation of nextInt() creates the int from a primitive cast 
conversion of a long. *Thus the current default implementation for nextLong and 
nextDouble is performing a round trip of a long to an int to a long twice 
over.* The same logic can be implemented by a special implementation of the 
methods that work only using long types.
{code:java}
public long nextLong() {
    // Get two values from the LCG
    final long x = state;
    final long y = bump(state);
    state = bump(y);
    // Perform mix function.
    // For a 32-bit output the x bits should be shifted down (22 + (int) (x >>> 
61)).
    // Leave in the upper bits by shift up 32 - (22 + (int) (x >>> 61))
    final long upper = (x ^ (x >>> 22)) << (10 - (int) (x >>> 61));
    final long lower = (y ^ (y >>> 22)) >>> (22 + (int) (y >>> 61));
    return (upper & 0xffffffff00000000L) | (lower & 0xffffffffL);
}

/** Upper mask for top 26 bits shifted by 27 bits. */
private static final long MASK1 = ((long) (0xffffffff >>> 6)) << 27;
/** Lower mask shifted by 5 for 27 bits. */
private static final long MASK2 = 0xffffffffL >>> 5;

public double nextDouble() {
    // Get two values from the LCG
    final long x = state;
    final long y = bump(state);
    state = bump(y);
    // Perform mix function.
    // For a 32-bit output the x bits should be shifted down (22 + (int) (x >>> 
61)).
    // To match nextDouble requires 26-bits from int 1 and 27 bits from int 2.
    // Int 1 is stored in the upper 32-bits as per nextLong() but shifted down 
11 and
    // then masked to keep the upper 26-bits. Discard an extra 5 from int 2.
    final long upper = (x ^ (x >>> 22)) >>> (1 + (int) (x >>> 61));
    final long lower = (y ^ (y >>> 22)) >>> (27 + (int) (y >>> 61));
    return ((upper & MASK1) | (lower & MASK2)) * 0x1.0p-53;
}
{code}
These implementations require that the private {{state}} member and {{bump()}} 
method of the parent AbstractPcg6432 are made accessible.

The change should be tested to determine if there is a performance increase 
with a custom implementation.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

Reply via email to