I'm not sure what the problem is. If X is uniform, then "the number of leading zero bits" of X is exponential. The probability of getting a "small" number, in which the first (say) 32 bits are 0 is 2^-32. If this is what is measured by the histrogram[5], then the current implementation looks correct to me.

On 12/19/2018 11:01 PM, Brian Burkhalter wrote:
https://bugs.openjdk.java.net/browse/JDK-8215441

This issue was filed to cover improving the uniformity of randomly generated 
BigIntegers. It is not intended to resolve [1] which is deliberately left open. 
The proposed patch implements a modified version of the “workaround” suggested 
in [1].

The problem is that the magnitude of the random BigInteger is created from a 
sequence of bytes generated by Random.nextBytes() [2]. The likelihood that any 
of these bytes is zero is small so the distribution of the resulting random 
BigIntegers is skewed towards values close to the maximum bit size “numBits” 
specified to the constructor.

The workaround suggested in [1] is to randomly change numBits to the value 
numBits = Random.nextInt(numBits + 1) [3]. (Actually the suggested workaround 
is nextInt(numBits) which is incorrect as the parameter is an exclusive upper 
bound.) This greatly improves the uniformity of the distribution. A remaining 
problem however is that now the very largest numbers in the interval 
[0,2^numBits) are underrepresented. A modification of this approach is to 
increment the new value of numBits as numBits = Random.nextInt(numBits + 1) + 1 
[4]. This was empirically observed to improve the underrepresentation of the 
largest values.

The distribution of the random BigIntegers was estimated using [5]. For a given 
maximum bit length, bin size, and number of random values to generate, this 
creates a histogram and calculates the coefficient of variation of the numbers 
generated. The histogram bin at index zero represents the largest valued bin in 
the result. The count in a given histogram bin is the number of values for 
which that bin is the leftmost (largest valued) with at least one non-zero bit. 
The bin of maximum index represents zero.

Results for the current and two modified approaches for 256 bits with a 1-bit 
bin size and for 4096 bits with a 4-bit bin size are given at [6-11]. As may be 
observed, the original histogram is clustered towards the largest possible 
value 2^numBits - 1, and the coefficient of variation is small. The results for 
the two variants of the patch show a flattened distribution, i.e., more 
uniform, and a significantly larger coefficient of variation. The second 
approach shows better flattening of both ends of the histogram. These results 
are samples only but are exemplary of the results observed over numerous runs 
of this code.

The test ModPow is modified as the modPow() method throws an 
ArithmeticException for a zero modulus. The current algorithm never generates a 
random BigInteger equal to zero however so that exception never occurs. That is 
not the case for either modified version.

Thanks,

Brian

[1] https://bugs.openjdk.java.net/browse/JDK-8146153
[2] 
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Random.html#nextBytes(byte[])
[3] http://cr.openjdk.java.net/~bpb/8146153/webrev.00/
[4] http://cr.openjdk.java.net/~bpb/8146153/webrev.01/
[5] http://cr.openjdk.java.net/~bpb/8146153/StatsBigIntegerRandom.java
[6] http://cr.openjdk.java.net/~bpb/8146153/before-256-1.txt
[7] http://cr.openjdk.java.net/~bpb/8146153/after-256-1.txt
[8] http://cr.openjdk.java.net/~bpb/8146153/after-extra-bit-256-1.txt
[9] http://cr.openjdk.java.net/~bpb/8146153/before-4096-4.txt
[10] http://cr.openjdk.java.net/~bpb/8146153/after-4096-4.txt
[11] http://cr.openjdk.java.net/~bpb/8146153/after-extra-bit-4096-4.txt

Reply via email to