I'm not sure what the problem is. If X is uniform, then "the number of
leading zero bits" of X is exponential. The probability of getting a
"small" number, in which the first (say) 32 bits are 0 is 2^-32. If this
is what is measured by the histrogram[5], then the current
implementation looks correct to me.
On 12/19/2018 11:01 PM, Brian Burkhalter wrote:
https://bugs.openjdk.java.net/browse/JDK-8215441
This issue was filed to cover improving the uniformity of randomly generated
BigIntegers. It is not intended to resolve [1] which is deliberately left open.
The proposed patch implements a modified version of the “workaround” suggested
in [1].
The problem is that the magnitude of the random BigInteger is created from a
sequence of bytes generated by Random.nextBytes() [2]. The likelihood that any
of these bytes is zero is small so the distribution of the resulting random
BigIntegers is skewed towards values close to the maximum bit size “numBits”
specified to the constructor.
The workaround suggested in [1] is to randomly change numBits to the value
numBits = Random.nextInt(numBits + 1) [3]. (Actually the suggested workaround
is nextInt(numBits) which is incorrect as the parameter is an exclusive upper
bound.) This greatly improves the uniformity of the distribution. A remaining
problem however is that now the very largest numbers in the interval
[0,2^numBits) are underrepresented. A modification of this approach is to
increment the new value of numBits as numBits = Random.nextInt(numBits + 1) + 1
[4]. This was empirically observed to improve the underrepresentation of the
largest values.
The distribution of the random BigIntegers was estimated using [5]. For a given
maximum bit length, bin size, and number of random values to generate, this
creates a histogram and calculates the coefficient of variation of the numbers
generated. The histogram bin at index zero represents the largest valued bin in
the result. The count in a given histogram bin is the number of values for
which that bin is the leftmost (largest valued) with at least one non-zero bit.
The bin of maximum index represents zero.
Results for the current and two modified approaches for 256 bits with a 1-bit
bin size and for 4096 bits with a 4-bit bin size are given at [6-11]. As may be
observed, the original histogram is clustered towards the largest possible
value 2^numBits - 1, and the coefficient of variation is small. The results for
the two variants of the patch show a flattened distribution, i.e., more
uniform, and a significantly larger coefficient of variation. The second
approach shows better flattening of both ends of the histogram. These results
are samples only but are exemplary of the results observed over numerous runs
of this code.
The test ModPow is modified as the modPow() method throws an
ArithmeticException for a zero modulus. The current algorithm never generates a
random BigInteger equal to zero however so that exception never occurs. That is
not the case for either modified version.
Thanks,
Brian
[1] https://bugs.openjdk.java.net/browse/JDK-8146153
[2]
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Random.html#nextBytes(byte[])
[3] http://cr.openjdk.java.net/~bpb/8146153/webrev.00/
[4] http://cr.openjdk.java.net/~bpb/8146153/webrev.01/
[5] http://cr.openjdk.java.net/~bpb/8146153/StatsBigIntegerRandom.java
[6] http://cr.openjdk.java.net/~bpb/8146153/before-256-1.txt
[7] http://cr.openjdk.java.net/~bpb/8146153/after-256-1.txt
[8] http://cr.openjdk.java.net/~bpb/8146153/after-extra-bit-256-1.txt
[9] http://cr.openjdk.java.net/~bpb/8146153/before-4096-4.txt
[10] http://cr.openjdk.java.net/~bpb/8146153/after-4096-4.txt
[11] http://cr.openjdk.java.net/~bpb/8146153/after-extra-bit-4096-4.txt