Re: 8215441: Increase uniformity of the distribution of BigIntegers constructed by BigInteger(int, Random)
I wrote the following simple test case to look at the uniformity of the distribution. I don't see any problem running it up to 4096 buckets. Admittedly I did not do any statistical tests on the buckets but by eye they look uniformly distributed. public static void main(String[] args) throws Throwable { SecureRandom sr = SecureRandom.getInstance("SHA1PRNG"); final int nBits = 4096; final int nBuckets = 128; final int count = 10; Pair[] buckets = new Pair[nBuckets]; BigInteger max = BigInteger.TWO.pow(nBits); BigInteger limit = max; BigInteger step = limit.divide(BigInteger.valueOf(buckets.length)); for (int i = 0; i < buckets.length; i++) buckets[i] = new Pair(limit = limit.subtract(step)); for (int i = 0; i < count; ++i) { // biased towards high numbers. never chooses below a high limit BigInteger number = new BigInteger(nBits, sr); int j; for (j = 0; buckets[j].limit.compareTo(number) > 0; j++) {} buckets[j].count++; } for (int i = buckets.length; i > 0; i--) System.out.print(buckets[i-1].count + (i%8==0 ? "\n" : "\t")); System.out.println(); } Douglas
Re: 8215441: Increase uniformity of the distribution of BigIntegers constructed by BigInteger(int, Random)
Correction: Half have bit length 4, 25% 3, ... > On Dec 20, 2018, at 9:08 AM, Brian Burkhalter > wrote: > > Half of the values have bit length 4, 25% have bit length 2, etc.
Re: 8215441: Increase uniformity of the distribution of BigIntegers constructed by BigInteger(int, Random)
Thanks for the comments. It looks like I got this completely wrong. For some reason I was thinking that the distribution of bit lengths should be uniform which is patently incorrect. Consider the example of a four bit integer. The possible values are 1110 1101 1100 1011 1010 1001 1000 0111 0110 0101 0100 0011 0010 0001 Half of the values have bit length 4, 25% have bit length 2, etc. Re-running my test code shows pretty much this sort of distribution of the bit lengths. It looks like the likelihood of bit lengths from the largest length downward is a geometric sequence with initial value 0.5 for the largest bit length and common ratio 0.5. So I concur with Adam that the current implementation looks correct. Therefore I think that [1] should be withdrawn and [2] closed as not an issue. Thanks, Brian [1] https://bugs.openjdk.java.net/browse/JDK-8215441 [2] https://bugs.openjdk.java.net/browse/JDK-8146153
Re: 8215441: Increase uniformity of the distribution of BigIntegers constructed by BigInteger(int, Random)
Hi Brian, I don't quite understand this reasoning. I need some explanation... You are talking about the distribution of BigInteger values (not the bit lengths of the values), obtained with constructor BigInteger(int, java.util.Random) for which the javadoc says: * Constructs a randomly generated BigInteger, uniformly distributed over * the range 0 to (2{@code numBits} - 1), inclusive. * The uniformity of the distribution assumes that a fair source of random * bits is provided in {@code rnd}. Note that this constructor always * constructs a non-negative BigInteger. I think that the javadoc meant to say that the *values* of the BigIntegers constructed are uniformly distributed... On 12/20/18 5:01 AM, Brian Burkhalter wrote: https://bugs.openjdk.java.net/browse/JDK-8215441 This issue was filed to cover improving the uniformity of randomly generated BigIntegers. It is not intended to resolve [1] which is deliberately left open. The proposed patch implements a modified version of the “workaround” suggested in [1]. The problem is that the magnitude of the random BigInteger is created from a sequence of bytes generated by Random.nextBytes() [2]. The likelihood that any of these bytes is zero is small It should be 1/256 right? If the bytes returned by Random.nextBytes() are good random bytes, they are independent and their distribution is uniform. So the likelihood of an individual byte being 0 is approximately the same as the likelihood of it being any particular other byte value. As there are 256 different values a byte can have, this amounts to 1/256... so the distribution of the resulting random BigIntegers is skewed towards values close to the maximum bit size “numBits” specified to the constructor. I think that's a normal consequence of uniform distribution of values. The "bit length" of a BigInteger value is the index of its highest 1 bit (+1 counted from lowest bit0). Say you produce a uniformly distributed random BigInteger in range 0 ... 2^8 - 1 (the bitLength constructor parameter being 8 in this case) The probability that you get an integer with "bit length" of 8 is 1/2 (that's the probability of a uniformly distributed 8 bit integer having the highest bit 1)... 1XXX The probability that you get an integer with "bit lenght" of 7 is 1/4 (that's the probability of a uniformly distributed 8 bit integer having the highest bit 0 and the next to highest bit 1)... 01XX Probability you get bit length 6 is 1/8 ... 001X Probability you get bit length 5 is 1/16 ... 0001 Probability you get bit length 4 is 1/32 ... 1XXX Probability you get bit length 3 is 1/64 ... 01XX Probability you get bit length 2 is 1/128 ... 001X Probability you get bit length 1 is 1/256 ... (the probability of getting value 1) ... 0001 Probability you get bit length 0 is 1/256 ... (the probability of getting value 0) ... In general, the probability that you get an integer with bit length N is equal to the probability that you get an integer with bit length < N. The probability that you get 0 from BigInteger random constructor when you pass to it numBits is 1/(2^numBits). When you choose 256 for numBits, that's a very slim probability. Practically zero. Let alone when you choose 4096. The workaround suggested in [1] is to randomly change numBits to the value numBits = Random.nextInt(numBits + 1) [3]. (Actually the suggested workaround is nextInt(numBits) which is incorrect as the parameter is an exclusive upper bound.) This greatly improves the uniformity of the distribution. A remaining problem however is that now the very largest numbers in the interval [0,2^numBits) are underrepresented. A modification of this approach is to increment the new value of numBits as numBits = Random.nextInt(numBits + 1) + 1 [4]. This was empirically observed to improve the underrepresentation of the largest values. The distribution of the random BigIntegers was estimated using [5]. For a given maximum bit length, bin size, and number of random values to generate, this creates a histogram and calculates the coefficient of variation of the numbers generated. The histogram bin at index zero represents the largest valued bin in the result. The count in a given histogram bin is the number of values for which that bin is the leftmost (largest valued) with at least one non-zero bit. The bin of maximum index represents zero. What exactly are you trying to show with this code? I'm strugling to understand it... If you wanted to show the distribution of BigInteger values into equally sized bins by the value (each bin representing the sub-interval in the interval 0 ... 2^numBits -1) then the code would be much simpler. But you are trying to show something else... What is that? Regards, Peter
Re: 8215441: Increase uniformity of the distribution of BigIntegers constructed by BigInteger(int, Random)
I'm not sure what the problem is. If X is uniform, then "the number of leading zero bits" of X is exponential. The probability of getting a "small" number, in which the first (say) 32 bits are 0 is 2^-32. If this is what is measured by the histrogram[5], then the current implementation looks correct to me. On 12/19/2018 11:01 PM, Brian Burkhalter wrote: https://bugs.openjdk.java.net/browse/JDK-8215441 This issue was filed to cover improving the uniformity of randomly generated BigIntegers. It is not intended to resolve [1] which is deliberately left open. The proposed patch implements a modified version of the “workaround” suggested in [1]. The problem is that the magnitude of the random BigInteger is created from a sequence of bytes generated by Random.nextBytes() [2]. The likelihood that any of these bytes is zero is small so the distribution of the resulting random BigIntegers is skewed towards values close to the maximum bit size “numBits” specified to the constructor. The workaround suggested in [1] is to randomly change numBits to the value numBits = Random.nextInt(numBits + 1) [3]. (Actually the suggested workaround is nextInt(numBits) which is incorrect as the parameter is an exclusive upper bound.) This greatly improves the uniformity of the distribution. A remaining problem however is that now the very largest numbers in the interval [0,2^numBits) are underrepresented. A modification of this approach is to increment the new value of numBits as numBits = Random.nextInt(numBits + 1) + 1 [4]. This was empirically observed to improve the underrepresentation of the largest values. The distribution of the random BigIntegers was estimated using [5]. For a given maximum bit length, bin size, and number of random values to generate, this creates a histogram and calculates the coefficient of variation of the numbers generated. The histogram bin at index zero represents the largest valued bin in the result. The count in a given histogram bin is the number of values for which that bin is the leftmost (largest valued) with at least one non-zero bit. The bin of maximum index represents zero. Results for the current and two modified approaches for 256 bits with a 1-bit bin size and for 4096 bits with a 4-bit bin size are given at [6-11]. As may be observed, the original histogram is clustered towards the largest possible value 2^numBits - 1, and the coefficient of variation is small. The results for the two variants of the patch show a flattened distribution, i.e., more uniform, and a significantly larger coefficient of variation. The second approach shows better flattening of both ends of the histogram. These results are samples only but are exemplary of the results observed over numerous runs of this code. The test ModPow is modified as the modPow() method throws an ArithmeticException for a zero modulus. The current algorithm never generates a random BigInteger equal to zero however so that exception never occurs. That is not the case for either modified version. Thanks, Brian [1] https://bugs.openjdk.java.net/browse/JDK-8146153 [2] https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Random.html#nextBytes(byte[]) [3] http://cr.openjdk.java.net/~bpb/8146153/webrev.00/ [4] http://cr.openjdk.java.net/~bpb/8146153/webrev.01/ [5] http://cr.openjdk.java.net/~bpb/8146153/StatsBigIntegerRandom.java [6] http://cr.openjdk.java.net/~bpb/8146153/before-256-1.txt [7] http://cr.openjdk.java.net/~bpb/8146153/after-256-1.txt [8] http://cr.openjdk.java.net/~bpb/8146153/after-extra-bit-256-1.txt [9] http://cr.openjdk.java.net/~bpb/8146153/before-4096-4.txt [10] http://cr.openjdk.java.net/~bpb/8146153/after-4096-4.txt [11] http://cr.openjdk.java.net/~bpb/8146153/after-extra-bit-4096-4.txt
Re: 8215441: Increase uniformity of the distribution of BigIntegers constructed by BigInteger(int, Random)
Hm strange, never saw it this way. Would other types have the same problem (should be visible in your histogram for long as well, right?) Gruss Bernd -- http://bernd.eckenfels.net Von: core-libs-dev im Auftrag von Brian Burkhalter Gesendet: Donnerstag, Dezember 20, 2018 8:03 AM An: core-libs-dev Betreff: 8215441: Increase uniformity of the distribution of BigIntegers constructed by BigInteger(int, Random) https://bugs.openjdk.java.net/browse/JDK-8215441 This issue was filed to cover improving the uniformity of randomly generated BigIntegers. It is not intended to resolve [1] which is deliberately left open. The proposed patch implements a modified version of the “workaround” suggested in [1]. The problem is that the magnitude of the random BigInteger is created from a sequence of bytes generated by Random.nextBytes() [2]. The likelihood that any of these bytes is zero is small so the distribution of the resulting random BigIntegers is skewed towards values close to the maximum bit size “numBits” specified to the constructor. The workaround suggested in [1] is to randomly change numBits to the value numBits = Random.nextInt(numBits + 1) [3]. (Actually the suggested workaround is nextInt(numBits) which is incorrect as the parameter is an exclusive upper bound.) This greatly improves the uniformity of the distribution. A remaining problem however is that now the very largest numbers in the interval [0,2^numBits) are underrepresented. A modification of this approach is to increment the new value of numBits as numBits = Random.nextInt(numBits + 1) + 1 [4]. This was empirically observed to improve the underrepresentation of the largest values. The distribution of the random BigIntegers was estimated using [5]. For a given maximum bit length, bin size, and number of random values to generate, this creates a histogram and calculates the coefficient of variation of the numbers generated. The histogram bin at index zero represents the largest valued bin in the result. The count in a given histogram bin is the number of values for which that bin is the leftmost (largest valued) with at least one non-zero bit. The bin of maximum index represents zero. Results for the current and two modified approaches for 256 bits with a 1-bit bin size and for 4096 bits with a 4-bit bin size are given at [6-11]. As may be observed, the original histogram is clustered towards the largest possible value 2^numBits - 1, and the coefficient of variation is small. The results for the two variants of the patch show a flattened distribution, i.e., more uniform, and a significantly larger coefficient of variation. The second approach shows better flattening of both ends of the histogram. These results are samples only but are exemplary of the results observed over numerous runs of this code. The test ModPow is modified as the modPow() method throws an ArithmeticException for a zero modulus. The current algorithm never generates a random BigInteger equal to zero however so that exception never occurs. That is not the case for either modified version. Thanks, Brian [1] https://bugs.openjdk.java.net/browse/JDK-8146153 [2] https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Random.html#nextBytes(byte[]) [3] http://cr.openjdk.java.net/~bpb/8146153/webrev.00/ [4] http://cr.openjdk.java.net/~bpb/8146153/webrev.01/ [5] http://cr.openjdk.java.net/~bpb/8146153/StatsBigIntegerRandom.java [6] http://cr.openjdk.java.net/~bpb/8146153/before-256-1.txt [7] http://cr.openjdk.java.net/~bpb/8146153/after-256-1.txt [8] http://cr.openjdk.java.net/~bpb/8146153/after-extra-bit-256-1.txt [9] http://cr.openjdk.java.net/~bpb/8146153/before-4096-4.txt [10] http://cr.openjdk.java.net/~bpb/8146153/after-4096-4.txt [11] http://cr.openjdk.java.net/~bpb/8146153/after-extra-bit-4096-4.txt
8215441: Increase uniformity of the distribution of BigIntegers constructed by BigInteger(int, Random)
https://bugs.openjdk.java.net/browse/JDK-8215441 This issue was filed to cover improving the uniformity of randomly generated BigIntegers. It is not intended to resolve [1] which is deliberately left open. The proposed patch implements a modified version of the “workaround” suggested in [1]. The problem is that the magnitude of the random BigInteger is created from a sequence of bytes generated by Random.nextBytes() [2]. The likelihood that any of these bytes is zero is small so the distribution of the resulting random BigIntegers is skewed towards values close to the maximum bit size “numBits” specified to the constructor. The workaround suggested in [1] is to randomly change numBits to the value numBits = Random.nextInt(numBits + 1) [3]. (Actually the suggested workaround is nextInt(numBits) which is incorrect as the parameter is an exclusive upper bound.) This greatly improves the uniformity of the distribution. A remaining problem however is that now the very largest numbers in the interval [0,2^numBits) are underrepresented. A modification of this approach is to increment the new value of numBits as numBits = Random.nextInt(numBits + 1) + 1 [4]. This was empirically observed to improve the underrepresentation of the largest values. The distribution of the random BigIntegers was estimated using [5]. For a given maximum bit length, bin size, and number of random values to generate, this creates a histogram and calculates the coefficient of variation of the numbers generated. The histogram bin at index zero represents the largest valued bin in the result. The count in a given histogram bin is the number of values for which that bin is the leftmost (largest valued) with at least one non-zero bit. The bin of maximum index represents zero. Results for the current and two modified approaches for 256 bits with a 1-bit bin size and for 4096 bits with a 4-bit bin size are given at [6-11]. As may be observed, the original histogram is clustered towards the largest possible value 2^numBits - 1, and the coefficient of variation is small. The results for the two variants of the patch show a flattened distribution, i.e., more uniform, and a significantly larger coefficient of variation. The second approach shows better flattening of both ends of the histogram. These results are samples only but are exemplary of the results observed over numerous runs of this code. The test ModPow is modified as the modPow() method throws an ArithmeticException for a zero modulus. The current algorithm never generates a random BigInteger equal to zero however so that exception never occurs. That is not the case for either modified version. Thanks, Brian [1] https://bugs.openjdk.java.net/browse/JDK-8146153 [2] https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Random.html#nextBytes(byte[]) [3] http://cr.openjdk.java.net/~bpb/8146153/webrev.00/ [4] http://cr.openjdk.java.net/~bpb/8146153/webrev.01/ [5] http://cr.openjdk.java.net/~bpb/8146153/StatsBigIntegerRandom.java [6] http://cr.openjdk.java.net/~bpb/8146153/before-256-1.txt [7] http://cr.openjdk.java.net/~bpb/8146153/after-256-1.txt [8] http://cr.openjdk.java.net/~bpb/8146153/after-extra-bit-256-1.txt [9] http://cr.openjdk.java.net/~bpb/8146153/before-4096-4.txt [10] http://cr.openjdk.java.net/~bpb/8146153/after-4096-4.txt [11] http://cr.openjdk.java.net/~bpb/8146153/after-extra-bit-4096-4.txt