Hi Brian,

I don't quite understand this reasoning. I need some explanation...

You are talking about the distribution of BigInteger values (not the bit lengths of the values), obtained with constructor BigInteger(int, java.util.Random) for which the javadoc says:

     * Constructs a randomly generated BigInteger, uniformly distributed over
     * the range 0 to (2<sup>{@code numBits}</sup> - 1), inclusive.
     * The uniformity of the distribution assumes that a fair source of random
     * bits is provided in {@code rnd}.  Note that this constructor always
     * constructs a non-negative BigInteger.

I think that the javadoc meant to say that the *values* of the BigIntegers constructed are uniformly distributed...

On 12/20/18 5:01 AM, Brian Burkhalter wrote:
https://bugs.openjdk.java.net/browse/JDK-8215441

This issue was filed to cover improving the uniformity of randomly generated 
BigIntegers. It is not intended to resolve [1] which is deliberately left open. 
The proposed patch implements a modified version of the “workaround” suggested 
in [1].

The problem is that the magnitude of the random BigInteger is created from a 
sequence of bytes generated by Random.nextBytes() [2]. The likelihood that any 
of these bytes is zero is small

It should be 1/256 right? If the bytes returned by Random.nextBytes() are good random bytes, they are independent and their distribution is uniform. So the likelihood of an individual byte being 0 is approximately the same as the likelihood of it being any particular other byte value. As there are 256 different values a byte can have, this amounts to 1/256...

  so the distribution of the resulting random BigIntegers is skewed towards 
values close to the maximum bit size “numBits” specified to the constructor.

I think that's a normal consequence of uniform distribution of values. The "bit length" of a BigInteger value is the index of its highest 1 bit (+1 counted from lowest bit0). Say you produce a uniformly distributed random BigInteger in range 0 ... 2^8 - 1 (the bitLength constructor parameter being 8 in this case)

The probability that you get an integer with "bit length" of 8 is 1/2 (that's the probability of a uniformly distributed 8 bit integer having the highest bit 1)... 1XXXXXXX The probability that you get an integer with "bit lenght" of 7 is 1/4 (that's the probability of a uniformly distributed 8 bit integer having the highest bit 0 and the next to highest bit 1)... 01XXXXXX
Probability you get bit length 6 is 1/8 ... 001XXXXX
Probability you get bit length 5 is 1/16 ... 0001XXXX
Probability you get bit length 4 is 1/32 ... 00001XXX
Probability you get bit length 3 is 1/64 ... 000001XX
Probability you get bit length 2 is 1/128 ... 0000001X
Probability you get bit length 1 is 1/256 ... (the probability of getting value 1) ... 00000001 Probability you get bit length 0 is 1/256 ... (the probability of getting value 0) ... 00000000

In general, the probability that you get an integer with bit length N is equal to the probability that you get an integer with bit length < N.

The probability that you get 0 from BigInteger random constructor when you pass to it numBits is 1/(2^numBits). When you choose 256 for numBits, that's a very slim probability. Practically zero. Let alone when you choose 4096.


The workaround suggested in [1] is to randomly change numBits to the value 
numBits = Random.nextInt(numBits + 1) [3]. (Actually the suggested workaround 
is nextInt(numBits) which is incorrect as the parameter is an exclusive upper 
bound.) This greatly improves the uniformity of the distribution. A remaining 
problem however is that now the very largest numbers in the interval 
[0,2^numBits) are underrepresented. A modification of this approach is to 
increment the new value of numBits as numBits = Random.nextInt(numBits + 1) + 1 
[4]. This was empirically observed to improve the underrepresentation of the 
largest values.

The distribution of the random BigIntegers was estimated using [5]. For a given 
maximum bit length, bin size, and number of random values to generate, this 
creates a histogram and calculates the coefficient of variation of the numbers 
generated. The histogram bin at index zero represents the largest valued bin in 
the result. The count in a given histogram bin is the number of values for 
which that bin is the leftmost (largest valued) with at least one non-zero bit. 
The bin of maximum index represents zero.

What exactly are you trying to show with this code? I'm strugling to understand it...

If you wanted to show the distribution of BigInteger values into equally sized bins by the value (each bin representing the sub-interval in the interval 0 ... 2^numBits -1) then the code would be much simpler. But you are trying to show something else... What is that?

Regards, Peter

Reply via email to