Re: 8215441: Increase uniformity of the distribution of BigIntegers constructed by BigInteger(int, Random)

Peter Levart Thu, 20 Dec 2018 07:24:21 -0800

Hi Brian,

I don't quite understand this reasoning. I need some explanation...

You are talking about the distribution of BigInteger values (not the bitlengths of the values), obtained with constructor BigInteger(int,java.util.Random) for which the javadoc says:

* Constructs a randomly generated BigInteger, uniformlydistributed over

     * the range 0 to (2<sup>{@code numBits}</sup> - 1), inclusive.

* The uniformity of the distribution assumes that a fair source ofrandom

     * bits is provided in {@code rnd}.  Note that this constructor always
     * constructs a non-negative BigInteger.

I think that the javadoc meant to say that the *values* of theBigIntegers constructed are uniformly distributed...


On 12/20/18 5:01 AM, Brian Burkhalter wrote:

https://bugs.openjdk.java.net/browse/JDK-8215441

This issue was filed to cover improving the uniformity of randomly generated 
BigIntegers. It is not intended to resolve [1] which is deliberately left open. 
The proposed patch implements a modified version of the “workaround” suggested 
in [1].

The problem is that the magnitude of the random BigInteger is created from a 
sequence of bytes generated by Random.nextBytes() [2]. The likelihood that any 
of these bytes is zero is small

It should be 1/256 right? If the bytes returned by Random.nextBytes()are good random bytes, they are independent and their distribution isuniform. So the likelihood of an individual byte being 0 isapproximately the same as the likelihood of it being any particularother byte value. As there are 256 different values a byte can have,this amounts to 1/256...

  so the distribution of the resulting random BigIntegers is skewed towards 
values close to the maximum bit size “numBits” specified to the constructor.

I think that's a normal consequence of uniform distribution of values.The "bit length" of a BigInteger value is the index of its highest 1 bit(+1 counted from lowest bit0). Say you produce a uniformly distributedrandom BigInteger in range 0 ... 2^8 - 1 (the bitLength constructorparameter being 8 in this case)

The probability that you get an integer with "bit length" of 8 is 1/2(that's the probability of a uniformly distributed 8 bit integer havingthe highest bit 1)... 1XXXXXXXThe probability that you get an integer with "bit lenght" of 7 is 1/4(that's the probability of a uniformly distributed 8 bit integer havingthe highest bit 0 and the next to highest bit 1)... 01XXXXXX

Probability you get bit length 6 is 1/8 ... 001XXXXX
Probability you get bit length 5 is 1/16 ... 0001XXXX
Probability you get bit length 4 is 1/32 ... 00001XXX
Probability you get bit length 3 is 1/64 ... 000001XX
Probability you get bit length 2 is 1/128 ... 0000001X

Probability you get bit length 1 is 1/256 ... (the probability ofgetting value 1) ... 00000001Probability you get bit length 0 is 1/256 ... (the probability ofgetting value 0) ... 00000000

In general, the probability that you get an integer with bit length N isequal to the probability that you get an integer with bit length < N.

The probability that you get 0 from BigInteger random constructor whenyou pass to it numBits is 1/(2^numBits). When you choose 256 fornumBits, that's a very slim probability. Practically zero. Let alonewhen you choose 4096.


The workaround suggested in [1] is to randomly change numBits to the value 
numBits = Random.nextInt(numBits + 1) [3]. (Actually the suggested workaround 
is nextInt(numBits) which is incorrect as the parameter is an exclusive upper 
bound.) This greatly improves the uniformity of the distribution. A remaining 
problem however is that now the very largest numbers in the interval 
[0,2^numBits) are underrepresented. A modification of this approach is to 
increment the new value of numBits as numBits = Random.nextInt(numBits + 1) + 1 
[4]. This was empirically observed to improve the underrepresentation of the 
largest values.

The distribution of the random BigIntegers was estimated using [5]. For a given 
maximum bit length, bin size, and number of random values to generate, this 
creates a histogram and calculates the coefficient of variation of the numbers 
generated. The histogram bin at index zero represents the largest valued bin in 
the result. The count in a given histogram bin is the number of values for 
which that bin is the leftmost (largest valued) with at least one non-zero bit. 
The bin of maximum index represents zero.

What exactly are you trying to show with this code? I'm strugling tounderstand it...

If you wanted to show the distribution of BigInteger values into equallysized bins by the value (each bin representing the sub-interval in theinterval 0 ... 2^numBits -1) then the code would be much simpler. Butyou are trying to show something else... What is that?


Regards, Peter

Re: 8215441: Increase uniformity of the distribution of BigIntegers constructed by BigInteger(int, Random)

Reply via email to