Re: 8215441: Increase uniformity of the distribution of BigIntegers constructed by BigInteger(int, Random)

2018-12-20 Thread Douglas Surber
I wrote the following simple test case to look at the uniformity of the 
distribution. I don't see any problem running it up to 4096 buckets. Admittedly 
 I did not do any statistical tests on the buckets but by eye they look 
uniformly distributed.


  public static void main(String[] args) throws Throwable {
SecureRandom sr = SecureRandom.getInstance("SHA1PRNG");
final int nBits = 4096;
final int nBuckets = 128;
final int count = 10;
Pair[] buckets = new Pair[nBuckets];
BigInteger max = BigInteger.TWO.pow(nBits);
BigInteger limit = max;
BigInteger step = limit.divide(BigInteger.valueOf(buckets.length));
for (int i = 0; i < buckets.length; i++) buckets[i] = new Pair(limit = 
limit.subtract(step));
for (int i = 0; i < count; ++i) {
  // biased towards high numbers. never chooses below a high limit 
  BigInteger number = new BigInteger(nBits, sr);
  int j;
  for (j = 0; buckets[j].limit.compareTo(number) > 0; j++) {}
  buckets[j].count++;
}
for (int i = buckets.length; i > 0; i--) 
System.out.print(buckets[i-1].count + (i%8==0 ? "\n" : "\t"));
System.out.println();
  }


Douglas

Re: 8215441: Increase uniformity of the distribution of BigIntegers constructed by BigInteger(int, Random)

2018-12-20 Thread Brian Burkhalter
Correction: Half have bit length 4, 25% 3, ...

> On Dec 20, 2018, at 9:08 AM, Brian Burkhalter  
> wrote:
> 
> Half of the values have bit length 4, 25% have bit length 2, etc. 



Re: 8215441: Increase uniformity of the distribution of BigIntegers constructed by BigInteger(int, Random)

2018-12-20 Thread Brian Burkhalter
Thanks for the comments. It looks like I got this completely wrong. For some 
reason I was thinking that the distribution of bit lengths should be uniform 
which is patently incorrect. Consider the example of a four bit integer. The 
possible values are

 1110 1101 1100 1011 1010 1001 1000
0111 0110 0101 0100
0011 0010
0001


Half of the values have bit length 4, 25% have bit length 2, etc. Re-running my 
test code shows pretty much this sort of distribution of the bit lengths. It 
looks like the likelihood of bit lengths from the largest length downward is a 
geometric sequence with initial value 0.5 for the largest bit length and common 
ratio 0.5. So I concur with Adam that the current implementation looks correct. 
Therefore I think that [1] should be withdrawn and [2] closed as not an issue.

Thanks,

Brian

[1] https://bugs.openjdk.java.net/browse/JDK-8215441
[2] https://bugs.openjdk.java.net/browse/JDK-8146153

Re: 8215441: Increase uniformity of the distribution of BigIntegers constructed by BigInteger(int, Random)

2018-12-20 Thread Peter Levart

Hi Brian,

I don't quite understand this reasoning. I need some explanation...

You are talking about the distribution of BigInteger values (not the bit 
lengths of the values), obtained with constructor BigInteger(int, 
java.util.Random) for which the javadoc says:


 * Constructs a randomly generated BigInteger, uniformly 
distributed over

 * the range 0 to (2{@code numBits} - 1), inclusive.
 * The uniformity of the distribution assumes that a fair source of 
random

 * bits is provided in {@code rnd}.  Note that this constructor always
 * constructs a non-negative BigInteger.

I think that the javadoc meant to say that the *values* of the 
BigIntegers constructed are uniformly distributed...


On 12/20/18 5:01 AM, Brian Burkhalter wrote:

https://bugs.openjdk.java.net/browse/JDK-8215441

This issue was filed to cover improving the uniformity of randomly generated 
BigIntegers. It is not intended to resolve [1] which is deliberately left open. 
The proposed patch implements a modified version of the “workaround” suggested 
in [1].

The problem is that the magnitude of the random BigInteger is created from a 
sequence of bytes generated by Random.nextBytes() [2]. The likelihood that any 
of these bytes is zero is small


It should be 1/256 right? If the bytes returned by Random.nextBytes() 
are good random bytes, they are independent and their distribution is 
uniform. So the likelihood of an individual byte being 0 is 
approximately the same as the likelihood of it being any particular 
other byte value. As there are 256 different values a byte can have, 
this amounts to 1/256...



  so the distribution of the resulting random BigIntegers is skewed towards 
values close to the maximum bit size “numBits” specified to the constructor.


I think that's a normal consequence of uniform distribution of values. 
The "bit length" of a BigInteger value is the index of its highest 1 bit 
(+1 counted from lowest bit0). Say you produce a uniformly distributed 
random BigInteger in range 0 ... 2^8 - 1 (the bitLength constructor 
parameter being 8 in this case)


The probability that you get an integer with "bit length" of 8 is 1/2 
(that's the probability of a uniformly distributed 8 bit integer having 
the highest bit 1)... 1XXX
The probability that you get an integer with "bit lenght" of 7 is 1/4 
(that's the probability of a uniformly distributed 8 bit integer having 
the highest bit 0 and the next to highest bit 1)... 01XX

Probability you get bit length 6 is 1/8 ... 001X
Probability you get bit length 5 is 1/16 ... 0001
Probability you get bit length 4 is 1/32 ... 1XXX
Probability you get bit length 3 is 1/64 ... 01XX
Probability you get bit length 2 is 1/128 ... 001X
Probability you get bit length 1 is 1/256 ... (the probability of 
getting value 1) ... 0001
Probability you get bit length 0 is 1/256 ... (the probability of 
getting value 0) ... 


In general, the probability that you get an integer with bit length N is 
equal to the probability that you get an integer with bit length < N.


The probability that you get 0 from BigInteger random constructor when 
you pass to it numBits is 1/(2^numBits). When you choose 256 for 
numBits, that's a very slim probability. Practically zero. Let alone 
when you choose 4096.




The workaround suggested in [1] is to randomly change numBits to the value 
numBits = Random.nextInt(numBits + 1) [3]. (Actually the suggested workaround 
is nextInt(numBits) which is incorrect as the parameter is an exclusive upper 
bound.) This greatly improves the uniformity of the distribution. A remaining 
problem however is that now the very largest numbers in the interval 
[0,2^numBits) are underrepresented. A modification of this approach is to 
increment the new value of numBits as numBits = Random.nextInt(numBits + 1) + 1 
[4]. This was empirically observed to improve the underrepresentation of the 
largest values.

The distribution of the random BigIntegers was estimated using [5]. For a given 
maximum bit length, bin size, and number of random values to generate, this 
creates a histogram and calculates the coefficient of variation of the numbers 
generated. The histogram bin at index zero represents the largest valued bin in 
the result. The count in a given histogram bin is the number of values for 
which that bin is the leftmost (largest valued) with at least one non-zero bit. 
The bin of maximum index represents zero.


What exactly are you trying to show with this code? I'm strugling to 
understand it...


If you wanted to show the distribution of BigInteger values into equally 
sized bins by the value (each bin representing the sub-interval in the 
interval 0 ... 2^numBits -1) then the code would be much simpler. But 
you are trying to show something else... What is that?


Regards, Peter



Re: 8215441: Increase uniformity of the distribution of BigIntegers constructed by BigInteger(int, Random)

2018-12-20 Thread Adam Petcher
I'm not sure what the problem is. If X is uniform, then "the number of 
leading zero bits" of X is exponential. The probability of getting a 
"small" number, in which the first (say) 32 bits are 0 is 2^-32. If this 
is what is measured by the histrogram[5], then the current 
implementation looks correct to me.


On 12/19/2018 11:01 PM, Brian Burkhalter wrote:

https://bugs.openjdk.java.net/browse/JDK-8215441

This issue was filed to cover improving the uniformity of randomly generated 
BigIntegers. It is not intended to resolve [1] which is deliberately left open. 
The proposed patch implements a modified version of the “workaround” suggested 
in [1].

The problem is that the magnitude of the random BigInteger is created from a 
sequence of bytes generated by Random.nextBytes() [2]. The likelihood that any 
of these bytes is zero is small so the distribution of the resulting random 
BigIntegers is skewed towards values close to the maximum bit size “numBits” 
specified to the constructor.

The workaround suggested in [1] is to randomly change numBits to the value 
numBits = Random.nextInt(numBits + 1) [3]. (Actually the suggested workaround 
is nextInt(numBits) which is incorrect as the parameter is an exclusive upper 
bound.) This greatly improves the uniformity of the distribution. A remaining 
problem however is that now the very largest numbers in the interval 
[0,2^numBits) are underrepresented. A modification of this approach is to 
increment the new value of numBits as numBits = Random.nextInt(numBits + 1) + 1 
[4]. This was empirically observed to improve the underrepresentation of the 
largest values.

The distribution of the random BigIntegers was estimated using [5]. For a given 
maximum bit length, bin size, and number of random values to generate, this 
creates a histogram and calculates the coefficient of variation of the numbers 
generated. The histogram bin at index zero represents the largest valued bin in 
the result. The count in a given histogram bin is the number of values for 
which that bin is the leftmost (largest valued) with at least one non-zero bit. 
The bin of maximum index represents zero.

Results for the current and two modified approaches for 256 bits with a 1-bit 
bin size and for 4096 bits with a 4-bit bin size are given at [6-11]. As may be 
observed, the original histogram is clustered towards the largest possible 
value 2^numBits - 1, and the coefficient of variation is small. The results for 
the two variants of the patch show a flattened distribution, i.e., more 
uniform, and a significantly larger coefficient of variation. The second 
approach shows better flattening of both ends of the histogram. These results 
are samples only but are exemplary of the results observed over numerous runs 
of this code.

The test ModPow is modified as the modPow() method throws an 
ArithmeticException for a zero modulus. The current algorithm never generates a 
random BigInteger equal to zero however so that exception never occurs. That is 
not the case for either modified version.

Thanks,

Brian

[1] https://bugs.openjdk.java.net/browse/JDK-8146153
[2] 
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Random.html#nextBytes(byte[])
[3] http://cr.openjdk.java.net/~bpb/8146153/webrev.00/
[4] http://cr.openjdk.java.net/~bpb/8146153/webrev.01/
[5] http://cr.openjdk.java.net/~bpb/8146153/StatsBigIntegerRandom.java
[6] http://cr.openjdk.java.net/~bpb/8146153/before-256-1.txt
[7] http://cr.openjdk.java.net/~bpb/8146153/after-256-1.txt
[8] http://cr.openjdk.java.net/~bpb/8146153/after-extra-bit-256-1.txt
[9] http://cr.openjdk.java.net/~bpb/8146153/before-4096-4.txt
[10] http://cr.openjdk.java.net/~bpb/8146153/after-4096-4.txt
[11] http://cr.openjdk.java.net/~bpb/8146153/after-extra-bit-4096-4.txt


Re: 8215441: Increase uniformity of the distribution of BigIntegers constructed by BigInteger(int, Random)

2018-12-20 Thread Bernd Eckenfels
Hm strange, never saw it this way. Would other types have the same problem 
(should be visible in your histogram for long as well, right?)

Gruss
Bernd
--
http://bernd.eckenfels.net


Von: core-libs-dev  im Auftrag von 
Brian Burkhalter 
Gesendet: Donnerstag, Dezember 20, 2018 8:03 AM
An: core-libs-dev
Betreff: 8215441: Increase uniformity of the distribution of BigIntegers 
constructed by BigInteger(int, Random)

https://bugs.openjdk.java.net/browse/JDK-8215441

This issue was filed to cover improving the uniformity of randomly generated 
BigIntegers. It is not intended to resolve [1] which is deliberately left open. 
The proposed patch implements a modified version of the “workaround” suggested 
in [1].

The problem is that the magnitude of the random BigInteger is created from a 
sequence of bytes generated by Random.nextBytes() [2]. The likelihood that any 
of these bytes is zero is small so the distribution of the resulting random 
BigIntegers is skewed towards values close to the maximum bit size “numBits” 
specified to the constructor.

The workaround suggested in [1] is to randomly change numBits to the value 
numBits = Random.nextInt(numBits + 1) [3]. (Actually the suggested workaround 
is nextInt(numBits) which is incorrect as the parameter is an exclusive upper 
bound.) This greatly improves the uniformity of the distribution. A remaining 
problem however is that now the very largest numbers in the interval 
[0,2^numBits) are underrepresented. A modification of this approach is to 
increment the new value of numBits as numBits = Random.nextInt(numBits + 1) + 1 
[4]. This was empirically observed to improve the underrepresentation of the 
largest values.

The distribution of the random BigIntegers was estimated using [5]. For a given 
maximum bit length, bin size, and number of random values to generate, this 
creates a histogram and calculates the coefficient of variation of the numbers 
generated. The histogram bin at index zero represents the largest valued bin in 
the result. The count in a given histogram bin is the number of values for 
which that bin is the leftmost (largest valued) with at least one non-zero bit. 
The bin of maximum index represents zero.

Results for the current and two modified approaches for 256 bits with a 1-bit 
bin size and for 4096 bits with a 4-bit bin size are given at [6-11]. As may be 
observed, the original histogram is clustered towards the largest possible 
value 2^numBits - 1, and the coefficient of variation is small. The results for 
the two variants of the patch show a flattened distribution, i.e., more 
uniform, and a significantly larger coefficient of variation. The second 
approach shows better flattening of both ends of the histogram. These results 
are samples only but are exemplary of the results observed over numerous runs 
of this code.

The test ModPow is modified as the modPow() method throws an 
ArithmeticException for a zero modulus. The current algorithm never generates a 
random BigInteger equal to zero however so that exception never occurs. That is 
not the case for either modified version.

Thanks,

Brian

[1] https://bugs.openjdk.java.net/browse/JDK-8146153
[2] 
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Random.html#nextBytes(byte[])
[3] http://cr.openjdk.java.net/~bpb/8146153/webrev.00/
[4] http://cr.openjdk.java.net/~bpb/8146153/webrev.01/
[5] http://cr.openjdk.java.net/~bpb/8146153/StatsBigIntegerRandom.java
[6] http://cr.openjdk.java.net/~bpb/8146153/before-256-1.txt
[7] http://cr.openjdk.java.net/~bpb/8146153/after-256-1.txt
[8] http://cr.openjdk.java.net/~bpb/8146153/after-extra-bit-256-1.txt
[9] http://cr.openjdk.java.net/~bpb/8146153/before-4096-4.txt
[10] http://cr.openjdk.java.net/~bpb/8146153/after-4096-4.txt
[11] http://cr.openjdk.java.net/~bpb/8146153/after-extra-bit-4096-4.txt


8215441: Increase uniformity of the distribution of BigIntegers constructed by BigInteger(int, Random)

2018-12-19 Thread Brian Burkhalter
https://bugs.openjdk.java.net/browse/JDK-8215441

This issue was filed to cover improving the uniformity of randomly generated 
BigIntegers. It is not intended to resolve [1] which is deliberately left open. 
The proposed patch implements a modified version of the “workaround” suggested 
in [1].

The problem is that the magnitude of the random BigInteger is created from a 
sequence of bytes generated by Random.nextBytes() [2]. The likelihood that any 
of these bytes is zero is small so the distribution of the resulting random 
BigIntegers is skewed towards values close to the maximum bit size “numBits” 
specified to the constructor.

The workaround suggested in [1] is to randomly change numBits to the value 
numBits = Random.nextInt(numBits + 1) [3]. (Actually the suggested workaround 
is nextInt(numBits) which is incorrect as the parameter is an exclusive upper 
bound.) This greatly improves the uniformity of the distribution. A remaining 
problem however is that now the very largest numbers in the interval 
[0,2^numBits) are underrepresented. A modification of this approach is to 
increment the new value of numBits as numBits = Random.nextInt(numBits + 1) + 1 
[4]. This was empirically observed to improve the underrepresentation of the 
largest values.

The distribution of the random BigIntegers was estimated using [5]. For a given 
maximum bit length, bin size, and number of random values to generate, this 
creates a histogram and calculates the coefficient of variation of the numbers 
generated. The histogram bin at index zero represents the largest valued bin in 
the result. The count in a given histogram bin is the number of values for 
which that bin is the leftmost (largest valued) with at least one non-zero bit. 
The bin of maximum index represents zero.

Results for the current and two modified approaches for 256 bits with a 1-bit 
bin size and for 4096 bits with a 4-bit bin size are given at [6-11]. As may be 
observed, the original histogram is clustered towards the largest possible 
value 2^numBits - 1, and the coefficient of variation is small. The results for 
the two variants of the patch show a flattened distribution, i.e., more 
uniform, and a significantly larger coefficient of variation. The second 
approach shows better flattening of both ends of the histogram. These results 
are samples only but are exemplary of the results observed over numerous runs 
of this code.

The test ModPow is modified as the modPow() method throws an 
ArithmeticException for a zero modulus. The current algorithm never generates a 
random BigInteger equal to zero however so that exception never occurs. That is 
not the case for either modified version.

Thanks,

Brian

[1] https://bugs.openjdk.java.net/browse/JDK-8146153
[2] 
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Random.html#nextBytes(byte[])
[3] http://cr.openjdk.java.net/~bpb/8146153/webrev.00/
[4] http://cr.openjdk.java.net/~bpb/8146153/webrev.01/
[5] http://cr.openjdk.java.net/~bpb/8146153/StatsBigIntegerRandom.java
[6] http://cr.openjdk.java.net/~bpb/8146153/before-256-1.txt
[7] http://cr.openjdk.java.net/~bpb/8146153/after-256-1.txt
[8] http://cr.openjdk.java.net/~bpb/8146153/after-extra-bit-256-1.txt
[9] http://cr.openjdk.java.net/~bpb/8146153/before-4096-4.txt
[10] http://cr.openjdk.java.net/~bpb/8146153/after-4096-4.txt
[11] http://cr.openjdk.java.net/~bpb/8146153/after-extra-bit-4096-4.txt