BigInteger currently uses three different algorithms for multiply: the
simple quadratic algorithm, then the slightly better Karatsuba once the
operands exceed a certain bit count, and then Toom-Cook 3 once we get into
the several thousands of bits. Since Toom-Cook 3 is a recursive algorithm,
it is trivial to parallelize it. I have demonstrated this several times in
conference talks. In order to be consistent with other classes such as
Arrays and Collection, I have added a parallelMultiply() method.
Internally, we have added a parameter to the private multiply method to
indicate whether the calculation should be done in parallel.
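
To illustrate, here is a minimal usage sketch (the demo class and the
operand sizes are made up; parallelMultiply() takes the same argument and
returns the same product as multiply()):

import java.math.BigInteger;

public class ParallelMultiplyDemo {
    public static void main(String... args) {
        // two operands of a few million bits each, comfortably in
        // Toom-Cook 3 territory
        BigInteger a = BigInteger.ONE.shiftLeft(5_000_000).subtract(BigInteger.ONE);
        BigInteger b = BigInteger.ONE.shiftLeft(5_000_000).add(BigInteger.ONE);

        BigInteger sequential = a.multiply(b);         // existing method
        BigInteger parallel   = a.parallelMultiply(b); // new method

        System.out.println(sequential.equals(parallel)); // true - same result
    }
}

The parallel path can only make a difference once the operands are large
enough to be multiplied with Toom-Cook 3; below that threshold the two
methods do the same work.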

The performance improvements are as expected. Fibonacci of 100 million
(using a single-threaded Dijkstra's sum-of-squares version) completes in
9.2 seconds with parallelMultiply() vs 25.3 seconds with the sequential
multiply() method. This is on my 1-8-2 laptop. The final multiplications
are between very large numbers, which therefore benefit from the
parallelization of Toom-Cook 3. Fibonacci of 100 million is a number of
roughly 69 million bits.
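
For reference, here is a sketch of that kind of Fibonacci (not the exact
benchmark code; the class and method names are made up). It uses the
identities fib(2k) = fib(k) * (2*fib(k+1) - fib(k)) and
fib(2k+1) = fib(k)^2 + fib(k+1)^2, and takes the multiply operation as a
parameter so that either multiply() or parallelMultiply() can be plugged in:

import java.math.BigInteger;
import java.util.function.BinaryOperator;

public class Fibonacci {
    // 'multiply' is either BigInteger::multiply or BigInteger::parallelMultiply
    public static BigInteger fib(int n, BinaryOperator<BigInteger> multiply) {
        return fibPair(n, multiply)[0];
    }

    // returns {fib(n), fib(n + 1)}
    private static BigInteger[] fibPair(int n, BinaryOperator<BigInteger> multiply) {
        if (n == 0) return new BigInteger[] {BigInteger.ZERO, BigInteger.ONE};
        BigInteger[] half = fibPair(n >>> 1, multiply);
        BigInteger a = half[0], b = half[1]; // fib(k) and fib(k + 1), k = n / 2
        BigInteger c = multiply.apply(a, b.shiftLeft(1).subtract(a));  // fib(2k)
        BigInteger d = multiply.apply(a, a).add(multiply.apply(b, b)); // fib(2k + 1)
        return (n & 1) == 0
                ? new BigInteger[] {c, d}         // n = 2k
                : new BigInteger[] {d, c.add(d)}; // n = 2k + 1
    }
}

Calling Fibonacci.fib(100_000_000, BigInteger::parallelMultiply) only
parallelizes the individual multiplications; the algorithm itself still
performs its steps one after the other.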

We have also parallelized the private square() method. Internally, the
square() method still defaults to sequential execution.
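
The idea is the same as for multiply. Purely as an illustration of how a
recursive squaring can be spread over the fork/join common pool (this is
not the JDK's Toom-Cook code, just a halve-and-recurse sketch with a
made-up threshold):

import java.math.BigInteger;
import java.util.concurrent.RecursiveTask;

class SquareTask extends RecursiveTask<BigInteger> {
    private static final int THRESHOLD_BITS = 1 << 15; // arbitrary cut-off
    private final BigInteger x;

    SquareTask(BigInteger x) { this.x = x; }

    @Override
    protected BigInteger compute() {
        if (x.bitLength() < THRESHOLD_BITS) return x.multiply(x);
        int shift = x.bitLength() / 2;
        BigInteger hi = x.shiftRight(shift);
        BigInteger lo = x.subtract(hi.shiftLeft(shift));
        // x^2 = hi^2 * 2^(2*shift) + 2*hi*lo * 2^shift + lo^2
        SquareTask hiSq = new SquareTask(hi);
        hiSq.fork();                                    // hi^2 on another worker
        BigInteger loSq = new SquareTask(lo).compute(); // lo^2 in this thread
        BigInteger cross = hi.multiply(lo);             // middle term in this thread
        return hiSq.join().shiftLeft(2 * shift)
                .add(cross.shiftLeft(shift + 1))
                .add(loSq);
    }
}

Run as new SquareTask(x).invoke(). The actual change does not introduce a
separate class like this; it threads the parallel flag through the existing
private multiply and square recursion.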

Some benchmark results, run on my 1-6-2 server:


Benchmark                                          (n)  Mode  Cnt      Score      Error  Units
BigIntegerParallelMultiply.multiply            1000000    ss    4     51.707 ±   11.194  ms/op
BigIntegerParallelMultiply.multiply           10000000    ss    4    988.302 ±  235.977  ms/op
BigIntegerParallelMultiply.multiply          100000000    ss    4  24662.063 ± 1123.329  ms/op
BigIntegerParallelMultiply.parallelMultiply    1000000    ss    4     49.337 ±   26.611  ms/op
BigIntegerParallelMultiply.parallelMultiply   10000000    ss    4    527.560 ±  268.903  ms/op
BigIntegerParallelMultiply.parallelMultiply  100000000    ss    4   9076.551 ± 1899.444  ms/op


We can see that for large calculations (fib 100m), the execution is 2.7x
faster in parallel. For medium-sized calculations (fib 10m) it is 1.873x
faster, and for small ones (fib 1m) it is roughly the same. Considering
that the Fibonacci algorithm we used is itself sequential, and that the
last three multiplications dominate the running time, 2.7x faster should
probably be considered quite good on a 1-6-2 machine.
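
The ratios come straight out of the table:

fib 100m: 24662.063 / 9076.551 ≈ 2.72x
fib  10m:   988.302 /  527.560 ≈ 1.87x
fib   1m:    51.707 /   49.337 ≈ 1.05x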

-------------

Commit messages:
 - Added parallelMultiply() method to BigInteger to allow large
   multiplications to run in parallel

Changes: https://git.openjdk.java.net/jdk/pull/6409/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6409&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8277175
  Stats: 269 lines in 3 files changed: 236 ins; 6 del; 27 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6409.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6409/head:pull/6409

PR: https://git.openjdk.java.net/jdk/pull/6409
