Re: RFR: JDK-8277175 : Add a parallel multiply method to BigInteger [v6]

kabutz Wed, 15 Dec 2021 12:19:01 -0800

> BigInteger currently uses three different algorithms for multiply. The simple 
> quadratic algorithm, then the slightly better Karatsuba if we exceed a bit 
> count and then Toom Cook 3 once we go into the several thousands of bits. 
> Since Toom Cook 3 is a recursive algorithm, it is trivial to parallelize it. 
> I have demonstrated this several times in conference talks. In order to be 
> consistent with other classes such as Arrays and Collection, I have added a 
> parallelMultiply() method. Internally we have added a parameter to the 
> private multiply method to indicate whether the calculation should be done in 
> parallel.
> 
> The performance improvements are as should be expected. Fibonacci of 100 
> million (using a single-threaded Dijkstra's sum of squares version) completes 
> in 9.2 seconds with the parallelMultiply() vs 25.3 seconds with the 
> sequential multiply() method. This is on my 1-8-2 laptop. The final 
> multiplications are with very large numbers, which then benefit from the 
> parallelization of Toom-Cook 3. Fibonacci 100 million is a 347084 bit number.
> 
> We have also parallelized the private square() method. Internally, the 
> square() method defaults to be sequential.
> 
> Some benchmark results, run on my 1-6-2 server:
> 
> 
> Benchmark                                          (n)  Mode  Cnt      Score  
>     Error  Units
> BigIntegerParallelMultiply.multiply            1000000    ss    4     51.707 
> ±   11.194  ms/op
> BigIntegerParallelMultiply.multiply           10000000    ss    4    988.302 
> ±  235.977  ms/op
> BigIntegerParallelMultiply.multiply          100000000    ss    4  24662.063 
> ± 1123.329  ms/op
> BigIntegerParallelMultiply.parallelMultiply    1000000    ss    4     49.337 
> ±   26.611  ms/op
> BigIntegerParallelMultiply.parallelMultiply   10000000    ss    4    527.560 
> ±  268.903  ms/op
> BigIntegerParallelMultiply.parallelMultiply  100000000    ss    4   9076.551 
> ± 1899.444  ms/op
> 
> 
> We can see that for larger calculations (fib 100m), the execution is 2.7x 
> faster in parallel. For medium size (fib 10m) it is 1.873x faster. And for 
> small (fib 1m) it is roughly the same. Considering that the fibonacci 
> algorithm that we used was in itself sequential, and that the last 3 
> calculations would dominate, 2.7x faster should probably be considered quite 
> good on a 1-6-2 machine.


kabutz has updated the pull request incrementally with four additional commits 
since the last revision:

 - Made RecursiveOp fields package-private so that we do not need super. 
qualifiers in subclasses
 - Incremented depth once instead of on every call
 - Simplified depth calculation to rely on common pool parallelism or the 
current fork join pool executing the code.
 - Removed serialVersionUID and annotated class with @SuppressWarning("serial") 
instead

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6409/files
  - new: https://git.openjdk.java.net/jdk/pull/6409/files/59de5298..3cd16443

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6409&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6409&range=04-05

  Stats: 54 lines in 1 file changed: 17 ins; 16 del; 21 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6409.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6409/head:pull/6409

PR: https://git.openjdk.java.net/jdk/pull/6409

Re: RFR: JDK-8277175 : Add a parallel multiply method to BigInteger [v6]

Reply via email to