> BigInteger currently uses three different algorithms for multiply. The simple
> quadratic algorithm, then the slightly better Karatsuba if we exceed a bit
> count and then Toom Cook 3 once we go into the several thousands of bits.
> Since Toom Cook 3 is a recursive algorithm, it is trivial to parallelize it.
> I have demonstrated this several times in conference talks. In order to be
> consistent with other classes such as Arrays and Collection, I have added a
> parallelMultiply() method. Internally we have added a parameter to the
> private multiply method to indicate whether the calculation should be done in
> parallel.
>
> The performance improvements are as should be expected. Fibonacci of 100
> million (using a single-threaded Dijkstra's sum of squares version) completes
> in 9.2 seconds with the parallelMultiply() vs 25.3 seconds with the
> sequential multiply() method. This is on my 1-8-2 laptop. The final
> multiplications are with very large numbers, which then benefit from the
> parallelization of Toom-Cook 3. Fibonacci 100 million is a 347084 bit number.
>
> We have also parallelized the private square() method. Internally, the
> square() method defaults to be sequential.
>
> Some benchmark results, run on my 1-6-2 server:
>
>
> Benchmark (n) Mode Cnt Score
> Error Units
> BigIntegerParallelMultiply.multiply 1000000 ss 4 51.707
> ± 11.194 ms/op
> BigIntegerParallelMultiply.multiply 10000000 ss 4 988.302
> ± 235.977 ms/op
> BigIntegerParallelMultiply.multiply 100000000 ss 4 24662.063
> ± 1123.329 ms/op
> BigIntegerParallelMultiply.parallelMultiply 1000000 ss 4 49.337
> ± 26.611 ms/op
> BigIntegerParallelMultiply.parallelMultiply 10000000 ss 4 527.560
> ± 268.903 ms/op
> BigIntegerParallelMultiply.parallelMultiply 100000000 ss 4 9076.551
> ± 1899.444 ms/op
>
>
> We can see that for larger calculations (fib 100m), the execution is 2.7x
> faster in parallel. For medium size (fib 10m) it is 1.873x faster. And for
> small (fib 1m) it is roughly the same. Considering that the fibonacci
> algorithm that we used was in itself sequential, and that the last 3
> calculations would dominate, 2.7x faster should probably be considered quite
> good on a 1-6-2 machine.
kabutz has updated the pull request incrementally with four additional commits
since the last revision:
- Made RecursiveOp fields package-private so that we do not need super.
qualifiers in subclasses
- Incremented depth once instead of on every call
- Simplified depth calculation to rely on common pool parallelism or the
current fork join pool executing the code.
- Removed serialVersionUID and annotated class with @SuppressWarning("serial")
instead
-------------
Changes:
- all: https://git.openjdk.java.net/jdk/pull/6409/files
- new: https://git.openjdk.java.net/jdk/pull/6409/files/59de5298..3cd16443
Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6409&range=05
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6409&range=04-05
Stats: 54 lines in 1 file changed: 17 ins; 16 del; 21 mod
Patch: https://git.openjdk.java.net/jdk/pull/6409.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/6409/head:pull/6409
PR: https://git.openjdk.java.net/jdk/pull/6409