On Fri, 28 Jan 2022 18:59:56 GMT, kabutz <d...@openjdk.java.net> wrote:
>> BigInteger currently uses three different algorithms for multiply. The >> simple quadratic algorithm, then the slightly better Karatsuba if we exceed >> a bit count and then Toom Cook 3 once we go into the several thousands of >> bits. Since Toom Cook 3 is a recursive algorithm, it is trivial to >> parallelize it. I have demonstrated this several times in conference talks. >> In order to be consistent with other classes such as Arrays and Collection, >> I have added a parallelMultiply() method. Internally we have added a >> parameter to the private multiply method to indicate whether the calculation >> should be done in parallel. >> >> The performance improvements are as should be expected. Fibonacci of 100 >> million (using a single-threaded Dijkstra's sum of squares version) >> completes in 9.2 seconds with the parallelMultiply() vs 25.3 seconds with >> the sequential multiply() method. This is on my 1-8-2 laptop. The final >> multiplications are with very large numbers, which then benefit from the >> parallelization of Toom-Cook 3. Fibonacci 100 million is a 347084 bit number. >> >> We have also parallelized the private square() method. Internally, the >> square() method defaults to be sequential. >> >> Some benchmark results, run on my 1-6-2 server: >> >> >> Benchmark (n) Mode Cnt Score >> Error Units >> BigIntegerParallelMultiply.multiply 1000000 ss 4 51.707 >> ± 11.194 ms/op >> BigIntegerParallelMultiply.multiply 10000000 ss 4 988.302 >> ± 235.977 ms/op >> BigIntegerParallelMultiply.multiply 100000000 ss 4 24662.063 >> ± 1123.329 ms/op >> BigIntegerParallelMultiply.parallelMultiply 1000000 ss 4 49.337 >> ± 26.611 ms/op >> BigIntegerParallelMultiply.parallelMultiply 10000000 ss 4 527.560 >> ± 268.903 ms/op >> BigIntegerParallelMultiply.parallelMultiply 100000000 ss 4 9076.551 >> ± 1899.444 ms/op >> >> >> We can see that for larger calculations (fib 100m), the execution is 2.7x >> faster in parallel. For medium size (fib 10m) it is 1.873x faster. And for >> small (fib 1m) it is roughly the same. Considering that the fibonacci >> algorithm that we used was in itself sequential, and that the last 3 >> calculations would dominate, 2.7x faster should probably be considered quite >> good on a 1-6-2 machine. > > kabutz has updated the pull request incrementally with one additional commit > since the last revision: > > Benchmark for testing the effectiveness of BigInteger.parallelMultiply() I have added a benchmark for checking performance difference between sequential and parallel multiply of very large Mersenne primes using BigInteger. We want to measure real time, user time, system time and the amount of memory allocated. To calculate this, we create our own thread factory for the common ForkJoinPool and then use that to measure user time, cpu time and bytes allocated. We use reflection to discover all methods that match "*ultiply", and use them to multiply two very large Mersenne primes together. ### Results on a 1-6-2 machine running Ubuntu linux Memory allocation increased from 83.9GB to 84GB, for both the sequential and parallel versions. This is an increase of just 0.1%. On this machine, the parallel version was 3.8x faster in latency (real time), but it used 2.7x more CPU resources. Testing multiplying Mersenne primes of 2^57885161-1 and 2^82589933-1 #### openjdk version "18-internal" 2022-03-15 BigInteger.parallelMultiply() real 0m6.288s user 1m3.010s sys 0m0.027s mem 84.0GB BigInteger.multiply() real 0m23.682s user 0m23.530s sys 0m0.004s mem 84.0GB #### openjdk version "1.8.0_302" BigInteger.multiply() real 0m25.657s user 0m25.390s sys 0m0.001s mem 83.9GB #### openjdk version "9.0.7.1" BigInteger.multiply() real 0m24.907s user 0m24.700s sys 0m0.001s mem 83.9GB #### openjdk version "10.0.2" 2018-07-17 BigInteger.multiply() real 0m24.632s user 0m24.380s sys 0m0.004s mem 83.9GB #### openjdk version "11.0.12" 2021-07-20 LTS BigInteger.multiply() real 0m22.114s user 0m21.930s sys 0m0.001s mem 83.9GB #### openjdk version "12.0.2" 2019-07-16 BigInteger.multiply() real 0m23.015s user 0m22.830s sys 0m0.000s mem 83.9GB #### openjdk version "13.0.9" 2021-10-19 BigInteger.multiply() real 0m23.548s user 0m23.350s sys 0m0.005s mem 83.9GB #### openjdk version "14.0.2" 2020-07-14 BigInteger.multiply() real 0m22.918s user 0m22.530s sys 0m0.131s mem 83.9GB #### openjdk version "15.0.5" 2021-10-19 BigInteger.multiply() real 0m22.038s user 0m21.750s sys 0m0.003s mem 83.9GB #### openjdk version "16.0.2" 2021-07-20 BigInteger.multiply() real 0m23.049s user 0m22.760s sys 0m0.006s mem 83.9GB #### openjdk version "17" 2021-09-14 BigInteger.multiply() real 0m22.580s user 0m22.310s sys 0m0.001s mem 83.9GB ------------- PR: https://git.openjdk.java.net/jdk/pull/6409