Re: RFR: JDK-8277175 : Add a parallel multiply method to BigInteger [v4]
> BigInteger currently uses three different algorithms for multiply. The simple > quadratic algorithm, then the slightly better Karatsuba if we exceed a bit > count and then Toom Cook 3 once we go into the several thousands of bits. > Since Toom Cook 3 is a recursive algorithm, it is trivial to parallelize it. > I have demonstrated this several times in conference talks. In order to be > consistent with other classes such as Arrays and Collection, I have added a > parallelMultiply() method. Internally we have added a parameter to the > private multiply method to indicate whether the calculation should be done in > parallel. > > The performance improvements are as should be expected. Fibonacci of 100 > million (using a single-threaded Dijkstra's sum of squares version) completes > in 9.2 seconds with the parallelMultiply() vs 25.3 seconds with the > sequential multiply() method. This is on my 1-8-2 laptop. The final > multiplications are with very large numbers, which then benefit from the > parallelization of Toom-Cook 3. Fibonacci 100 million is a 347084 bit number. > > We have also parallelized the private square() method. Internally, the > square() method defaults to be sequential. > > Some benchmark results, run on my 1-6-2 server: > > > Benchmark (n) Mode Cnt Score > Error Units > BigIntegerParallelMultiply.multiply100ss4 51.707 > ± 11.194 ms/op > BigIntegerParallelMultiply.multiply 1000ss4988.302 > ± 235.977 ms/op > BigIntegerParallelMultiply.multiply 1ss4 24662.063 > ± 1123.329 ms/op > BigIntegerParallelMultiply.parallelMultiply100ss4 49.337 > ± 26.611 ms/op > BigIntegerParallelMultiply.parallelMultiply 1000ss4527.560 > ± 268.903 ms/op > BigIntegerParallelMultiply.parallelMultiply 1ss4 9076.551 > ± 1899.444 ms/op > > > We can see that for larger calculations (fib 100m), the execution is 2.7x > faster in parallel. For medium size (fib 10m) it is 1.873x faster. And for > small (fib 1m) it is roughly the same. Considering that the fibonacci > algorithm that we used was in itself sequential, and that the last 3 > calculations would dominate, 2.7x faster should probably be considered quite > good on a 1-6-2 machine. kabutz has updated the pull request incrementally with one additional commit since the last revision: Added limit on the number of recursive tasks based on number of processors - Changes: - all: https://git.openjdk.java.net/jdk/pull/6409/files - new: https://git.openjdk.java.net/jdk/pull/6409/files/81a8b599..0d52e423 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6409&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6409&range=02-03 Stats: 143 lines in 2 files changed: 69 ins; 36 del; 38 mod Patch: https://git.openjdk.java.net/jdk/pull/6409.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6409/head:pull/6409 PR: https://git.openjdk.java.net/jdk/pull/6409
Re: RFR: JDK-8277175 : Add a parallel multiply method to BigInteger [v4]
On Tue, 16 Nov 2021 12:48:03 GMT, kabutz wrote: >> BigInteger currently uses three different algorithms for multiply. The >> simple quadratic algorithm, then the slightly better Karatsuba if we exceed >> a bit count and then Toom Cook 3 once we go into the several thousands of >> bits. Since Toom Cook 3 is a recursive algorithm, it is trivial to >> parallelize it. I have demonstrated this several times in conference talks. >> In order to be consistent with other classes such as Arrays and Collection, >> I have added a parallelMultiply() method. Internally we have added a >> parameter to the private multiply method to indicate whether the calculation >> should be done in parallel. >> >> The performance improvements are as should be expected. Fibonacci of 100 >> million (using a single-threaded Dijkstra's sum of squares version) >> completes in 9.2 seconds with the parallelMultiply() vs 25.3 seconds with >> the sequential multiply() method. This is on my 1-8-2 laptop. The final >> multiplications are with very large numbers, which then benefit from the >> parallelization of Toom-Cook 3. Fibonacci 100 million is a 347084 bit >> number. >> >> We have also parallelized the private square() method. Internally, the >> square() method defaults to be sequential. >> >> >> Benchmark (n) Mode Cnt Score >> Error Units >> BigIntegerParallelMultiply.multiply100ss4 68,043 >> ± 25,317 ms/op >> BigIntegerParallelMultiply.multiply 1000ss4 1073,095 >> ± 125,296 ms/op >> BigIntegerParallelMultiply.multiply 1ss4 25317,535 >> ± 5806,205 ms/op >> BigIntegerParallelMultiply.parallelMultiply100ss4 56,552 >> ± 22,368 ms/op >> BigIntegerParallelMultiply.parallelMultiply 1000ss4536,193 >> ± 37,393 ms/op >> BigIntegerParallelMultiply.parallelMultiply 1ss4 9274,657 >> ± 826,197 ms/op > > kabutz has updated the pull request with a new target base due to a merge or > a rebase. The pull request now contains four commits: > > - Update comments > - Added parallelMultiply() method to BigInteger to allow large > multiplications to run in parallel > - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control > points in bounding box > >Addressing some of Laurent's code review recommendations/comments: > >1. use the convention t for the parametric variable x(t),y(t) >2. solve the quadratic equation using QuadCurve2d.solveQuadratic() or like > Helpers.quadraticRoots() >3. always use braces for x = (a < b) ? ... >4. always use double-precision constants in math or logical operations: (2 > * x => 2.0 * x) and (coefficients[3] != 0) => (coefficients[3] != 0.0) > >(There are two additional recommendations not in this commit that I'll ask > about shortly.) > >See https://github.com/openjdk/jdk/pull/6227#issuecomment-959757954 > - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control > points in bounding box > >The bug writeup indicated they wanted Path2D#getBounds2D() to be more > accurate/concise. They didn't explicitly say they wanted CubicCurve2D and > QuadCurve2D to become more accurate too. But a preexisting unit test failed > when Path2D#getBounds2D() was updated and those other classes weren't. At > this point I considered either: >A. Updating CubicCurve2D and QuadCurve2D to use the new more accurate > getBounds2D() or >B. Updating the unit test to forgive the discrepancy. > >I chose A. Which might technically be seen as scope creep, but it feels > like a more holistic/better approach. > >This also includes a new unit test (in Path2D/UnitTest.java) that fails > without the changes in this commit. Resubmitted as PR https://github.com/openjdk/jdk/pull/6409/ - PR: https://git.openjdk.java.net/jdk/pull/6391
Re: RFR: JDK-8277175 : Add a parallel multiply method to BigInteger [v4]
On Tue, 16 Nov 2021 12:48:03 GMT, kabutz wrote: >> BigInteger currently uses three different algorithms for multiply. The >> simple quadratic algorithm, then the slightly better Karatsuba if we exceed >> a bit count and then Toom Cook 3 once we go into the several thousands of >> bits. Since Toom Cook 3 is a recursive algorithm, it is trivial to >> parallelize it. I have demonstrated this several times in conference talks. >> In order to be consistent with other classes such as Arrays and Collection, >> I have added a parallelMultiply() method. Internally we have added a >> parameter to the private multiply method to indicate whether the calculation >> should be done in parallel. >> >> The performance improvements are as should be expected. Fibonacci of 100 >> million (using a single-threaded Dijkstra's sum of squares version) >> completes in 9.2 seconds with the parallelMultiply() vs 25.3 seconds with >> the sequential multiply() method. This is on my 1-8-2 laptop. The final >> multiplications are with very large numbers, which then benefit from the >> parallelization of Toom-Cook 3. Fibonacci 100 million is a 347084 bit >> number. >> >> We have also parallelized the private square() method. Internally, the >> square() method defaults to be sequential. >> >> >> Benchmark (n) Mode Cnt Score >> Error Units >> BigIntegerParallelMultiply.multiply100ss4 68,043 >> ± 25,317 ms/op >> BigIntegerParallelMultiply.multiply 1000ss4 1073,095 >> ± 125,296 ms/op >> BigIntegerParallelMultiply.multiply 1ss4 25317,535 >> ± 5806,205 ms/op >> BigIntegerParallelMultiply.parallelMultiply100ss4 56,552 >> ± 22,368 ms/op >> BigIntegerParallelMultiply.parallelMultiply 1000ss4536,193 >> ± 37,393 ms/op >> BigIntegerParallelMultiply.parallelMultiply 1ss4 9274,657 >> ± 826,197 ms/op > > kabutz has updated the pull request with a new target base due to a merge or > a rebase. The pull request now contains four commits: > > - Update comments > - Added parallelMultiply() method to BigInteger to allow large > multiplications to run in parallel > - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control > points in bounding box > >Addressing some of Laurent's code review recommendations/comments: > >1. use the convention t for the parametric variable x(t),y(t) >2. solve the quadratic equation using QuadCurve2d.solveQuadratic() or like > Helpers.quadraticRoots() >3. always use braces for x = (a < b) ? ... >4. always use double-precision constants in math or logical operations: (2 > * x => 2.0 * x) and (coefficients[3] != 0) => (coefficients[3] != 0.0) > >(There are two additional recommendations not in this commit that I'll ask > about shortly.) > >See https://github.com/openjdk/jdk/pull/6227#issuecomment-959757954 > - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control > points in bounding box > >The bug writeup indicated they wanted Path2D#getBounds2D() to be more > accurate/concise. They didn't explicitly say they wanted CubicCurve2D and > QuadCurve2D to become more accurate too. But a preexisting unit test failed > when Path2D#getBounds2D() was updated and those other classes weren't. At > this point I considered either: >A. Updating CubicCurve2D and QuadCurve2D to use the new more accurate > getBounds2D() or >B. Updating the unit test to forgive the discrepancy. > >I chose A. Which might technically be seen as scope creep, but it feels > like a more holistic/better approach. > >This also includes a new unit test (in Path2D/UnitTest.java) that fails > without the changes in this commit. Thanks Kevin, let me start again! - PR: https://git.openjdk.java.net/jdk/pull/6391
Re: RFR: JDK-8277175 : Add a parallel multiply method to BigInteger [v4]
On Tue, 16 Nov 2021 12:48:03 GMT, kabutz wrote: >> BigInteger currently uses three different algorithms for multiply. The >> simple quadratic algorithm, then the slightly better Karatsuba if we exceed >> a bit count and then Toom Cook 3 once we go into the several thousands of >> bits. Since Toom Cook 3 is a recursive algorithm, it is trivial to >> parallelize it. I have demonstrated this several times in conference talks. >> In order to be consistent with other classes such as Arrays and Collection, >> I have added a parallelMultiply() method. Internally we have added a >> parameter to the private multiply method to indicate whether the calculation >> should be done in parallel. >> >> The performance improvements are as should be expected. Fibonacci of 100 >> million (using a single-threaded Dijkstra's sum of squares version) >> completes in 9.2 seconds with the parallelMultiply() vs 25.3 seconds with >> the sequential multiply() method. This is on my 1-8-2 laptop. The final >> multiplications are with very large numbers, which then benefit from the >> parallelization of Toom-Cook 3. Fibonacci 100 million is a 347084 bit >> number. >> >> We have also parallelized the private square() method. Internally, the >> square() method defaults to be sequential. >> >> >> Benchmark (n) Mode Cnt Score >> Error Units >> BigIntegerParallelMultiply.multiply100ss4 68,043 >> ± 25,317 ms/op >> BigIntegerParallelMultiply.multiply 1000ss4 1073,095 >> ± 125,296 ms/op >> BigIntegerParallelMultiply.multiply 1ss4 25317,535 >> ± 5806,205 ms/op >> BigIntegerParallelMultiply.parallelMultiply100ss4 56,552 >> ± 22,368 ms/op >> BigIntegerParallelMultiply.parallelMultiply 1000ss4536,193 >> ± 37,393 ms/op >> BigIntegerParallelMultiply.parallelMultiply 1ss4 9274,657 >> ± 826,197 ms/op > > kabutz has updated the pull request with a new target base due to a merge or > a rebase. The pull request now contains four commits: > > - Update comments > - Added parallelMultiply() method to BigInteger to allow large > multiplications to run in parallel > - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control > points in bounding box > >Addressing some of Laurent's code review recommendations/comments: > >1. use the convention t for the parametric variable x(t),y(t) >2. solve the quadratic equation using QuadCurve2d.solveQuadratic() or like > Helpers.quadraticRoots() >3. always use braces for x = (a < b) ? ... >4. always use double-precision constants in math or logical operations: (2 > * x => 2.0 * x) and (coefficients[3] != 0) => (coefficients[3] != 0.0) > >(There are two additional recommendations not in this commit that I'll ask > about shortly.) > >See https://github.com/openjdk/jdk/pull/6227#issuecomment-959757954 > - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control > points in bounding box > >The bug writeup indicated they wanted Path2D#getBounds2D() to be more > accurate/concise. They didn't explicitly say they wanted CubicCurve2D and > QuadCurve2D to become more accurate too. But a preexisting unit test failed > when Path2D#getBounds2D() was updated and those other classes weren't. At > this point I considered either: >A. Updating CubicCurve2D and QuadCurve2D to use the new more accurate > getBounds2D() or >B. Updating the unit test to forgive the discrepancy. > >I chose A. Which might technically be seen as scope creep, but it feels > like a more holistic/better approach. > >This also includes a new unit test (in Path2D/UnitTest.java) that fails > without the changes in this commit. You still have the extra unwanted commits in your branch. Normally you should not force push to your branch once a review is started, but in this case the only other choice would be to abandon this PR and create a new one. Regardless of whether you close this PR and create a new one or continue with this PR, you should rebase your branch on top of the latest jdk master such that there is only a single commit with the changes you are proposing. It should _not_ touch any files from `java.desktop`. - PR: https://git.openjdk.java.net/jdk/pull/6391
Re: RFR: JDK-8277175 : Add a parallel multiply method to BigInteger [v4]
> BigInteger currently uses three different algorithms for multiply. The simple > quadratic algorithm, then the slightly better Karatsuba if we exceed a bit > count and then Toom Cook 3 once we go into the several thousands of bits. > Since Toom Cook 3 is a recursive algorithm, it is trivial to parallelize it. > I have demonstrated this several times in conference talks. In order to be > consistent with other classes such as Arrays and Collection, I have added a > parallelMultiply() method. Internally we have added a parameter to the > private multiply method to indicate whether the calculation should be done in > parallel. > > The performance improvements are as should be expected. Fibonacci of 100 > million (using a single-threaded Dijkstra's sum of squares version) completes > in 9.2 seconds with the parallelMultiply() vs 25.3 seconds with the > sequential multiply() method. This is on my 1-8-2 laptop. The final > multiplications are with very large numbers, which then benefit from the > parallelization of Toom-Cook 3. Fibonacci 100 million is a 347084 bit number. > > We have also parallelized the private square() method. Internally, the > square() method defaults to be sequential. > > > Benchmark (n) Mode Cnt Score > Error Units > BigIntegerParallelMultiply.multiply100ss4 68,043 > ± 25,317 ms/op > BigIntegerParallelMultiply.multiply 1000ss4 1073,095 > ± 125,296 ms/op > BigIntegerParallelMultiply.multiply 1ss4 25317,535 > ± 5806,205 ms/op > BigIntegerParallelMultiply.parallelMultiply100ss4 56,552 > ± 22,368 ms/op > BigIntegerParallelMultiply.parallelMultiply 1000ss4536,193 > ± 37,393 ms/op > BigIntegerParallelMultiply.parallelMultiply 1ss4 9274,657 > ± 826,197 ms/op kabutz has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Update comments - Added parallelMultiply() method to BigInteger to allow large multiplications to run in parallel - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control points in bounding box Addressing some of Laurent's code review recommendations/comments: 1. use the convention t for the parametric variable x(t),y(t) 2. solve the quadratic equation using QuadCurve2d.solveQuadratic() or like Helpers.quadraticRoots() 3. always use braces for x = (a < b) ? ... 4. always use double-precision constants in math or logical operations: (2 * x => 2.0 * x) and (coefficients[3] != 0) => (coefficients[3] != 0.0) (There are two additional recommendations not in this commit that I'll ask about shortly.) See https://github.com/openjdk/jdk/pull/6227#issuecomment-959757954 - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control points in bounding box The bug writeup indicated they wanted Path2D#getBounds2D() to be more accurate/concise. They didn't explicitly say they wanted CubicCurve2D and QuadCurve2D to become more accurate too. But a preexisting unit test failed when Path2D#getBounds2D() was updated and those other classes weren't. At this point I considered either: A. Updating CubicCurve2D and QuadCurve2D to use the new more accurate getBounds2D() or B. Updating the unit test to forgive the discrepancy. I chose A. Which might technically be seen as scope creep, but it feels like a more holistic/better approach. This also includes a new unit test (in Path2D/UnitTest.java) that fails without the changes in this commit. - Changes: https://git.openjdk.java.net/jdk/pull/6391/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6391&range=03 Stats: 731 lines in 8 files changed: 565 ins; 136 del; 30 mod Patch: https://git.openjdk.java.net/jdk/pull/6391.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6391/head:pull/6391 PR: https://git.openjdk.java.net/jdk/pull/6391