Re: RFR: JDK-8277175 : Add a parallel multiply method to BigInteger [v4]

2021-11-24 Thread kabutz
> BigInteger currently uses three different algorithms for multiply. The simple 
> quadratic algorithm, then the slightly better Karatsuba if we exceed a bit 
> count and then Toom Cook 3 once we go into the several thousands of bits. 
> Since Toom Cook 3 is a recursive algorithm, it is trivial to parallelize it. 
> I have demonstrated this several times in conference talks. In order to be 
> consistent with other classes such as Arrays and Collection, I have added a 
> parallelMultiply() method. Internally we have added a parameter to the 
> private multiply method to indicate whether the calculation should be done in 
> parallel.
> 
> The performance improvements are as should be expected. Fibonacci of 100 
> million (using a single-threaded Dijkstra's sum of squares version) completes 
> in 9.2 seconds with the parallelMultiply() vs 25.3 seconds with the 
> sequential multiply() method. This is on my 1-8-2 laptop. The final 
> multiplications are with very large numbers, which then benefit from the 
> parallelization of Toom-Cook 3. Fibonacci 100 million is a 347084 bit number.
> 
> We have also parallelized the private square() method. Internally, the 
> square() method defaults to be sequential.
> 
> Some benchmark results, run on my 1-6-2 server:
> 
> 
> Benchmark  (n)  Mode  Cnt  Score  
> Error  Units
> BigIntegerParallelMultiply.multiply100ss4 51.707 
> ±   11.194  ms/op
> BigIntegerParallelMultiply.multiply   1000ss4988.302 
> ±  235.977  ms/op
> BigIntegerParallelMultiply.multiply  1ss4  24662.063 
> ± 1123.329  ms/op
> BigIntegerParallelMultiply.parallelMultiply100ss4 49.337 
> ±   26.611  ms/op
> BigIntegerParallelMultiply.parallelMultiply   1000ss4527.560 
> ±  268.903  ms/op
> BigIntegerParallelMultiply.parallelMultiply  1ss4   9076.551 
> ± 1899.444  ms/op
> 
> 
> We can see that for larger calculations (fib 100m), the execution is 2.7x 
> faster in parallel. For medium size (fib 10m) it is 1.873x faster. And for 
> small (fib 1m) it is roughly the same. Considering that the fibonacci 
> algorithm that we used was in itself sequential, and that the last 3 
> calculations would dominate, 2.7x faster should probably be considered quite 
> good on a 1-6-2 machine.

kabutz has updated the pull request incrementally with one additional commit 
since the last revision:

  Added limit on the number of recursive tasks based on number of processors

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6409/files
  - new: https://git.openjdk.java.net/jdk/pull/6409/files/81a8b599..0d52e423

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6409&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6409&range=02-03

  Stats: 143 lines in 2 files changed: 69 ins; 36 del; 38 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6409.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6409/head:pull/6409

PR: https://git.openjdk.java.net/jdk/pull/6409


Re: RFR: JDK-8277175 : Add a parallel multiply method to BigInteger [v4]

2021-11-16 Thread kabutz
On Tue, 16 Nov 2021 12:48:03 GMT, kabutz  wrote:

>> BigInteger currently uses three different algorithms for multiply. The 
>> simple quadratic algorithm, then the slightly better Karatsuba if we exceed 
>> a bit count and then Toom Cook 3 once we go into the several thousands of 
>> bits. Since Toom Cook 3 is a recursive algorithm, it is trivial to 
>> parallelize it. I have demonstrated this several times in conference talks. 
>> In order to be consistent with other classes such as Arrays and Collection, 
>> I have added a parallelMultiply() method. Internally we have added a 
>> parameter to the private multiply method to indicate whether the calculation 
>> should be done in parallel.
>> 
>> The performance improvements are as should be expected. Fibonacci of 100 
>> million (using a single-threaded Dijkstra's sum of squares version) 
>> completes in 9.2 seconds with the parallelMultiply() vs 25.3 seconds with 
>> the sequential multiply() method. This is on my 1-8-2 laptop. The final 
>> multiplications are with very large numbers, which then benefit from the 
>> parallelization of Toom-Cook 3.  Fibonacci 100 million is a 347084 bit 
>> number.
>> 
>> We have also parallelized the private square() method. Internally, the 
>> square() method defaults to be sequential.
>> 
>> 
>> Benchmark  (n)  Mode  Cnt  Score 
>>  Error  Units
>> BigIntegerParallelMultiply.multiply100ss4 68,043 
>> ±   25,317  ms/op
>> BigIntegerParallelMultiply.multiply   1000ss4   1073,095 
>> ±  125,296  ms/op
>> BigIntegerParallelMultiply.multiply  1ss4  25317,535 
>> ± 5806,205  ms/op
>> BigIntegerParallelMultiply.parallelMultiply100ss4 56,552 
>> ±   22,368  ms/op
>> BigIntegerParallelMultiply.parallelMultiply   1000ss4536,193 
>> ±   37,393  ms/op
>> BigIntegerParallelMultiply.parallelMultiply  1ss4   9274,657 
>> ±  826,197  ms/op
>
> kabutz has updated the pull request with a new target base due to a merge or 
> a rebase. The pull request now contains four commits:
> 
>  - Update comments
>  - Added parallelMultiply() method to BigInteger to allow large 
> multiplications to run in parallel
>  - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control 
> points in bounding box
>
>Addressing some of Laurent's code review recommendations/comments:
>
>1. use the convention t for the parametric variable x(t),y(t)
>2. solve the quadratic equation using QuadCurve2d.solveQuadratic() or like 
> Helpers.quadraticRoots()
>3. always use braces for x = (a < b) ? ...
>4. always use double-precision constants in math or logical operations: (2 
> * x => 2.0 * x) and (coefficients[3] != 0) => (coefficients[3] != 0.0)
>
>(There are two additional recommendations not in this commit that I'll ask 
> about shortly.)
>
>See https://github.com/openjdk/jdk/pull/6227#issuecomment-959757954
>  - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control 
> points in bounding box
>
>The bug writeup indicated they wanted Path2D#getBounds2D() to be more 
> accurate/concise. They didn't explicitly say they wanted CubicCurve2D and 
> QuadCurve2D to become more accurate too. But a preexisting unit test failed 
> when Path2D#getBounds2D() was updated and those other classes weren't. At 
> this point I considered either:
>A. Updating CubicCurve2D and QuadCurve2D to use the new more accurate 
> getBounds2D() or
>B. Updating the unit test to forgive the discrepancy.
>
>I chose A. Which might technically be seen as scope creep, but it feels 
> like a more holistic/better approach.
>
>This also includes a new unit test (in Path2D/UnitTest.java) that fails 
> without the changes in this commit.

Resubmitted as PR https://github.com/openjdk/jdk/pull/6409/

-

PR: https://git.openjdk.java.net/jdk/pull/6391


Re: RFR: JDK-8277175 : Add a parallel multiply method to BigInteger [v4]

2021-11-16 Thread kabutz
On Tue, 16 Nov 2021 12:48:03 GMT, kabutz  wrote:

>> BigInteger currently uses three different algorithms for multiply. The 
>> simple quadratic algorithm, then the slightly better Karatsuba if we exceed 
>> a bit count and then Toom Cook 3 once we go into the several thousands of 
>> bits. Since Toom Cook 3 is a recursive algorithm, it is trivial to 
>> parallelize it. I have demonstrated this several times in conference talks. 
>> In order to be consistent with other classes such as Arrays and Collection, 
>> I have added a parallelMultiply() method. Internally we have added a 
>> parameter to the private multiply method to indicate whether the calculation 
>> should be done in parallel.
>> 
>> The performance improvements are as should be expected. Fibonacci of 100 
>> million (using a single-threaded Dijkstra's sum of squares version) 
>> completes in 9.2 seconds with the parallelMultiply() vs 25.3 seconds with 
>> the sequential multiply() method. This is on my 1-8-2 laptop. The final 
>> multiplications are with very large numbers, which then benefit from the 
>> parallelization of Toom-Cook 3.  Fibonacci 100 million is a 347084 bit 
>> number.
>> 
>> We have also parallelized the private square() method. Internally, the 
>> square() method defaults to be sequential.
>> 
>> 
>> Benchmark  (n)  Mode  Cnt  Score 
>>  Error  Units
>> BigIntegerParallelMultiply.multiply100ss4 68,043 
>> ±   25,317  ms/op
>> BigIntegerParallelMultiply.multiply   1000ss4   1073,095 
>> ±  125,296  ms/op
>> BigIntegerParallelMultiply.multiply  1ss4  25317,535 
>> ± 5806,205  ms/op
>> BigIntegerParallelMultiply.parallelMultiply100ss4 56,552 
>> ±   22,368  ms/op
>> BigIntegerParallelMultiply.parallelMultiply   1000ss4536,193 
>> ±   37,393  ms/op
>> BigIntegerParallelMultiply.parallelMultiply  1ss4   9274,657 
>> ±  826,197  ms/op
>
> kabutz has updated the pull request with a new target base due to a merge or 
> a rebase. The pull request now contains four commits:
> 
>  - Update comments
>  - Added parallelMultiply() method to BigInteger to allow large 
> multiplications to run in parallel
>  - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control 
> points in bounding box
>
>Addressing some of Laurent's code review recommendations/comments:
>
>1. use the convention t for the parametric variable x(t),y(t)
>2. solve the quadratic equation using QuadCurve2d.solveQuadratic() or like 
> Helpers.quadraticRoots()
>3. always use braces for x = (a < b) ? ...
>4. always use double-precision constants in math or logical operations: (2 
> * x => 2.0 * x) and (coefficients[3] != 0) => (coefficients[3] != 0.0)
>
>(There are two additional recommendations not in this commit that I'll ask 
> about shortly.)
>
>See https://github.com/openjdk/jdk/pull/6227#issuecomment-959757954
>  - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control 
> points in bounding box
>
>The bug writeup indicated they wanted Path2D#getBounds2D() to be more 
> accurate/concise. They didn't explicitly say they wanted CubicCurve2D and 
> QuadCurve2D to become more accurate too. But a preexisting unit test failed 
> when Path2D#getBounds2D() was updated and those other classes weren't. At 
> this point I considered either:
>A. Updating CubicCurve2D and QuadCurve2D to use the new more accurate 
> getBounds2D() or
>B. Updating the unit test to forgive the discrepancy.
>
>I chose A. Which might technically be seen as scope creep, but it feels 
> like a more holistic/better approach.
>
>This also includes a new unit test (in Path2D/UnitTest.java) that fails 
> without the changes in this commit.

Thanks Kevin, let me start again!

-

PR: https://git.openjdk.java.net/jdk/pull/6391


Re: RFR: JDK-8277175 : Add a parallel multiply method to BigInteger [v4]

2021-11-16 Thread Kevin Rushforth
On Tue, 16 Nov 2021 12:48:03 GMT, kabutz  wrote:

>> BigInteger currently uses three different algorithms for multiply. The 
>> simple quadratic algorithm, then the slightly better Karatsuba if we exceed 
>> a bit count and then Toom Cook 3 once we go into the several thousands of 
>> bits. Since Toom Cook 3 is a recursive algorithm, it is trivial to 
>> parallelize it. I have demonstrated this several times in conference talks. 
>> In order to be consistent with other classes such as Arrays and Collection, 
>> I have added a parallelMultiply() method. Internally we have added a 
>> parameter to the private multiply method to indicate whether the calculation 
>> should be done in parallel.
>> 
>> The performance improvements are as should be expected. Fibonacci of 100 
>> million (using a single-threaded Dijkstra's sum of squares version) 
>> completes in 9.2 seconds with the parallelMultiply() vs 25.3 seconds with 
>> the sequential multiply() method. This is on my 1-8-2 laptop. The final 
>> multiplications are with very large numbers, which then benefit from the 
>> parallelization of Toom-Cook 3.  Fibonacci 100 million is a 347084 bit 
>> number.
>> 
>> We have also parallelized the private square() method. Internally, the 
>> square() method defaults to be sequential.
>> 
>> 
>> Benchmark  (n)  Mode  Cnt  Score 
>>  Error  Units
>> BigIntegerParallelMultiply.multiply100ss4 68,043 
>> ±   25,317  ms/op
>> BigIntegerParallelMultiply.multiply   1000ss4   1073,095 
>> ±  125,296  ms/op
>> BigIntegerParallelMultiply.multiply  1ss4  25317,535 
>> ± 5806,205  ms/op
>> BigIntegerParallelMultiply.parallelMultiply100ss4 56,552 
>> ±   22,368  ms/op
>> BigIntegerParallelMultiply.parallelMultiply   1000ss4536,193 
>> ±   37,393  ms/op
>> BigIntegerParallelMultiply.parallelMultiply  1ss4   9274,657 
>> ±  826,197  ms/op
>
> kabutz has updated the pull request with a new target base due to a merge or 
> a rebase. The pull request now contains four commits:
> 
>  - Update comments
>  - Added parallelMultiply() method to BigInteger to allow large 
> multiplications to run in parallel
>  - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control 
> points in bounding box
>
>Addressing some of Laurent's code review recommendations/comments:
>
>1. use the convention t for the parametric variable x(t),y(t)
>2. solve the quadratic equation using QuadCurve2d.solveQuadratic() or like 
> Helpers.quadraticRoots()
>3. always use braces for x = (a < b) ? ...
>4. always use double-precision constants in math or logical operations: (2 
> * x => 2.0 * x) and (coefficients[3] != 0) => (coefficients[3] != 0.0)
>
>(There are two additional recommendations not in this commit that I'll ask 
> about shortly.)
>
>See https://github.com/openjdk/jdk/pull/6227#issuecomment-959757954
>  - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control 
> points in bounding box
>
>The bug writeup indicated they wanted Path2D#getBounds2D() to be more 
> accurate/concise. They didn't explicitly say they wanted CubicCurve2D and 
> QuadCurve2D to become more accurate too. But a preexisting unit test failed 
> when Path2D#getBounds2D() was updated and those other classes weren't. At 
> this point I considered either:
>A. Updating CubicCurve2D and QuadCurve2D to use the new more accurate 
> getBounds2D() or
>B. Updating the unit test to forgive the discrepancy.
>
>I chose A. Which might technically be seen as scope creep, but it feels 
> like a more holistic/better approach.
>
>This also includes a new unit test (in Path2D/UnitTest.java) that fails 
> without the changes in this commit.

You still have the extra unwanted commits in your branch.

Normally you should not force push to your branch once a review is started, but 
in this case the only other choice would be to abandon this PR and create a new 
one.

Regardless of whether you close this PR and create a new one or continue with 
this PR, you should rebase your branch on top of the latest jdk master such 
that there is only a single commit with the changes you are proposing. It 
should _not_ touch any files from `java.desktop`.

-

PR: https://git.openjdk.java.net/jdk/pull/6391


Re: RFR: JDK-8277175 : Add a parallel multiply method to BigInteger [v4]

2021-11-16 Thread kabutz
> BigInteger currently uses three different algorithms for multiply. The simple 
> quadratic algorithm, then the slightly better Karatsuba if we exceed a bit 
> count and then Toom Cook 3 once we go into the several thousands of bits. 
> Since Toom Cook 3 is a recursive algorithm, it is trivial to parallelize it. 
> I have demonstrated this several times in conference talks. In order to be 
> consistent with other classes such as Arrays and Collection, I have added a 
> parallelMultiply() method. Internally we have added a parameter to the 
> private multiply method to indicate whether the calculation should be done in 
> parallel.
> 
> The performance improvements are as should be expected. Fibonacci of 100 
> million (using a single-threaded Dijkstra's sum of squares version) completes 
> in 9.2 seconds with the parallelMultiply() vs 25.3 seconds with the 
> sequential multiply() method. This is on my 1-8-2 laptop. The final 
> multiplications are with very large numbers, which then benefit from the 
> parallelization of Toom-Cook 3.  Fibonacci 100 million is a 347084 bit number.
> 
> We have also parallelized the private square() method. Internally, the 
> square() method defaults to be sequential.
> 
> 
> Benchmark  (n)  Mode  Cnt  Score  
> Error  Units
> BigIntegerParallelMultiply.multiply100ss4 68,043 
> ±   25,317  ms/op
> BigIntegerParallelMultiply.multiply   1000ss4   1073,095 
> ±  125,296  ms/op
> BigIntegerParallelMultiply.multiply  1ss4  25317,535 
> ± 5806,205  ms/op
> BigIntegerParallelMultiply.parallelMultiply100ss4 56,552 
> ±   22,368  ms/op
> BigIntegerParallelMultiply.parallelMultiply   1000ss4536,193 
> ±   37,393  ms/op
> BigIntegerParallelMultiply.parallelMultiply  1ss4   9274,657 
> ±  826,197  ms/op

kabutz has updated the pull request with a new target base due to a merge or a 
rebase. The pull request now contains four commits:

 - Update comments
 - Added parallelMultiply() method to BigInteger to allow large multiplications 
to run in parallel
 - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control 
points in bounding box
   
   Addressing some of Laurent's code review recommendations/comments:
   
   1. use the convention t for the parametric variable x(t),y(t)
   2. solve the quadratic equation using QuadCurve2d.solveQuadratic() or like 
Helpers.quadraticRoots()
   3. always use braces for x = (a < b) ? ...
   4. always use double-precision constants in math or logical operations: (2 * 
x => 2.0 * x) and (coefficients[3] != 0) => (coefficients[3] != 0.0)
   
   (There are two additional recommendations not in this commit that I'll ask 
about shortly.)
   
   See https://github.com/openjdk/jdk/pull/6227#issuecomment-959757954
 - 8176501: Method Shape.getBounds2D() incorrectly includes Bezier control 
points in bounding box
   
   The bug writeup indicated they wanted Path2D#getBounds2D() to be more 
accurate/concise. They didn't explicitly say they wanted CubicCurve2D and 
QuadCurve2D to become more accurate too. But a preexisting unit test failed 
when Path2D#getBounds2D() was updated and those other classes weren't. At this 
point I considered either:
   A. Updating CubicCurve2D and QuadCurve2D to use the new more accurate 
getBounds2D() or
   B. Updating the unit test to forgive the discrepancy.
   
   I chose A. Which might technically be seen as scope creep, but it feels like 
a more holistic/better approach.
   
   This also includes a new unit test (in Path2D/UnitTest.java) that fails 
without the changes in this commit.

-

Changes: https://git.openjdk.java.net/jdk/pull/6391/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6391&range=03
  Stats: 731 lines in 8 files changed: 565 ins; 136 del; 30 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6391.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6391/head:pull/6391

PR: https://git.openjdk.java.net/jdk/pull/6391