On Sun, 9 Oct 2022 06:53:19 GMT, Xue-Lei Andrew Fan <xue...@openjdk.org> wrote:

> Hi,
> 
> May I have this update reviewed?  With this update, the result will be always 
> reduced in EC limbs addition and subtraction operations in the JDK 
> implementation.
> 
> In the current implementation, the EC limbs addition and subtraction result 
> is not reduced before the value is returned.  This behavior is different from 
> multiplication and square.  To get it right, a maximum addition limit is 
> introduced for EC limbs addition and subtraction.  If the limit is reached, 
> an exception will be thrown so as to indicate an EC implementation problem.  
> The exception is not recoverable from the viewpoint of an application.
> 
> The design is error prone as the EC implementation code must be carefully 
> checked so that the limit cannot reach. If a reach is noticed, a reduce 
> operation would have to be hard-coded additionally. In the FieldGen.java 
> implementation, the limit can only be 1 or 2 as the implementation code is 
> only able to handle 2.  But the FieldGen.java does not really check if 1 or 2 
> really works for the specific filed generation parameters.
> 
> The design impact the performance as well. Because of this limit, the maximum 
> limb size is 28 bits for 2 max adding limit. Otherwise there are integer 
> (long) overflow issues. For example for 256 bits curves, 10 limbs are 
> required for 28-bit limb size; and 9 limbs for 29-bit size. By reducing 1 
> limbs from 10 to 9, the Signature performance could get improved by 20%.
> 
> In the IntegerPolynomial class description, it is said "All IntegerPolynomial 
> implementations allow at most one addition before multiplication. Additions 
> after that will result in an ArithmeticException." It's too strict to follow 
> without exam the code very carefully. Indeed, the implementation does not 
> really follow the spec, and 2 addition may be allowed.
> 
> It would be nice if there is no addition limit, and then we are free from 
> these issues. It is doable by reducing the limbs in the adding and 
> subtraction operations, as other operations as multiplication. And then the 
> code related to the limit is not necessary any longer. The code can do any 
> many additions as required.
> 
> The benchmark for ECDSA Signature is checked in order to see how much the 
> performance could be impacted. Here are the benchmark numbers before the 
> patch applied:
> 
> Benchmark        (messageLength)   Mode  Cnt     Score    Error  Units
> Signatures.sign               64  thrpt   15  1414.463 ±  9.616  ops/s
> Signatures.sign              512  thrpt   15  1409.023 ± 14.636  ops/s
> Signatures.sign             2048  thrpt   15  1414.488 ±  4.728  ops/s
> Signatures.sign            16384  thrpt   15  1385.685 ±  4.476  ops/s
> 
> 
> and here are the numbers with this patch applied:
> 
> Benchmark        (messageLength)   Mode  Cnt     Score    Error  Units
> Signatures.sign               64  thrpt   15  1293.658 ±  8.358  ops/s
> Signatures.sign              512  thrpt   15  1298.404 ±  4.344  ops/s
> Signatures.sign             2048  thrpt   15  1289.732 ± 19.748  ops/s
> Signatures.sign            16384  thrpt   15  1278.980 ± 13.483  ops/s
> 
> 
> From the numbers, it looks like the performance impact is about 10%.  
> However, the performance impact could get rewarded by using less limbs.  For 
> examples for secp256r1, I updated the limbs from [10 to 9 for integer 
> operations](https://github.com/openjdk/jdk/pull/10398), and run the benchmark 
> together with this patch, I could see the performance improvement as follow:
> 
> Benchmark        (messageLength)   Mode  Cnt     Score    Error  Units
> Signatures.sign               64  thrpt   15  1578.568 ± 11.000  ops/s
> Signatures.sign              512  thrpt   15  1596.058 ±  4.184  ops/s
> Signatures.sign             2048  thrpt   15  1591.519 ±  6.444  ops/s
> Signatures.sign            16384  thrpt   15  1563.480 ±  6.837  ops/s
> ``` 
> 
> The following are numbers for secp256r1 key generation, without this patch:
> 
> Benchmark                      Mode  Cnt     Score   Error  Units
> KeyPairGenerators.keyPairGen  thrpt   15  1531.026 ± 9.869  ops/s
> 
> 
> With this patch:
> 
> Benchmark                      Mode  Cnt     Score    Error  Units
> KeyPairGenerators.keyPairGen  thrpt   15  1367.162 ± 50.322  ops/s
> 
> 
> with improved limbs (from 10 to 9):
> 
> Benchmark                      Mode  Cnt     Score    Error  Units
> KeyPairGenerators.keyPairGen  thrpt   15  1713.097 ± 8.003  ops/s
> 
> If considering this PR together with [PR for 
> JDK-8294248](https://github.com/openjdk/jdk/pull/10398), performance 
> improvement could be expected for EC crypto primitives.
> 
> Thanks,
> Xuelei

Close it for now.  I will open it again when I have a solution for better 
performance.

-------------

PR: https://git.openjdk.org/jdk/pull/10624

Reply via email to