On Mon, 27 Apr 2026 12:22:30 GMT, Ferenc Rakoczi <[email protected]> wrote:
>> An aarch64 implementation of the MontgomeryIntegerPolynomial256.mult()
>> method and IntegerPolynomial.conditionalAssign(). Since 64-bit
>> multiplication is not supported on Neon and manually performing this
>> operation with 32-bit limbs is slower than with GPRs, a hybrid neon/gpr
>> approach is used. Neon instructions are used to compute intermediate values
>> used in the last two iterations of the main "loop", while the GPRs compute
>> the first few iterations. At the method level this improves performance by
>> ~9% and at the API level roughly 5%.
>>
>> ---------
>> - [x] I confirm that I make this contribution in accordance with the
>> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai).
>
> Ferenc Rakoczi has updated the pull request with a new target base due to a
> merge or a rebase. The pull request now contains three commits:
>
>  - Merged master.
>  - Removing a jar file.
>  - 8355216: Accelerate P-256 arithmetic on aarch64 (revived)

@ferakocz

> At the method level this improves performance by ~9% and at the API level
> roughly 5%.

Can you provide more information about this? What hardware did you use? What benchmarks did you run?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/30941#issuecomment-4328137632
