On Thu, 25 Jun 2026 11:05:30 GMT, Ehsan Behrangi <[email protected]> wrote:

> The current AArch64 implementation of ArraysSupport.vectorizedHashCode 
> processes polynomial reductions in relatively small groups, which limits 
> parallelism in the hash accumulation path for large arrays.
> 
> This change increases polynomial batch size to 16-element groups using a 
> larger precomputed powers-of-31 table. The updated implementation enables 
> more independent multiply operations and reduces dependency chains in the 
> main hashing loop.
> 
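
For readers following the arithmetic, here is a minimal scalar sketch of the batching identity described above. It is not the NEON stub code from the PR. The function names `hash_scalar` and `hash_batched` are hypothetical, and the sketch assumes the usual `Arrays.hashCode` recurrence `h = 31*h + a[i]` with an initial value of 1, using unsigned 32-bit wraparound to mirror Java `int` overflow.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sign-extend a byte to 32 bits, as Java does before accumulating.
static inline uint32_t widen_byte(int8_t v) {
    return static_cast<uint32_t>(static_cast<int32_t>(v));
}

// Reference recurrence: every step depends on the previous value of h.
uint32_t hash_scalar(const std::vector<int8_t>& a) {
    uint32_t h = 1;
    for (int8_t v : a) {
        h = 31u * h + widen_byte(v);
    }
    return h;
}

// Batched form: for a group of N elements,
//   h' = 31^N * h + sum_{j=0}^{N-1} 31^(N-1-j) * a[i+j]
// The N products inside the sum do not depend on each other.
uint32_t hash_batched(const std::vector<int8_t>& a) {
    constexpr std::size_t N = 16;   // group size named in the description
    uint32_t pow31[N + 1];          // pow31[k] = 31^k mod 2^32
    pow31[0] = 1;
    for (std::size_t k = 1; k <= N; ++k) {
        pow31[k] = pow31[k - 1] * 31u;
    }

    uint32_t h = 1;
    std::size_t i = 0;
    for (; i + N <= a.size(); i += N) {
        uint32_t acc = 0;
        for (std::size_t j = 0; j < N; ++j) {
            acc += pow31[N - 1 - j] * widen_byte(a[i + j]);
        }
        h = pow31[N] * h + acc;     // one dependent multiply per group
    }
    for (; i < a.size(); ++i) {     // leftover elements, scalar recurrence
        h = 31u * h + widen_byte(a[i]);
    }
    return h;
}
```

Both functions return the same value for any input. The batched form only shortens the chain of dependent multiplies, which is the property a vectorized implementation can exploit.
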
> The optimization also reduces generated stub size for all supported element 
> types, lowering instruction cache pressure in hot hashing workloads.
> 
> The optimization applies to boolean[], byte[], char[], short[], and int[] 
> array hashing paths and is enabled only for array lengths >= 8. Shorter 
> arrays continue to use the existing scalar implementation.
> 
> Generated stub size reduction:
> 
> 
> | Element type | New size | JDK 25 size | Reduction | 
> | ------------ | -------- | ----------- | --------- |
> | boolean      | 332 B    | 428 B       | -96 B     |
> | byte         | 332 B    | 428 B       | -96 B     |
> | char         | 332 B    | 408 B       | -76 B     |
> | short        | 332 B    | 408 B       | -76 B     |
> | int          | 300 B    | 324 B       | -24 B     |
> 
> ## BYTE[] Arrays.hashCode throughput (ops/ms)
> Lengths below 8 use the existing scalar path and are therefore expected to 
> show no meaningful change.
> 
> | Length | Baseline | New    | Improvement |
> |--------|----------|--------|-------------|
> | 2      | 696842   | 681572 | -2.2%       |
> | 7      | 349082   | 349392 | +0.1%       |
> | 8      | 309193   | 395677 | +28.0%      |
> | 9      | 294240   | 367510 | +24.9%      |
> | 15     | 160372   | 202718 | +26.4%      |
> | 16     | 241651   | 348854 | +44.4%      |
> | 17     | 228929   | 308820 | +34.9%      |
> | 23     | 139463   | 186679 | +33.9%      |
> | 24     | 177955   | 253809 | +42.6%      |
> | 25     | 173594   | 253786 | +46.2%      |
> | 31     | 113638   | 159672 | +40.5%      |
> | 32     | 164228   | 214765 | +30.8%      |
> | 33     | 155093   | 199425 | +28.6%      |
> | 47     | 103190   | 135190 | +31.0%      |
> | 48     | 116600   | 145178 | +24.5%      |
> | 49     | 112067   | 163144 | +45.6%      |
> | 63     | 79978    | 116111 | +45.2%      |
> | 64     | 104182   | 130175 | +25.0%      |
> | 65     | 101735   | 125010 | +22.9%      |
> 
> 
> ## CHAR[] Arrays.hashCode throughput (ops/ms)
> 
> | Length | Baseline | New    | Improvement |
> |--------|----------|--------|-------------|
> | 2      | 696254   | 696646 | +0.1%       |
> | 7      |...

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 9197:

> 9195:     bool widen_signed = false;
> 9196: 
> 9197:     auto widen = [&](FloatRegister dst1,

Suggestion:

    auto widen = [&_masm](FloatRegister dst1,
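
To illustrate the intent, here is a small standalone sketch with hypothetical names (`Assembler`, `emit`, `generate`), not HotSpot code. It assumes `_masm` is a local variable in the enclosing function, which the quoted snippet does not show. An explicit capture list documents exactly what the lambda touches, and other state such as `widen_signed` is passed as an argument instead.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an assembler that records emitted instructions.
struct Assembler {
    std::vector<std::string> ops;
    void emit(const std::string& s) { ops.push_back(s); }
};

std::vector<std::string> generate(Assembler* assembler) {
    Assembler* _masm = assembler;   // assumed to be a local in this sketch
    bool widen_signed = false;

    // Only _masm is captured; the signedness flag is an explicit parameter,
    // so the lambda cannot silently depend on other locals.
    auto widen = [&_masm](const std::string& dst, bool is_signed) {
        _masm->emit(std::string(is_signed ? "sxtl " : "uxtl ") + dst);
    };

    widen("v0", widen_signed);
    widen_signed = true;
    widen("v1", widen_signed);
    return _masm->ops;
}
```

One caveat: a simple capture can only name a local variable. If `_masm` is a data member here, the explicit form would need `[this]` or an init-capture such as `[&masm = _masm]`.
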

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/31674#discussion_r3474152279
