On Thu, 25 Jun 2026 11:54:56 GMT, Andrew Haley <[email protected]> wrote:

>> Ehsan Behrangi has refreshed the contents of this pull request, and previous 
>> commits have been removed. The incremental views will show differences 
>> compared to the previous content of the PR. The pull request contains one 
>> new commit since the last revision:
>> 
>>   8385513: AArch64: Improve ArraysSupport.vectorizedHashCode performance for large arrays
>>   
>>   The current AArch64 implementation of ArraysSupport.vectorizedHashCode
>>   processes polynomial reductions in relatively small groups, which limits
>>   parallelism in the hash accumulation path for large arrays.
>>   
>>   This change increases polynomial batch size to 16-element groups using a
>>   larger precomputed powers-of-31 table. The updated implementation enables
>>   more independent multiply operations and reduces dependency chains in the
>>   main hashing loop.
>>   
>>   The optimization also reduces generated stub size for all supported
>>   element types, lowering instruction cache pressure in hot hashing
>>   workloads.
>>   
>>   The optimization applies to boolean[], byte[], char[], short[], and
>>   int[] array hashing paths and is enabled only for array lengths >= 8.
>>   Shorter arrays continue to use the existing scalar implementation.
>>   
>>   Generated stub size reduction:
>>   | Element type | New size | JDK 25 size | Reduction |
>>   | ------------ | -------- | ----------- | --------- |
>>   | boolean      | 332 B    | 428 B       | -96 B     |
>>   | byte         | 332 B    | 428 B       | -96 B     |
>>   | char         | 332 B    | 408 B       | -76 B     |
>>   | short        | 332 B    | 408 B       | -76 B     |
>>   | int          | 300 B    | 324 B       | -24 B     |
>>   
>>   ----------------------------------------------------
>>   BYTE[] Arrays.hashCode throughput (ops/ms):
>>   Lengths below 8 use the existing scalar path and are therefore expected to show no meaningful change.
>>   
>>   | Length | Baseline | New    | Improvement |
>>   |--------|----------|--------|-------------|
>>   | 2      | 696842   | 681572 | -2.2%       |
>>   | 7      | 349082   | 349392 | +0.1%       |
>>   | 8      | 309193   | 395677 | +28.0%      |
>>   | 9      | 294240   | 367510 | +24.9%      |
>>   | 15     | 160372   | 202718 | +26.4%      |
>>   | 16     | 241651   | 348854 | +44.4%      |
>>   | 17     | 228929   | 308820 | +34.9%      |
>>   | 23     | 139463   | 186679 | +33.9%      |
>>   | 24     | 177955   | 253809 | +42.6%      |
>>   | 25     | 173594   | 253786 | +46.2%      |
>>   | 31     | 113638   ...
>
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 9197:
> 
>> 9195:     bool widen_signed = false;
>> 9196: 
>> 9197:     auto widen = [&](FloatRegister dst1,
> 
> Suggestion:
> 
>     auto widen = [&_masm](FloatRegister dst1,

Updated the helper lambdas to use explicit captures; the signed-widening mode 
is now passed as an explicit parameter.
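
For readers following along without the patch, here is a minimal standalone sketch of the shape of that change. It is not the stub code: `ToyAssembler`, `generate_widen_demo`, and the register names are hypothetical stand-ins, and the emitted strings only mimic the AArch64 SXTL/UXTL mnemonics. The point is that the lambda names exactly what it captures and takes the signedness as an argument rather than reading a mutable local through `[&]`.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an assembler; it only records text.
struct ToyAssembler {
  std::vector<std::string> code;
  void emit(const std::string& insn) { code.push_back(insn); }
};

std::vector<std::string> generate_widen_demo() {
  ToyAssembler masm;

  // Explicit capture list: only the assembler is captured, by reference.
  // The signed/unsigned mode is a parameter, not captured mutable state.
  auto widen = [&masm](const std::string& dst,
                       const std::string& src,
                       bool is_signed) {
    masm.emit(std::string(is_signed ? "sxtl " : "uxtl ") + dst + ", " + src);
  };

  widen("v0", "v4", /*is_signed=*/true);
  widen("v1", "v5", /*is_signed=*/false);
  return masm.code;
}
```

With the mode passed at each call site, the choice of extension is visible where the helper is used, and the helper cannot silently pick up other locals.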

> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 9427:
> 
>> 9425:     __ emit_int32(1u);           // 31^0
>> 9426: 
>> 9427:     __ bind(L_after_table);
> 
> Use a `for` loop here to generate powers of 31.

Replaced the handwritten power table with a loop using intpow().
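
As a rough illustration of the idea (again, not the stub code), the sketch below builds a powers-of-31 table in a loop and uses it to hash 16 elements per step. All names here (`pow_u32`, `make_powers_of_31`, `hash_scalar`, `hash_batched`) are hypothetical; `pow_u32` is a local stand-in and makes no claim about the signature of HotSpot's intpow(). The descending order ending in 31^0 is an assumption based on the last table entry shown in the quoted hunk.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Stand-in integer power helper. Unsigned arithmetic wraps modulo 2^32,
// which matches Java int overflow behaviour for hash codes.
inline uint32_t pow_u32(uint32_t base, unsigned exp) {
  uint32_t result = 1;
  while (exp != 0) {
    if (exp & 1u) result *= base;
    base *= base;
    exp >>= 1;
  }
  return result;
}

// Builds {31^(N-1), ..., 31^1, 31^0} with a loop instead of literals.
template <std::size_t N>
std::array<uint32_t, N> make_powers_of_31() {
  std::array<uint32_t, N> table{};
  for (std::size_t i = 0; i < N; i++) {
    table[i] = pow_u32(31u, static_cast<unsigned>(N - 1 - i));
  }
  return table;
}

// Reference recurrence: h = 31 * h + a[i].
inline uint32_t hash_scalar(const int32_t* a, std::size_t n, uint32_t initial) {
  uint32_t h = initial;
  for (std::size_t i = 0; i < n; i++) {
    h = 31u * h + static_cast<uint32_t>(a[i]);
  }
  return h;
}

// Batched form: h = h * 31^16 + sum_j a[i+j] * 31^(15-j).
// The 16 products in each group do not depend on one another.
inline uint32_t hash_batched(const int32_t* a, std::size_t n, uint32_t initial) {
  static const std::array<uint32_t, 16> powers = make_powers_of_31<16>();
  const uint32_t step = 31u * powers[0];  // 31^16
  uint32_t h = initial;
  std::size_t i = 0;
  for (; i + 16 <= n; i += 16) {
    uint32_t sum = 0;
    for (std::size_t j = 0; j < 16; j++) {
      sum += static_cast<uint32_t>(a[i + j]) * powers[j];
    }
    h = h * step + sum;
  }
  for (; i < n; i++) {  // scalar tail for the remaining elements
    h = 31u * h + static_cast<uint32_t>(a[i]);
  }
  return h;
}
```

Because both forms compute the same polynomial modulo 2^32, `hash_batched` and `hash_scalar` agree for any input and initial value; the batched form just shortens the multiply dependency chain.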

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/31674#discussion_r3482261134
PR Review Comment: https://git.openjdk.org/jdk/pull/31674#discussion_r3482256844
