On Thu, 25 Jun 2026 11:54:56 GMT, Andrew Haley <[email protected]> wrote:

>> Ehsan Behrangi has refreshed the contents of this pull request, and previous
>> commits have been removed. The incremental views will show differences
>> compared to the previous content of the PR. The pull request contains one
>> new commit since the last revision:
>>
>>   8385513: AArch64: Improve ArraysSupport.vectorizedHashCode performance for
>>   large arrays
>>
>>   The current AArch64 implementation of ArraysSupport.vectorizedHashCode
>>   processes polynomial reductions in relatively small groups, which limits
>>   parallelism in the hash accumulation path for large arrays.
>>
>>   This change increases polynomial batch size to 16-element groups using a
>>   larger precomputed powers-of-31 table. The updated implementation enables
>>   more independent multiply operations and reduces dependency chains in the
>>   main hashing loop.
>>
>>   The optimization also reduces generated stub size for all supported
>>   element types, lowering instruction cache pressure in hot hashing
>>   workloads.
>>
>>   The optimization applies to boolean[], byte[], char[], short[], and
>>   int[] array hashing paths and is enabled only for array lengths >= 8.
>>   Shorter arrays continue to use the existing scalar implementation.
>>
>>   Generated stub size reduction:
>>   | Element type | New size | JDK 25 size | Reduction |
>>   | ------------ | -------- | ----------- | --------- |
>>   | boolean      | 332 B    | 428 B       | -96 B     |
>>   | byte         | 332 B    | 428 B       | -96 B     |
>>   | char         | 332 B    | 408 B       | -76 B     |
>>   | short        | 332 B    | 408 B       | -76 B     |
>>   | int          | 300 B    | 324 B       | -24 B     |
>>
>>   ----------------------------------------------------
>>   BYTE[] Arrays.hashCode throughput (ops/ms):
>>   Lengths below 8 use the existing scalar path and are therefore expected to
>>   show no meaningful change.
>>
>>   | Length | Baseline | New    | Improvement |
>>   |--------|----------|--------|-------------|
>>   | 2      | 696842   | 681572 | -2.2%       |
>>   | 7      | 349082   | 349392 | +0.1%       |
>>   | 8      | 309193   | 395677 | +28.0%      |
>>   | 9      | 294240   | 367510 | +24.9%      |
>>   | 15     | 160372   | 202718 | +26.4%      |
>>   | 16     | 241651   | 348854 | +44.4%      |
>>   | 17     | 228929   | 308820 | +34.9%      |
>>   | 23     | 139463   | 186679 | +33.9%      |
>>   | 24     | 177955   | 253809 | +42.6%      |
>>   | 25     | 173594   | 253786 | +46.2%      |
>>   | 31     | 113638 ...
>
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 9197:
>
>> 9195:   bool widen_signed = false;
>> 9196:
>> 9197:   auto widen = [&](FloatRegister dst1,
>
> Suggestion:
>
>     auto widen = [&_masm](FloatRegister dst1,

Updated the helper lambdas to use explicit captures; the signed-widening mode is now passed as an explicit parameter.

> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 9427:
>
>> 9425:   __ emit_int32(1u);          // 31^0
>> 9426:
>> 9427:   __ bind(L_after_table);
>
> Use a `for` loop here to generate powers of 31.

Replaced the handwritten power table with a loop using intpow().

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/31674#discussion_r3482261134
PR Review Comment: https://git.openjdk.org/jdk/pull/31674#discussion_r3482256844
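
For anyone following the thread without the diff open, here is a minimal scalar sketch of the idea behind the 16-element batches and the loop-generated powers-of-31 table. It is plain C++, not the stub generator code: the names `kBatch`, `make_pow31_table`, `hash_scalar` and `hash_batched` are made up for illustration, and the table is filled with a running product rather than the `intpow()` call used in the PR. It assumes the usual `Arrays.hashCode` recurrence `h = 31 * h + a[i]` with an initial value of 1 and wrap-around 32-bit arithmetic.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical batch width, mirroring the 16-element groups in the PR text.
constexpr std::size_t kBatch = 16;

// Fill {31^15, ..., 31^1, 31^0} with a loop instead of a handwritten list.
// uint32_t multiplication wraps modulo 2^32, matching Java int overflow.
std::array<std::uint32_t, kBatch> make_pow31_table() {
    std::array<std::uint32_t, kBatch> table{};
    std::uint32_t p = 1;
    for (std::size_t i = kBatch; i-- > 0;) {
        table[i] = p;   // table[i] == 31^(kBatch - 1 - i)
        p *= 31u;
    }
    return table;
}

// Reference recurrence: one dependent multiply-add per element.
std::uint32_t hash_scalar(const std::int8_t* a, std::size_t n) {
    std::uint32_t h = 1;
    for (std::size_t i = 0; i < n; ++i) {
        h = 31u * h + static_cast<std::uint32_t>(a[i]);  // sign-extends the byte
    }
    return h;
}

// Batched form: h' = h * 31^16 + sum_j a[i + j] * 31^(15 - j).
// The 16 products inside a batch do not depend on each other, which is
// what lets a vectorized implementation run them in parallel.
std::uint32_t hash_batched(const std::int8_t* a, std::size_t n) {
    static const std::array<std::uint32_t, kBatch> table = make_pow31_table();
    const std::uint32_t pow31_batch = table[0] * 31u;  // 31^16
    std::uint32_t h = 1;
    std::size_t i = 0;
    for (; i + kBatch <= n; i += kBatch) {
        std::uint32_t acc = 0;
        for (std::size_t j = 0; j < kBatch; ++j) {
            acc += table[j] * static_cast<std::uint32_t>(a[i + j]);
        }
        h = h * pow31_batch + acc;
    }
    for (; i < n; ++i) {  // leftover elements use the scalar recurrence
        h = 31u * h + static_cast<std::uint32_t>(a[i]);
    }
    return h;
}
```

Both functions should return the same value for any input; the batched one only shortens the dependency chain from one multiply per element to one multiply per 16 elements. The real stub also handles the other element types and the `>= 8` length cutoff, which this sketch leaves out.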
