On Thu, 25 Jun 2026 11:05:30 GMT, Ehsan Behrangi <[email protected]> wrote:
> The current AArch64 implementation of ArraysSupport.vectorizedHashCode
> processes polynomial reductions in relatively small groups, which limits
> parallelism in the hash accumulation path for large arrays.
>
> This change increases polynomial batch size to 16-element groups using a
> larger precomputed powers-of-31 table. The updated implementation enables
> more independent multiply operations and reduces dependency chains in the
> main hashing loop.
>
> The optimization also reduces generated stub size for all supported element
> types, lowering instruction cache pressure in hot hashing workloads.
>
> The optimization applies to boolean[], byte[], char[], short[], and int[]
> array hashing paths and is enabled only for array lengths >= 8. Shorter
> arrays continue to use the existing scalar implementation.
>
> Generated stub size reduction:
>
>
> | Element type | New size | JDK 25 size | Reduction |
> | ------------ | -------- | ----------- | --------- |
> | boolean      | 332 B    | 428 B       | -96 B     |
> | byte         | 332 B    | 428 B       | -96 B     |
> | char         | 332 B    | 408 B       | -76 B     |
> | short        | 332 B    | 408 B       | -76 B     |
> | int          | 300 B    | 324 B       | -24 B     |
>
> ## BYTE[] Arrays.hashCode throughput (ops/ms):
> Lengths below 8 use the existing scalar path and are therefore expected to
> show no meaningful change.
>
> | Length | Baseline | New    | Improvement |
> |--------|----------|--------|-------------|
> | 2      | 696842   | 681572 | -2.2%       |
> | 7      | 349082   | 349392 | +0.1%       |
> | 8      | 309193   | 395677 | +28.0%      |
> | 9      | 294240   | 367510 | +24.9%      |
> | 15     | 160372   | 202718 | +26.4%      |
> | 16     | 241651   | 348854 | +44.4%      |
> | 17     | 228929   | 308820 | +34.9%      |
> | 23     | 139463   | 186679 | +33.9%      |
> | 24     | 177955   | 253809 | +42.6%      |
> | 25     | 173594   | 253786 | +46.2%      |
> | 31     | 113638   | 159672 | +40.5%      |
> | 32     | 164228   | 214765 | +30.8%      |
> | 33     | 155093   | 199425 | +28.6%      |
> | 47     | 103190   | 135190 | +31.0%      |
> | 48     | 116600   | 145178 | +24.5%      |
> | 49     | 112067   | 163144 | +45.6%      |
> | 63     | 79978    | 116111 | +45.2%      |
> | 64     | 104182   | 130175 | +25.0%      |
> | 65     | 101735   | 125010 | +22.9%      |
>
>
> ## CHAR[] Arrays.hashCode throughput (ops/ms)
>
> | Length | Baseline | New    | Improvement |
> |--------|----------|--------|-------------|
> | 2      | 696254   | 696646 | +0.1%       |
> | 7 |...

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 9427:

> 9425: __ emit_int32(1u); // 31^0
> 9426:
> 9427: __ bind(L_after_table);

Use a `for` loop here to generate powers of 31.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/31674#discussion_r3474115547
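
For illustration only, here is a standalone C++ sketch of what a loop-generated powers-of-31 table could look like, and of why a 16-element batch gives the same result as the scalar recurrence. It does not use the HotSpot assembler API. The names `kBatch`, `make_powers_of_31`, `hash_scalar`, and `hash_batched` are hypothetical, and the descending table order is an assumption based on 31^0 being emitted last in the quoted code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical batch width, taken from the 16-element groups in the PR text.
const std::size_t kBatch = 16;

// Fill {31^16, 31^15, ..., 31^1, 31^0} with a loop instead of literals.
// uint32_t arithmetic wraps modulo 2^32, which matches Java int overflow.
// Assumption: descending order, so that 31^0 is the last entry.
std::array<std::uint32_t, kBatch + 1> make_powers_of_31() {
    std::array<std::uint32_t, kBatch + 1> table{};
    std::uint32_t p = 1u;
    for (std::size_t i = 0; i <= kBatch; ++i) {
        table[kBatch - i] = p;  // table[k] holds 31^(16 - k)
        p *= 31u;
    }
    return table;
}

// Reference recurrence: h = 31 * h + a[i], starting from 1.
std::uint32_t hash_scalar(const std::vector<std::int32_t>& a) {
    std::uint32_t h = 1u;
    for (std::int32_t v : a) {
        h = 31u * h + static_cast<std::uint32_t>(v);
    }
    return h;
}

// Batched form: each group of 16 elements is folded in as
//   h' = h * 31^16 + a[i] * 31^15 + ... + a[i + 15] * 31^0
// The 16 products do not depend on each other, only the final sum does.
std::uint32_t hash_batched(const std::vector<std::int32_t>& a) {
    const std::array<std::uint32_t, kBatch + 1> pow31 = make_powers_of_31();
    std::uint32_t h = 1u;
    std::size_t i = 0;
    for (; i + kBatch <= a.size(); i += kBatch) {
        std::uint32_t acc = h * pow31[0];
        for (std::size_t j = 0; j < kBatch; ++j) {
            acc += static_cast<std::uint32_t>(a[i + j]) * pow31[j + 1];
        }
        h = acc;
    }
    // Remaining elements (fewer than 16) use the scalar recurrence.
    for (; i < a.size(); ++i) {
        h = 31u * h + static_cast<std::uint32_t>(a[i]);
    }
    return h;
}
```

In the stub itself the same loop would presumably wrap the existing `emit_int32` calls. The sketch only shows that the table values are cheap to compute at generation time and that the batched and scalar forms agree.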
