Ehsan Behrangi has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:

8385513: AArch64: Improve ArraysSupport.vectorizedHashCode performance for large arrays

The current AArch64 implementation of ArraysSupport.vectorizedHashCode
processes polynomial reductions in relatively small groups, which limits
parallelism in the hash accumulation path for large arrays.

This change increases polynomial batch size to 16-element groups using a
larger precomputed powers-of-31 table. The updated implementation enables
more independent multiply operations and reduces dependency chains in the
main hashing loop.

The optimization also reduces generated stub size for all supported element
types, lowering instruction cache pressure in hot hashing workloads.

The optimization applies to boolean[], byte[], char[], short[], and int[]
array hashing paths and is enabled only for array lengths >= 8. Shorter
arrays continue to use the existing scalar implementation.

Generated stub size reduction:

| Element type | New size | JDK 25 size | Reduction |
| ------------ | -------- | ----------- | --------- |
| boolean      | 332 B    | 428 B       | -96 B     |
| byte         | 332 B    | 428 B       | -96 B     |
| char         | 332 B    | 408 B       | -76 B     |
| short        | 332 B    | 408 B       | -76 B     |
| int          | 300 B    | 324 B       | -24 B     |

----------------------------------------------------

BYTE[] Arrays.hashCode throughput (ops/ms):

Lengths below 8 use the existing scalar path and are therefore expected to
show no meaningful change.

| Length | Baseline | New    | Improvement |
|--------|----------|--------|-------------|
| 2      | 696842   | 681572 | -2.2%       |
| 7      | 349082   | 349392 | +0.1%       |
| 8      | 309193   | 395677 | +28.0%      |
| 9      | 294240   | 367510 | +24.9%      |
| 15     | 160372   | 202718 | +26.4%      |
| 16     | 241651   | 348854 | +44.4%      |
| 17     | 228929   | 308820 | +34.9%      |
| 23     | 139463   | 186679 | +33.9%      |
| 24     | 177955   | 253809 | +42.6%      |
| 25     | 173594   | 253786 | +46.2%      |
| 31     | 113638   | 159672 | +40.5%      |
| 32     | 164228   | 214765 | +30.8%      |
| 33     | 155093   | 199425 | +28.6%      |
| 47     | 103190   | 135190 | +31.0%      |
| 48     | 116600   | 145178 | +24.5%      |
| 49     | 112067   | 163144 | +45.6%      |
| 63     | 79978    | 116111 | +45.2%      |
| 64     | 104182   | 130175 | +25.0%      |
| 65     | 101735   | 125010 | +22.9%      |

-------------------------------------------

CHAR[] Arrays.hashCode throughput (ops/ms)

| Length | Baseline | New    | Improvement |
|--------|----------|--------|-------------|
| 2      | 696254   | 696646 | +0.1%       |
| 7      | 351199   | 347674 | -1.0%       |
| 8      | 307065   | 398830 | +29.9%      |
| 9      | 279152   | 373828 | +33.9%      |
| 15     | 168873   | 211161 | +25.0%      |
| 16     | 246685   | 359181 | +45.6%      |
| 17     | 231574   | 319731 | +38.1%      |
| 23     | 140617   | 193354 | +37.5%      |
| 24     | 188697   | 289453 | +53.4%      |
| 25     | 181149   | 265244 | +46.4%      |
| 31     | 114859   | 168630 | +46.8%      |
| 32     | 178221   | 207204 | +16.3%      |
| 33     | 171169   | 231739 | +35.4%      |
| 47     | 105332   | 145419 | +38.1%      |
| 48     | 120754   | 197517 | +63.6%      |
| 49     | 115156   | 184969 | +60.6%      |
| 63     | 83664    | 127759 | +52.7%      |
| 64     | 119575   | 154688 | +29.4%      |
| 65     | 116870   | 147749 | +26.4%      |

-------------------------------------------

SHORT[] Arrays.hashCode throughput (ops/ms)

| Length | Baseline | New    | Improvement |
|--------|----------|--------|-------------|
| 2      | 697735   | 696917 | -0.1%       |
| 7      | 350484   | 348131 | -0.7%       |
| 8      | 305960   | 398837 | +30.4%      |
| 9      | 279146   | 367976 | +31.8%      |
| 15     | 167151   | 211794 | +26.7%      |
| 16     | 246754   | 358048 | +45.1%      |
| 17     | 231731   | 321910 | +38.9%      |
| 23     | 139937   | 188696 | +34.8%      |
| 24     | 184464   | 289120 | +56.7%      |
| 25     | 181133   | 265296 | +46.5%      |
| 31     | 114607   | 167787 | +46.4%      |
| 32     | 178193   | 259802 | +45.8%      |
| 33     | 171439   | 231916 | +35.3%      |
| 47     | 105341   | 145975 | +38.6%      |
| 48     | 120779   | 197006 | +63.1%      |
| 49     | 115701   | 185225 | +60.1%      |
| 63     | 83677    | 127688 | +52.6%      |
| 64     | 112239   | 155357 | +38.4%      |
| 65     | 116872   | 147735 | +26.4%      |

-------------------------------------------

INT[] Arrays.hashCode throughput (ops/ms)

| Length | Baseline | New    | Improvement |
|--------|----------|--------|-------------|
| 2      | 697667   | 697866 | +0.0%       |
| 7      | 351776   | 349918 | -0.5%       |
| 8      | 279132   | 398794 | +42.9%      |
| 9      | 282044   | 369000 | +30.8%      |
| 15     | 216797   | 212897 | -1.8%       |
| 16     | 228853   | 376437 | +64.5%      |
| 17     | 206776   | 310186 | +50.0%      |
| 23     | 168377   | 198746 | +18.0%      |
| 24     | 184100   | 278781 | +51.5%      |
| 25     | 172023   | 253821 | +47.6%      |
| 31     | 138354   | 171569 | +24.0%      |
| 32     | 173431   | 253249 | +46.0%      |
| 33     | 164210   | 232667 | +41.7%      |
| 47     | 117697   | 146898 | +24.8%      |
| 48     | 139514   | 192511 | +38.0%      |
| 49     | 134649   | 158293 | +17.6%      |
| 63     | 101384   | 132083 | +30.3%      |
| 64     | 118405   | 160644 | +35.7%      |
| 65     | 112848   | 146963 | +30.2%      |

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/31674/files
  - new: https://git.openjdk.org/jdk/pull/31674/files/940c07cd..ba6fe311

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=31674&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=31674&range=00-01

  Stats: 38 lines in 1 file changed: 2 ins; 12 del; 24 mod
  Patch: https://git.openjdk.org/jdk/pull/31674.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/31674/head:pull/31674

PR: https://git.openjdk.org/jdk/pull/31674
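
For readers who want to see why a 16-element batch with a powers-of-31 table gives the same result as the scalar loop, here is a minimal plain-Java sketch of the idea. It is not the AArch64 stub from this PR: the class name `BatchedHashSketch`, the method `batchedHash`, and the table layout are hypothetical and chosen only for illustration. It relies on the documented definition of `Arrays.hashCode(int[])` (start at 1, then `h = 31 * h + a[i]`) and on Java `int` arithmetic wrapping modulo 2^32.

```java
import java.util.Arrays;

// Illustrative only: a scalar model of batched polynomial hashing.
// Names and table layout are assumptions, not taken from the PR.
final class BatchedHashSketch {
    static final int BATCH = 16;
    // POWERS[j] holds 31^(BATCH - 1 - j); POW_BATCH holds 31^BATCH.
    static final int[] POWERS = new int[BATCH];
    static final int POW_BATCH;

    static {
        int p = 1;
        for (int j = BATCH - 1; j >= 0; j--) {
            POWERS[j] = p;
            p *= 31; // wraps modulo 2^32, as the scalar hash does
        }
        POW_BATCH = p;
    }

    // Same result as Arrays.hashCode(int[]) for a non-null array.
    static int batchedHash(int[] a) {
        int h = 1;
        int i = 0;
        // Full batches: the 16 products do not depend on each other,
        // so only one multiply-add per batch sits on the chain through h.
        for (; i + BATCH <= a.length; i += BATCH) {
            int sum = 0;
            for (int j = 0; j < BATCH; j++) {
                sum += a[i + j] * POWERS[j];
            }
            h = h * POW_BATCH + sum;
        }
        // Remaining elements use the ordinary scalar recurrence.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        for (int len = 0; len <= 65; len++) {
            int[] a = new int[len];
            for (int k = 0; k < len; k++) {
                a[k] = k * 1000003 - 7; // arbitrary test data
            }
            if (batchedHash(a) != Arrays.hashCode(a)) {
                throw new AssertionError("mismatch at length " + len);
            }
        }
        System.out.println("all lengths match");
    }
}
```

The sketch only models the arithmetic. How the real stub maps the independent products onto vector registers, and how it handles lengths between 8 and 15, is defined by the patch itself and is not shown here.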
