
Ehsan Behrangi has refreshed the contents of this pull request, and previous 
commits have been removed. The incremental views will show differences compared 
to the previous content of the PR. The pull request contains one new commit 
since the last revision:

  8385513: AArch64: Improve ArraysSupport.vectorizedHashCode performance for large arrays
  
  The current AArch64 implementation of ArraysSupport.vectorizedHashCode
  processes polynomial reductions in relatively small groups, which limits
  parallelism in the hash accumulation path for large arrays.
  
  This change increases the polynomial batch size to 16-element groups,
  using a larger precomputed table of powers of 31. Processing more
  elements per group exposes more independent multiply operations and
  shortens the dependency chains in the main hashing loop.
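The following is a minimal scalar Java sketch of 16-element polynomial batching. It shows why wider batches shorten the dependency chain. It is not the AArch64 stub. The names BatchedHash, hash and POW31 are hypothetical, the descending-powers table layout is an assumption, and the real stub performs the multiplies with vector instructions rather than a scalar loop.

```java
// Hypothetical illustration of 16-element batching, not the actual AArch64 stub.
final class BatchedHash {
    private static final int BATCH = 16;
    // POW31[k] = 31^(BATCH - 1 - k), wrapping mod 2^32 like Java int arithmetic.
    private static final int[] POW31 = new int[BATCH];
    // 31^BATCH, applied to the running hash once per batch.
    private static final int POW31_BATCH;

    static {
        int p = 1;
        for (int k = BATCH - 1; k >= 0; k--) {
            POW31[k] = p;
            p *= 31;
        }
        POW31_BATCH = p;
    }

    // Computes h = 31 * h + a[i] over all of a, starting from h.
    static int hash(int h, byte[] a) {
        int i = 0;
        for (; i + BATCH <= a.length; i += BATCH) {
            // None of these 16 products depends on h, so they can be
            // computed in parallel (for example in SIMD lanes).
            int sum = 0;
            for (int k = 0; k < BATCH; k++) {
                sum += a[i + k] * POW31[k];
            }
            // One multiply by 31^16 replaces 16 serial "h = 31 * h + x" steps.
            h = h * POW31_BATCH + sum;
        }
        // Tail: plain scalar recurrence for the last 0..15 elements.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }
}
```

For a non-null array `a`, `BatchedHash.hash(1, a)` should return the same value as `java.util.Arrays.hashCode(a)`. Within each batch, the serial work on `h` drops to one multiply and one add, and the 16 products can proceed independently.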
  
  The optimization also reduces generated stub size for all supported
  element types, lowering instruction cache pressure in hot hashing
  workloads.
  
  The optimization applies to boolean[], byte[], char[], short[], and
  int[] array hashing paths and is enabled only for array lengths >= 8.
  Shorter arrays continue to use the existing scalar implementation.
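The following is a minimal sketch of length-based dispatch. Only the threshold of 8 comes from the description. The HashDispatch, MIN_BATCHED_LENGTH and group names are hypothetical, and so is the int[] element type. The sketch also assumes that lengths 8 to 15 are folded as one 8-element group drawn from the tail of the same 16-entry table. That split is an illustration, not the stub's confirmed logic.

```java
// Hypothetical dispatch sketch; only the threshold of 8 is taken from the PR text.
final class HashDispatch {
    static final int MIN_BATCHED_LENGTH = 8;
    // POW31[k] = 31^(15 - k); the last 8 entries are 31^7 .. 31^0.
    private static final int[] POW31 = new int[16];

    static {
        int p = 1;
        for (int k = 15; k >= 0; k--) {
            POW31[k] = p;
            p *= 31;
        }
    }

    // Scalar recurrence h = 31 * h + a[i] for a[from..].
    static int scalar(int h, int[] a, int from) {
        for (int i = from; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    // Folds `width` elements (1..16) into h using the table's last `width` entries.
    static int group(int h, int[] a, int from, int width) {
        int base = 16 - width; // POW31[base + k] = 31^(width - 1 - k)
        int sum = 0;
        for (int k = 0; k < width; k++) {
            sum += a[from + k] * POW31[base + k];
        }
        int scale = POW31[base] * 31; // 31^width
        return h * scale + sum;
    }

    static int hash(int h, int[] a) {
        if (a.length < MIN_BATCHED_LENGTH) {
            return scalar(h, a, 0); // short arrays: unchanged scalar path
        }
        int i = 0;
        for (; i + 16 <= a.length; i += 16) {
            h = group(h, a, i, 16);
        }
        if (i + 8 <= a.length) {
            h = group(h, a, i, 8);
            i += 8;
        }
        return scalar(h, a, i); // remaining 0..7 elements
    }
}
```

In this sketch, the descending-power layout lets narrower groups reuse the tail of the table instead of needing a separate table.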
  
  Generated stub size reduction:
  
  | Element type | New size | JDK 25 size | Change |
  | ------------ | -------- | ----------- | ------ |
  | boolean      | 332 B    | 428 B       | -96 B  |
  | byte         | 332 B    | 428 B       | -96 B  |
  | char         | 332 B    | 408 B       | -76 B  |
  | short        | 332 B    | 408 B       | -76 B  |
  | int          | 300 B    | 324 B       | -24 B  |
  
  ----------------------------------------------------
  BYTE[] Arrays.hashCode throughput (ops/ms)
  
  Lengths below 8 use the existing scalar path for every element type, so
  they are not expected to show any meaningful change.
  
  | Length | Baseline | New    | Improvement |
  |--------|----------|--------|-------------|
  | 2      | 696842   | 681572 | -2.2%       |
  | 7      | 349082   | 349392 | +0.1%       |
  | 8      | 309193   | 395677 | +28.0%      |
  | 9      | 294240   | 367510 | +24.9%      |
  | 15     | 160372   | 202718 | +26.4%      |
  | 16     | 241651   | 348854 | +44.4%      |
  | 17     | 228929   | 308820 | +34.9%      |
  | 23     | 139463   | 186679 | +33.9%      |
  | 24     | 177955   | 253809 | +42.6%      |
  | 25     | 173594   | 253786 | +46.2%      |
  | 31     | 113638   | 159672 | +40.5%      |
  | 32     | 164228   | 214765 | +30.8%      |
  | 33     | 155093   | 199425 | +28.6%      |
  | 47     | 103190   | 135190 | +31.0%      |
  | 48     | 116600   | 145178 | +24.5%      |
  | 49     | 112067   | 163144 | +45.6%      |
  | 63     | 79978    | 116111 | +45.2%      |
  | 64     | 104182   | 130175 | +25.0%      |
  | 65     | 101735   | 125010 | +22.9%      |
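The Improvement column appears to give the relative throughput change. The formula below is inferred from the table values and is not stated in the PR. The length-8 byte[] row serves as a worked check:

```latex
\text{Improvement} = \frac{\text{New} - \text{Baseline}}{\text{Baseline}} \times 100\%,
\qquad
\frac{395677 - 309193}{309193} \times 100\% = \frac{86484}{309193} \times 100\% \approx 28.0\%
```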
  
  -------------------------------------------
  CHAR[] Arrays.hashCode throughput (ops/ms)
  
  | Length | Baseline | New    | Improvement |
  |--------|----------|--------|-------------|
  | 2      | 696254   | 696646 | +0.1%       |
  | 7      | 351199   | 347674 | -1.0%       |
  | 8      | 307065   | 398830 | +29.9%      |
  | 9      | 279152   | 373828 | +33.9%      |
  | 15     | 168873   | 211161 | +25.0%      |
  | 16     | 246685   | 359181 | +45.6%      |
  | 17     | 231574   | 319731 | +38.1%      |
  | 23     | 140617   | 193354 | +37.5%      |
  | 24     | 188697   | 289453 | +53.4%      |
  | 25     | 181149   | 265244 | +46.4%      |
  | 31     | 114859   | 168630 | +46.8%      |
  | 32     | 178221   | 207204 | +16.3%      |
  | 33     | 171169   | 231739 | +35.4%      |
  | 47     | 105332   | 145419 | +38.1%      |
  | 48     | 120754   | 197517 | +63.6%      |
  | 49     | 115156   | 184969 | +60.6%      |
  | 63     | 83664    | 127759 | +52.7%      |
  | 64     | 119575   | 154688 | +29.4%      |
  | 65     | 116870   | 147749 | +26.4%      |
  
  -------------------------------------------
  SHORT[] Arrays.hashCode throughput (ops/ms)
  
  | Length | Baseline | New    | Improvement |
  |--------|----------|--------|-------------|
  | 2      | 697735   | 696917 | -0.1%       |
  | 7      | 350484   | 348131 | -0.7%       |
  | 8      | 305960   | 398837 | +30.4%      |
  | 9      | 279146   | 367976 | +31.8%      |
  | 15     | 167151   | 211794 | +26.7%      |
  | 16     | 246754   | 358048 | +45.1%      |
  | 17     | 231731   | 321910 | +38.9%      |
  | 23     | 139937   | 188696 | +34.8%      |
  | 24     | 184464   | 289120 | +56.7%      |
  | 25     | 181133   | 265296 | +46.5%      |
  | 31     | 114607   | 167787 | +46.4%      |
  | 32     | 178193   | 259802 | +45.8%      |
  | 33     | 171439   | 231916 | +35.3%      |
  | 47     | 105341   | 145975 | +38.6%      |
  | 48     | 120779   | 197006 | +63.1%      |
  | 49     | 115701   | 185225 | +60.1%      |
  | 63     | 83677    | 127688 | +52.6%      |
  | 64     | 112239   | 155357 | +38.4%      |
  | 65     | 116872   | 147735 | +26.4%      |
  
  -------------------------------------------
  INT[] Arrays.hashCode throughput (ops/ms)
  
  | Length | Baseline | New    | Improvement |
  |--------|----------|--------|-------------|
  | 2      | 697667   | 697866 | +0.0%       |
  | 7      | 351776   | 349918 | -0.5%       |
  | 8      | 279132   | 398794 | +42.9%      |
  | 9      | 282044   | 369000 | +30.8%      |
  | 15     | 216797   | 212897 | -1.8%       |
  | 16     | 228853   | 376437 | +64.5%      |
  | 17     | 206776   | 310186 | +50.0%      |
  | 23     | 168377   | 198746 | +18.0%      |
  | 24     | 184100   | 278781 | +51.4%      |
  | 25     | 172023   | 253821 | +47.6%      |
  | 31     | 138354   | 171569 | +24.0%      |
  | 32     | 173431   | 253249 | +46.0%      |
  | 33     | 164210   | 232667 | +41.7%      |
  | 47     | 117697   | 146898 | +24.8%      |
  | 48     | 139514   | 192511 | +38.0%      |
  | 49     | 134649   | 158293 | +17.6%      |
  | 63     | 101384   | 132083 | +30.3%      |
  | 64     | 118405   | 160644 | +35.7%      |
  | 65     | 112848   | 146963 | +30.2%      |

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/31674/files
  - new: https://git.openjdk.org/jdk/pull/31674/files/940c07cd..ba6fe311

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=31674&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=31674&range=00-01

  Stats: 38 lines in 1 file changed: 2 ins; 12 del; 24 mod
  Patch: https://git.openjdk.org/jdk/pull/31674.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/31674/head:pull/31674

PR: https://git.openjdk.org/jdk/pull/31674
