RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops

Ludovic Henry Fri, 04 Mar 2022 08:26:07 -0800

Despite the hash value being cached for Strings, computing the hash still 
represents a significant CPU usage for applications handling lots of text.


Even though it would be generally better to do it through an enhancement to the 
autovectorizer, the complexity of doing it by hand is trivial and the gain is 
sizable (2x speedup) even without the Vector API. The algorithm has been 
proposed by Richard Startin and Paul Sandoz [1].

Speedup are as follows on a `Intel(R) Xeon(R) E-2276G CPU @ 3.80GHz`


Benchmark                                  (size)  Mode  Cnt     Score   Error  
Units
StringHashCode.Algorithm.scalar                 0  avgt          1.799          
ns/op
StringHashCode.Algorithm.scalar                 1  avgt          3.233          
ns/op
StringHashCode.Algorithm.scalar                10  avgt          6.617          
ns/op
StringHashCode.Algorithm.scalar               100  avgt         61.481          
ns/op
StringHashCode.Algorithm.scalar              1000  avgt        631.690          
ns/op
StringHashCode.Algorithm.scalar             10000  avgt       6293.611          
ns/op
StringHashCode.Algorithm.scalarUnrolled8        0  avgt          1.890          
ns/op
StringHashCode.Algorithm.scalarUnrolled8        1  avgt          3.494          
ns/op
StringHashCode.Algorithm.scalarUnrolled8       10  avgt          9.050          
ns/op
StringHashCode.Algorithm.scalarUnrolled8      100  avgt         31.725          
ns/op
StringHashCode.Algorithm.scalarUnrolled8     1000  avgt        321.031          
ns/op
StringHashCode.Algorithm.scalarUnrolled8    10000  avgt       3203.838          
ns/op
StringHashCode.Algorithm.scalarUnrolled16       0  avgt          1.953          
ns/op
StringHashCode.Algorithm.scalarUnrolled16       1  avgt          3.485          
ns/op
StringHashCode.Algorithm.scalarUnrolled16      10  avgt          7.041          
ns/op
StringHashCode.Algorithm.scalarUnrolled16     100  avgt         30.975          
ns/op
StringHashCode.Algorithm.scalarUnrolled16    1000  avgt        316.616          
ns/op
StringHashCode.Algorithm.scalarUnrolled16   10000  avgt       3208.658          
ns/op


At Datadog, we handle a great amount of text (through logs management for 
example), and hashing String represents a large part of our CPU usage. It's 
very unlikely that we are the only one as String.hashCode is such a core 
feature of the JVM-based languages with its use in HashMap for example. Having 
even only a 2x speedup would allow us to save thousands of CPU cores per month 
and improve correspondingly the energy/carbon impact.

[1] 
https://static.rainfocus.com/oracle/oow18/sess/1525822677955001tLqU/PF/codeone18-vector-API-DEV5081_1540354883936001Q3Sv.pdf

-------------

Commit messages:
 - 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops

Changes: https://git.openjdk.java.net/jdk/pull/7700/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7700&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8282664
  Stats: 168 lines in 4 files changed: 165 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7700.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7700/head:pull/7700

PR: https://git.openjdk.java.net/jdk/pull/7700

RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops

Reply via email to