Re: [PR] AVRO-4060: Use JDK to Hash Byte Array in UTF8 [avro]

via GitHub Sat, 01 Feb 2025 20:20:28 -0800


belugabehr commented on code in PR #3175:
URL: https://github.com/apache/avro/pull/3175#discussion_r1938392751



##########
lang/java/avro/src/main/java/org/apache/avro/util/Utf8.java:
##########
@@ -173,9 +178,15 @@ public int hashCode() {
     if (h == 0) {
       byte[] bytes = this.bytes;
       int length = this.length;
-      h = 1;
-      for (int i = 0; i < length; i++) {
-        h = h * 31 + bytes[i];
+      // If the array is filled, use the underlying JDK hash functionality.
+      // Starting with JDK 21, the underlying implementation is vectorized.
+      if (length > 7 && bytes.length == length) {
+        h = Arrays.hashCode(bytes);

Review Comment:
   That is to say, the underlying JDK implementation will not vectorize if the 
array has fewer than 8 bytes, so there is no reason to call 
`Arrays.hashCode(bytes)`; it just falls back to serial execution anyway. 
The call carries some additional overhead as well, so it is best to bypass 
it when the array is that small.
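To make the two paths concrete, here is a minimal sketch under a few assumptions. The class and method names (`Utf8HashSketch`, `hash`) are hypothetical and are not Avro's API. The 8-byte threshold and the "array is filled" check come from the diff above. The sketch relies on `Arrays.hashCode(byte[])` being specified to compute the same `31 * h + b` polynomial, starting from 1, as the loop it replaces, so both paths yield the same value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for the hashing logic in Utf8.hashCode();
// names are illustrative only, not Avro's actual API.
public class Utf8HashSketch {

  // Hashes the first `length` bytes of `bytes` with the polynomial
  // h = 31 * h + b, starting at 1 (the same one Arrays.hashCode uses).
  static int hash(byte[] bytes, int length) {
    // Arrays.hashCode(byte[]) always hashes the whole array, so it is only
    // usable when the logical length covers the entire backing array.
    // Threshold of 8 bytes mirrors the condition in the diff (length > 7).
    if (length > 7 && bytes.length == length) {
      return Arrays.hashCode(bytes);
    }
    // Serial fallback: same result, without the call overhead for short
    // arrays or arrays whose backing buffer is larger than the content.
    int h = 1;
    for (int i = 0; i < length; i++) {
      h = h * 31 + bytes[i];
    }
    return h;
  }

  public static void main(String[] args) {
    byte[] filled = "hello, avro!".getBytes(StandardCharsets.UTF_8);
    byte[] padded = Arrays.copyOf(filled, 32); // backing array larger than content

    int viaJdk = hash(filled, filled.length);    // takes the Arrays.hashCode path
    int viaLoop = hash(padded, filled.length);   // takes the serial loop path
    System.out.println(viaJdk == viaLoop);       // both paths agree
  }
}
```

The `bytes.length == length` guard matters as much as the size threshold: a `Utf8` instance can reuse a backing array that is larger than its content, and hashing the trailing bytes would give a different result.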



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
