Re: [PR] AVRO-4060: Use JDK to Hash Byte Array in UTF8 [avro]

via GitHub Mon, 30 Dec 2024 01:56:20 -0800


martin-g commented on code in PR #3175:
URL: https://github.com/apache/avro/pull/3175#discussion_r1899426862



##########
lang/java/avro/src/main/java/org/apache/avro/util/Utf8.java:
##########
@@ -173,9 +178,15 @@ public int hashCode() {
     if (h == 0) {
       byte[] bytes = this.bytes;
       int length = this.length;
-      h = 1;
-      for (int i = 0; i < length; i++) {
-        h = h * 31 + bytes[i];
+      // If the array is filled, use the underlying JDK hash functionality.
+      // Starting with JDK 21, the underlying implementation is vectorized.
+      if (length > 7 && bytes.length == length) {
+        h = Arrays.hashCode(bytes);

Review Comment:
   How does `Arrays.hashCode(bytes)` behave when the length is smaller?
   Doesn't it fall back to serial execution internally for anything that is
   not vectorizable?
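
   On the correctness side of this question, a minimal sketch (not part of the PR) can compare the removed loop with `Arrays.hashCode(byte[])`. The class name `HashEquivalenceSketch` and the helper `manualHash` are hypothetical, introduced only for illustration. The documented contract of `Arrays.hashCode(byte[])` is the same start-at-1, multiply-by-31 recurrence, so the two should agree for every length, including those of 7 and below. The sketch says nothing about which path is faster for short arrays; that would need a benchmark.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class, not taken from the Avro code base.
final class HashEquivalenceSketch {

  // Same recurrence as the lines removed in the diff: start at 1,
  // then h = h * 31 + b for the first `length` bytes of the buffer.
  static int manualHash(byte[] bytes, int length) {
    int h = 1;
    for (int i = 0; i < length; i++) {
      h = h * 31 + bytes[i];
    }
    return h;
  }

  public static void main(String[] args) {
    String sample = "abcdefghijklmnop";
    for (int length = 0; length <= sample.length(); length++) {
      // The array is exactly `length` bytes long, i.e. the "filled buffer" case.
      byte[] bytes = sample.substring(0, length).getBytes(StandardCharsets.US_ASCII);
      int manual = manualHash(bytes, bytes.length);
      int jdk = Arrays.hashCode(bytes);
      if (manual != jdk) {
        throw new AssertionError("mismatch at length " + length);
      }
    }
    System.out.println("manual loop and Arrays.hashCode agree for lengths 0.." + sample.length());
  }
}
```

   The `bytes.length == length` guard in the diff still matters: `Arrays.hashCode` hashes the whole array, so on a partially filled buffer it would also fold in the unused trailing bytes.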



##########
lang/java/avro/src/test/java/org/apache/avro/util/TestUtf8.java:
##########
@@ -99,6 +99,26 @@ void hashCodeReused() {
     assertEquals(4122302, u.hashCode());
   }
 
+  /**
+   * There are two different code paths that hashcode() can call depending on the
+   * state of the internal buffer. If the buffer is full (string length eq. buffer
+   * length) then the JDK hashcode function can be used. This function can is

Review Comment:
   `This function can is vectorized ...` sounds incorrect



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
