martin-g commented on code in PR #3175:
URL: https://github.com/apache/avro/pull/3175#discussion_r1899426862
##########
lang/java/avro/src/main/java/org/apache/avro/util/Utf8.java:
##########
@@ -173,9 +178,15 @@ public int hashCode() {
if (h == 0) {
byte[] bytes = this.bytes;
int length = this.length;
- h = 1;
- for (int i = 0; i < length; i++) {
- h = h * 31 + bytes[i];
+ // If the array is filled, use the underlying JDK hash functionality.
+ // Starting with JDK 21, the underlying implementation is vectorized.
+ if (length > 7 && bytes.length == length) {
+ h = Arrays.hashCode(bytes);
Review Comment:
How does `Arrays.hashCode(bytes)` behave when the length is 7 or less?
Doesn't it fall back to serial execution internally for anything that is not
vectorizable?
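
For reference, a minimal sketch of the result equivalence behind this question. It assumes only what the hunk above shows plus the documented contract of `Arrays.hashCode(byte[])` (start at 1, then `31 * h + b` per element). The class and method names are hypothetical, and the fallback branch is assumed to keep the removed serial loop, since the hunk does not show it. It checks returned values only and says nothing about which path is faster for short inputs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch class, not part of Avro.
class Utf8HashSketch {
  // Serial hash from the removed lines of the diff: h = 1, then h = h * 31 + b.
  static int serialHash(byte[] bytes, int length) {
    int h = 1;
    for (int i = 0; i < length; i++) {
      h = h * 31 + bytes[i];
    }
    return h;
  }

  // Stand-in for the patched branch. The else path is an assumption:
  // the hunk above does not show what happens when the condition is false.
  static int patchedHash(byte[] bytes, int length) {
    if (length > 7 && bytes.length == length) {
      return Arrays.hashCode(bytes);
    }
    return serialHash(bytes, length);
  }

  public static void main(String[] args) {
    // Buffer exactly filled: 10 content bytes in a 10-byte array.
    byte[] full = "hello avro".getBytes(StandardCharsets.UTF_8);
    // Same content with spare capacity: 10 content bytes in a 16-byte array.
    byte[] padded = Arrays.copyOf(full, 16);
    // Short input, below the length > 7 threshold.
    byte[] small = "abc".getBytes(StandardCharsets.UTF_8);

    int expected = serialHash(full, full.length);
    System.out.println(patchedHash(full, full.length) == expected);
    System.out.println(patchedHash(padded, full.length) == expected);
    System.out.println(Arrays.hashCode(small) == serialHash(small, small.length));
  }
}
```

If this holds, the `length > 7` threshold affects only which code runs, not the hash value. The `bytes.length == length` check is the part that matters for correctness, because `Arrays.hashCode(byte[])` hashes the whole array, including any unused capacity past `length`.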
##########
lang/java/avro/src/test/java/org/apache/avro/util/TestUtf8.java:
##########
@@ -99,6 +99,26 @@ void hashCodeReused() {
assertEquals(4122302, u.hashCode());
}
+ /**
+ * There are two different code paths that hashcode() can call depending on the
+ * state of the internal buffer. If the buffer is full (string length eq. buffer
+ * length) then the JDK hashcode function can be used. This function can is
Review Comment:
`This function can is vectorized ...` sounds incorrect