BsoBird commented on code in PR #5543:
URL: https://github.com/apache/hive/pull/5543#discussion_r1843538394


##########
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/MurmurHashStringColStringCol.java:
##########
@@ -47,12 +47,15 @@ protected void hash(ColumnVector inputColVector1, ColumnVector inputColVector2,
     BytesColumnVector inV1 = (BytesColumnVector) inputColVector1;
     BytesColumnVector inV2 = (BytesColumnVector) inputColVector2;
 
+    int idx1 = inputColVector1.isRepeating ? 0 : i;
+    int idx2 = inputColVector2.isRepeating ? 0 : i;
+
     // hash of value from 1. column
-    int hash = inV1.isNull[i] ? 0
-      : Murmur3.hash32(inV1.vector[i], inV1.start[i], inV1.length[i], Murmur3.DEFAULT_SEED);
+    int hash = inV1.isNull[idx1] ? 0

Review Comment:
   Hello, I have reviewed your patch. Some time ago I tested a similar patch to 
fix this issue. I only changed MurmurHashStringColStringCol.java, and my fix 
was similar to yours. The NPE did disappear, but my query result was missing 
about 3739 records compared to the correct result. I am currently investigating 
whether this is caused by the hash result being 0 or by data loss in the 
Iceberg vectorized read itself.
   
   I may be wrong, though.
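
For reference, the indexing pattern in the hunk above can be sketched in isolation. This is a minimal, self-contained illustration, not Hive's code: `BytesCol` is a hypothetical stand-in for `BytesColumnVector`, `hashBytes` is a placeholder polynomial hash standing in for `Murmur3.hash32`, and the batch layout (only slot 0 populated when `isRepeating` is set) is an assumption made for the example.

```java
import java.nio.charset.StandardCharsets;

class RepeatingIndexSketch {
    // Hypothetical stand-in for a bytes column vector (not Hive's BytesColumnVector).
    static final class BytesCol {
        boolean isRepeating;
        boolean[] isNull;
        byte[][] vector;
        int[] start;
        int[] length;

        BytesCol(int size) {
            isNull = new boolean[size];
            vector = new byte[size][];
            start = new int[size];
            length = new int[size];
        }
    }

    // Assumption: a repeating column populates only slot 0, so every row reads slot 0.
    static int slot(BytesCol col, int row) {
        return col.isRepeating ? 0 : row;
    }

    // Placeholder hash, used here only so the sketch runs without Hive on the classpath.
    static int hashBytes(byte[] data, int start, int length) {
        int h = 1;
        for (int k = start; k < start + length; k++) {
            h = 31 * h + data[k];
        }
        return h;
    }

    // Null values hash to 0, mirroring the `isNull[idx1] ? 0 : ...` line in the diff.
    static int hashValue(BytesCol col, int row) {
        int idx = slot(col, row);
        return col.isNull[idx] ? 0 : hashBytes(col.vector[idx], col.start[idx], col.length[idx]);
    }

    // Builds a column with one slot per value; a null element marks a null row.
    static BytesCol of(String... values) {
        BytesCol col = new BytesCol(values.length);
        for (int i = 0; i < values.length; i++) {
            if (values[i] == null) {
                col.isNull[i] = true;
            } else {
                col.vector[i] = values[i].getBytes(StandardCharsets.UTF_8);
                col.length[i] = col.vector[i].length;
            }
        }
        return col;
    }

    // Builds a repeating column: batchSize slots are allocated, but only slot 0 is filled.
    static BytesCol repeating(String value, int batchSize) {
        BytesCol col = new BytesCol(batchSize);
        col.isRepeating = true;
        col.vector[0] = value.getBytes(StandardCharsets.UTF_8);
        col.length[0] = col.vector[0].length;
        return col;
    }

    public static void main(String[] args) {
        BytesCol rep = repeating("a", 8);
        // Row 5 resolves to slot 0, so it hashes the same as row 0.
        System.out.println(hashValue(rep, 5) == hashValue(rep, 0)); // prints "true"
    }
}
```

In this sketch, reading `col.vector[5]` directly on the repeating column would hit an unset slot, which is presumably the kind of access that produced the NPE. Note that the null branch maps every null value to the same hash of 0, which is the behaviour under investigation above.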



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

