Re: [PR] HIVE-28411: Bucket Map Join on Iceberg tables [hive]

via GitHub Thu, 31 Oct 2024 09:43:45 -0700


deniskuzZ commented on code in PR #5409:
URL: https://github.com/apache/hive/pull/5409#discussion_r1824806132



##########
ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java:
##########
@@ -260,6 +275,18 @@ public byte[] getBytesForHash() {
         return allBytes;
       }
     }
+
+    public byte[] getBytesForIdentity() {

Review Comment:
   Originally I wanted to ask the same :) Why don't we reuse `getBytesForHash`? Then I noticed that it divides `getStart()` by 8.
   HIVE-14680: retain consistent splits /during/ (as opposed to across) LLAP failures on top of HIVE-14589
   ````
      // Explicitly using only the start offset of a split, and not the length. Splits generated on
      // block boundaries and stripe boundaries can vary slightly. Try hashing both to the same node.
      // There is the drawback of potentially hashing the same data on multiple nodes though, when a
      // large split is sent to 1 node, and a second invocation uses smaller chunks of the previous
      // large split and send them to different nodes.
   ````
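
   To illustrate why the divide-by-8 matters, here is a minimal standalone sketch. It is not the Hive code: `SplitHashSketch`, `bytesForHash` and `bytesForIdentity` are hypothetical names, and the identity layout (path, exact start, length) is only an assumption about what an identity-style method would need, not what the PR does. It shows that two splits whose start offsets fall in the same 8-byte bucket produce the same hash bytes, so the hash form alone cannot tell them apart.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch; none of these names come from Hive.
final class SplitHashSketch {

  // Hash-style bytes: path plus the start offset divided by 8 (start >> 3).
  // The split length is deliberately left out, as the quoted comment describes.
  static byte[] bytesForHash(String path, long start) {
    byte[] pathBytes = path.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(pathBytes.length + Long.BYTES)
        .put(pathBytes)
        .putLong(start >> 3)
        .array();
  }

  // Identity-style bytes (assumed layout): path, exact start and length,
  // so that two distinct splits never map to the same byte sequence.
  static byte[] bytesForIdentity(String path, long start, long length) {
    byte[] pathBytes = path.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(pathBytes.length + 2 * Long.BYTES)
        .put(pathBytes)
        .putLong(start)
        .putLong(length)
        .array();
  }

  public static void main(String[] args) {
    String path = "/warehouse/t/file.orc"; // made-up path for illustration

    // 1000 >> 3 and 1003 >> 3 are both 125, so the hash bytes collide.
    boolean sameHash = Arrays.equals(
        bytesForHash(path, 1000L), bytesForHash(path, 1003L));

    // The exact start offsets differ, so the identity bytes differ.
    boolean sameIdentity = Arrays.equals(
        bytesForIdentity(path, 1000L, 512L), bytesForIdentity(path, 1003L, 512L));

    System.out.println("same hash bytes: " + sameHash);         // true
    System.out.println("same identity bytes: " + sameIdentity); // false
  }
}
```

   Collisions like this are fine for picking a node, where nearby splits landing together is the goal, but not for anything that needs to distinguish splits exactly.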
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

