deniskuzZ commented on code in PR #5409:
URL: https://github.com/apache/hive/pull/5409#discussion_r1824806132


##########
ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java:
##########
@@ -260,6 +275,18 @@ public byte[] getBytesForHash() {
         return allBytes;
       }
     }
+
+    public byte[] getBytesForIdentity() {

Review Comment:
   Originally I wanted to ask why we don't reuse `getBytesForHash`, but then I noticed that we divide `getStart()` by 8.
   HIVE-14680 : retain consistent splits /during/ (as opposed to across) LLAP failures on top of HIVE-14589
   ````
      // Explicitly using only the start offset of a split, and not the length. Splits generated on
      // block boundaries and stripe boundaries can vary slightly. Try hashing both to the same node.
      // There is the drawback of potentially hashing the same data on multiple nodes though, when a
      // large split is sent to 1 node, and a second invocation uses smaller chunks of the previous
      // large split and send them to different nodes.
   ````
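
To illustrate why a hash key built from `getStart()` divided by 8 may not be suitable as an identity, here is a minimal standalone sketch. It is not the code from the PR: the class `SplitKeySketch`, both helper methods, and the choice of path, exact start and length as the identity fields are assumptions made only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not the Hive implementation.
public class SplitKeySketch {

  // Hash-style key: path bytes followed by (start >> 3), i.e. the start
  // offset divided by 8. Nearby offsets collapse into the same key on purpose.
  static byte[] bytesForHash(String path, long start) {
    byte[] pathBytes = path.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(pathBytes.length + Long.BYTES)
        .put(pathBytes)
        .putLong(start >> 3)
        .array();
  }

  // Identity-style key (assumed shape): path bytes, exact start and length,
  // so two different splits never share a key.
  static byte[] bytesForIdentity(String path, long start, long length) {
    byte[] pathBytes = path.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(pathBytes.length + 2 * Long.BYTES)
        .put(pathBytes)
        .putLong(start)
        .putLong(length)
        .array();
  }

  public static void main(String[] args) {
    String path = "/warehouse/t/000000_0"; // made-up path
    // 1024 >> 3 and 1027 >> 3 are both 128, so the hash keys match.
    System.out.println(Arrays.equals(
        bytesForHash(path, 1024), bytesForHash(path, 1027)));
    // The identity keys keep the exact offset, so they differ.
    System.out.println(Arrays.equals(
        bytesForIdentity(path, 1024, 100), bytesForIdentity(path, 1027, 100)));
  }
}
```

Under these assumptions, two splits whose start offsets fall within the same 8-byte bucket get identical hash bytes, which is fine for picking a node but would make them indistinguishable if the same bytes were used to identify a split.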
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

