deniskuzZ commented on code in PR #5409: URL: https://github.com/apache/hive/pull/5409#discussion_r1824806132
##########
ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java:
##########
@@ -260,6 +275,18 @@ public byte[] getBytesForHash() {
       return allBytes;
     }
   }
+
+  public byte[] getBytesForIdentity() {

Review Comment:
   Originally I wanted to ask why we don't reuse `getBytesForHash`, but then I noticed that we divide `getStart()` by 8.

   HIVE-14680: retain consistent splits /during/ (as opposed to across) LLAP failures on top of HIVE-14589
   ````
   // Explicitly using only the start offset of a split, and not the length. Splits generated on
   // block boundaries and stripe boundaries can vary slightly. Try hashing both to the same node.
   // There is the drawback of potentially hashing the same data on multiple nodes though, when a
   // large split is sent to 1 node, and a second invocation uses smaller chunks of the previous
   // large split and send them to different nodes.
   ````

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: gitbox-h...@hive.apache.org
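
To make the point in the review comment concrete, here is a minimal sketch of why a key built from `getStart()` divided by 8 cannot double as an identity key. It is not the code in this PR: `SplitKeySketch`, `encode`, `bytesForHash`, and `bytesForIdentity` are hypothetical names, the path is made up, and `ByteBuffer` stands in for whatever the Hive code actually uses to serialize the long. The assumption is that the identity variant keeps the exact start offset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SplitKeySketch {

  // Hypothetical helper: path bytes followed by one 8-byte long.
  static byte[] encode(String path, long value) {
    byte[] pathBytes = path.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(pathBytes.length + Long.BYTES)
        .put(pathBytes)
        .putLong(value)
        .array();
  }

  // Placement key: only the start offset is used, divided by 8 (start >> 3),
  // so splits whose starts differ slightly map to the same key.
  static byte[] bytesForHash(String path, long start) {
    return encode(path, start >> 3);
  }

  // Identity key (assumed behavior): the exact start offset is kept,
  // so two different splits never share a key.
  static byte[] bytesForIdentity(String path, long start) {
    return encode(path, start);
  }

  public static void main(String[] args) {
    String path = "/warehouse/t/000000_0"; // made-up path for illustration

    // Offsets 1024 and 1027 fall into the same 8-byte bucket (both >> 3 == 128).
    System.out.println(Arrays.equals(
        bytesForHash(path, 1024L), bytesForHash(path, 1027L)));         // true

    // The identity key still tells them apart.
    System.out.println(Arrays.equals(
        bytesForIdentity(path, 1024L), bytesForIdentity(path, 1027L))); // false
  }
}
```

Collapsing nearby offsets is what you want for node placement, as the quoted comment describes, but it is exactly what makes the hash bytes unsuitable for telling two splits apart.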