deniskuzZ commented on code in PR #5409:
URL: https://github.com/apache/hive/pull/5409#discussion_r1824806132
##########
ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java:
##########
@@ -260,6 +275,18 @@ public byte[] getBytesForHash() {
return allBytes;
}
}
+
+ public byte[] getBytesForIdentity() {
Review Comment:
Originally I wanted to ask the same thing :) Why don't we reuse `getBytesForHash`?
Then I noticed that it divides `getStart()` by 8.
HIVE-14680 : retain consistent splits /during/ (as opposed to across) LLAP
failures on top of HIVE-14589
````
// Explicitly using only the start offset of a split, and not the length. Splits generated on
// block boundaries and stripe boundaries can vary slightly. Try hashing both to the same node.
// There is the drawback of potentially hashing the same data on multiple nodes though, when a
// large split is sent to 1 node, and a second invocation uses smaller chunks of the previous
// large split and send them to different nodes.
````
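To make the difference concrete, here is a minimal sketch of why a hash key built from `getStart()` divided by 8 cannot double as an identity key. The class and both method names are hypothetical stand-ins, not the actual `HiveInputFormat` code, and it assumes an identity key would keep the exact start offset (the body of `getBytesForIdentity` is not shown in this hunk).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not the Hive implementation.
class SplitBytesSketch {

    // Hash-style key: path bytes followed by (start >> 3), i.e. start divided by 8.
    // Start offsets that fall into the same 8-byte bucket produce the same key.
    static byte[] bytesForHash(String path, long start) {
        byte[] pathBytes = path.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(pathBytes.length + Long.BYTES)
                .put(pathBytes)
                .putLong(start >> 3)
                .array();
    }

    // Identity-style key (assumption): path bytes followed by the exact start offset,
    // so two splits with different starts never share a key.
    static byte[] bytesForIdentity(String path, long start) {
        byte[] pathBytes = path.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(pathBytes.length + Long.BYTES)
                .put(pathBytes)
                .putLong(start)
                .array();
    }

    public static void main(String[] args) {
        String path = "/warehouse/t/part-0"; // made-up path for the example
        long startA = 1024L;
        long startB = 1029L; // 1024 >> 3 == 1029 >> 3 == 128

        boolean hashEqual = Arrays.equals(
                bytesForHash(path, startA), bytesForHash(path, startB));
        boolean identityEqual = Arrays.equals(
                bytesForIdentity(path, startA), bytesForIdentity(path, startB));

        System.out.println("hash keys equal: " + hashEqual);         // true
        System.out.println("identity keys equal: " + identityEqual); // false
    }
}
```

The coarse bucket is useful for locality (slightly shifted splits hash to the same node), but it would make two distinct splits look identical if the same bytes were used to identify a split.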
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]