Aggarwal-Raghav commented on code in PR #5391:
URL: https://github.com/apache/hive/pull/5391#discussion_r1732979618
##########
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcNewSplit.java:
##########
@@ -101,7 +102,7 @@ public void readFields(DataInput in) throws IOException {
byte[] tailBuffer = new byte[tailLen];
in.readFully(tailBuffer);
OrcProto.FileTail fileTail = OrcProto.FileTail.parseFrom(tailBuffer);
- orcTail = new OrcTail(fileTail, null);
+ orcTail = new OrcTail(fileTail, new BufferChunk(0, 0), -1);
Review Comment:
@zhangbutao, in my opinion this is not caused by the ORC version upgrade itself.
In the Tez on YARN flow, the issue surfaced after the ORC version upgrade.
In the Tez on LLAP flow, I think the issue has existed since
[HIVE-15665](https://issues.apache.org/jira/browse/HIVE-15665), because
https://github.com/apache/hive/blob/d0d5d6d7d11b3eece0d0bc17b429cb30dec5dc79/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java#L669
requires the serialized footer. With _`hive.orc.splits.include.file.footer`_
enabled, that footer was null earlier and is an empty buffer in this PR;
neither helps.
This code requires the actual serialized buffer, which I think can only be
obtained by calling _`extractFileTail`_. When I debugged with an empty buffer,
the code checked the last byte of the buffer, which represents the postscript
length. For an empty buffer that value is 0, so **_an error related to a
malformed ORC file is thrown_**.
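
To illustrate the empty-buffer case, here is a minimal sketch of the kind of check involved. It is not the actual ORC or Hive code: the class `TailSketch` and its methods are hypothetical names, and the only fact assumed is that the last byte of a serialized ORC tail holds the postscript length.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not the real ORC reader logic.
class TailSketch {

    /** Reads the postscript length from the last byte of a serialized tail. */
    static int postscriptLength(ByteBuffer tail) {
        int size = tail.remaining();
        if (size == 0) {
            // An empty buffer has no trailing length byte, so report 0.
            return 0;
        }
        // Absolute read of the final readable byte, treated as unsigned.
        return tail.get(tail.position() + size - 1) & 0xff;
    }

    /** Rejects tails that cannot hold a postscript of the declared length. */
    static void checkTail(ByteBuffer tail) {
        int psLen = postscriptLength(tail);
        // The postscript plus its one-byte length must fit in the buffer.
        if (psLen == 0 || psLen + 1 > tail.remaining()) {
            throw new IllegalArgumentException(
                "Malformed ORC tail: postscript length " + psLen
                    + ", buffer size " + tail.remaining());
        }
    }

    public static void main(String[] args) {
        // Mirrors the empty buffer passed to OrcTail in this PR.
        ByteBuffer empty = ByteBuffer.allocate(0);
        try {
            checkTail(empty);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions, both a null and an empty serialized tail fail the check, which is why a buffer with the real tail bytes seems to be needed.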
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]