Riju Trivedi created HIVE-29123:
-----------------------------------
Summary: Extend ProtobufInputFormat to handle EOFException for
partially written proto files.
Key: HIVE-29123
URL: https://issues.apache.org/jira/browse/HIVE-29123
Project: Hive
Issue Type: Improvement
Reporter: Riju Trivedi
Assignee: Riju Trivedi
HiveProtoLoggingHook in Hive and ProtoHistoryLoggingService in Tez log query
execution, query plans, and other runtime statistics to protobuf files. These
proto files are exposed as EXTERNAL tables and read through
ProtobufMessageInputFormat.
An abrupt AM kill or OOM event can leave behind empty or partially written
proto files. Querying a table that contains such files fails with an
EOFException.
{code:java}
Caused by: java.io.EOFException
at java.base/java.io.DataInputStream.readFully(DataInputStream.java:202)
at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:70)
at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:120)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2505)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2637)
at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
at org.apache.hadoop.hive.ql.io.protobuf.ProtobufMessageInputFormat$1.next(ProtobufMessageInputFormat.java:124)
at org.apache.hadoop.hive.ql.io.protobuf.ProtobufMessageInputFormat$1.next(ProtobufMessageInputFormat.java:84)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
... 24 more
{code}
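One possible shape for the requested behavior is to treat an EOFException raised while reading a record as the end of that file, keeping the records that were read completely. The sketch below is only an illustration using plain java.io and a made-up length-prefixed record layout; TolerantRecordReader, readAll and writeRecords are hypothetical names, and this is neither the SequenceFile format nor the actual change to ProtobufMessageInputFormat.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: not Hive code and not the SequenceFile format.
public class TolerantRecordReader {

    // Reads length-prefixed records. An EOFException, whether from an empty
    // file, a clean end of file, or a truncated last record, ends the read
    // instead of failing it; complete records read so far are returned.
    public static List<byte[]> readAll(byte[] fileBytes) throws IOException {
        List<byte[]> records = new ArrayList<>();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(fileBytes))) {
            while (true) {
                try {
                    int length = in.readInt();      // throws EOFException at end of input
                    byte[] payload = new byte[length];
                    in.readFully(payload);          // throws EOFException on a truncated record
                    records.add(payload);
                } catch (EOFException e) {
                    break;                          // treat as end of file
                }
            }
        }
        return records;
    }

    // Helper for the demo: writes each value as a 4-byte length plus UTF-8 bytes.
    public static byte[] writeRecords(String... values) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (String value : values) {
                byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
                out.writeInt(bytes.length);
                out.write(bytes);
            }
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] complete = writeRecords("query-1", "query-2");
        // Simulate a writer killed mid-record by dropping the last 3 bytes.
        byte[] truncated = Arrays.copyOf(complete, complete.length - 3);

        System.out.println(readAll(complete).size());    // prints 2
        System.out.println(readAll(truncated).size());   // prints 1
        System.out.println(readAll(new byte[0]).size()); // prints 0
    }
}
```

A real fix would likely also log a warning naming the affected file, so that silently skipped partial records remain visible to operators.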