Eugene Koifman created HIVE-20327:
-------------------------------------
Summary: Compactor should gracefully handle 0 length files and
invalid orc files
Key: HIVE-20327
URL: https://issues.apache.org/jira/browse/HIVE-20327
Project: Hive
Issue Type: Improvement
Components: Transactions
Affects Versions: 2.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
Older versions of the Streaming API did not handle interrupts well and could
leave 0-length ORC files behind, which cannot be read.
The compactor should simply skip these files.
Other cases where an ORC Reader cannot be created for a file:
1. Regular write (1 txn delta) where the client died and didn't properly close
the file - this delta should be aborted and never read.
2. Streaming ingest write (delta_x_y, x < y). There should always be a side
file if the file was not closed properly (though it may still indicate that
the length is 0).
If we check these cases and still can't create a reader, the compactor should
not silently skip the file: the system thinks it contains at least some
committed data, but the file is corrupted (and the side file doesn't point at a
valid footer). We should never be in this situation, so we should throw so that
the end user can try manual intervention (where the only option may be deleting
the file).
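
A rough sketch of the decision order described above, as one possible reading and not the actual patch. All names here (CompactorFileCheck, Action, decide, requireUsable) are hypothetical and do not come from Hive. The inputs are assumed to have been computed by the caller: the file length, whether an ORC reader could be created, whether the file belongs to an aborted single-txn delta, and the length recorded in the side file, if one exists. Treating a side file that records length 0 as "no committed data, safe to skip" is an assumption of this sketch.

```java
import java.util.OptionalLong;

// Hypothetical sketch only; none of these names exist in Hive.
class CompactorFileCheck {

    enum Action { READ, SKIP, FAIL }

    /**
     * Decide what to do with one delta file.
     *
     * @param fileLength            length of the file on disk, in bytes
     * @param readerCreated         true if an ORC reader could be created
     * @param singleTxnDeltaAborted true if this is a 1-txn delta whose txn was aborted
     * @param sideFileLength        length recorded in the side file, empty if no side file
     */
    static Action decide(long fileLength,
                         boolean readerCreated,
                         boolean singleTxnDeltaAborted,
                         OptionalLong sideFileLength) {
        // 0-length files left behind by an interrupted writer: just skip.
        if (fileLength == 0) {
            return Action.SKIP;
        }
        // The normal case: the file has a valid footer.
        if (readerCreated) {
            return Action.READ;
        }
        // Case 1: regular write where the client died; the delta is aborted
        // and must never be read.
        if (singleTxnDeltaAborted) {
            return Action.SKIP;
        }
        // Case 2: streaming ingest with a side file that records length 0,
        // taken here (assumption) to mean no data was ever committed.
        if (sideFileLength.isPresent() && sideFileLength.getAsLong() == 0) {
            return Action.SKIP;
        }
        // Anything else means committed data is expected but unreadable.
        return Action.FAIL;
    }

    /** Returns true if the file should be read, false if skipped; throws on corruption. */
    static boolean requireUsable(String path,
                                 long fileLength,
                                 boolean readerCreated,
                                 boolean singleTxnDeltaAborted,
                                 OptionalLong sideFileLength) {
        Action action = decide(fileLength, readerCreated, singleTxnDeltaAborted, sideFileLength);
        if (action == Action.FAIL) {
            // Do not skip silently; surface the problem for manual intervention.
            throw new IllegalStateException(
                "Cannot create ORC reader for " + path + "; manual intervention may be required");
        }
        return action == Action.READ;
    }
}
```

For example, requireUsable("delta_5_9/bucket_00000", 1024, false, false, OptionalLong.of(512)) would throw under these assumptions, since the side file claims committed data but no reader can be created (the path is a made-up placeholder).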
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)