Eugene Koifman created HIVE-20327:
-------------------------------------

             Summary: Compactor should gracefully handle 0 length files and invalid orc files
                 Key: HIVE-20327
                 URL: https://issues.apache.org/jira/browse/HIVE-20327
             Project: Hive
          Issue Type: Improvement
          Components: Transactions
    Affects Versions: 2.0.0
            Reporter: Eugene Koifman
            Assignee: Eugene Koifman
Older versions of the Streaming API did not handle interrupts well and could leave behind 0-length ORC files, which cannot be read. These should simply be skipped.

Other cases where an ORC Reader cannot be created:
1. A regular write (a 1-txn delta) where the client died and did not properly close the file. This delta should be aborted and never read.
2. A streaming ingest write (delta_x_y, x < y). If the file was not closed properly, there should always be a side file (though the side file may still indicate that the length is 0).

If we have checked these cases and still cannot create a reader, the compactor should not silently skip the file. The system believes the file contains at least some committed data, but the file is corrupted (and the side file does not point at a valid footer). We should never be in this situation, so we should throw, letting the end user attempt manual intervention (where the only option may be deleting the file).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
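The decision procedure described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Hive's implementation: the `DeltaFile` record, the `Action` enum and the `decide` method are hypothetical, and the real checks (trying to create an ORC reader, consulting transaction state, reading the side file and validating the footer it points at) are reduced to precomputed fields.

```java
import java.util.OptionalLong;

public class CompactorFileCheck {

    /** Hypothetical outcomes for a delta file the compactor is about to read (not a Hive enum). */
    enum Action { READ, SKIP_EMPTY, SKIP_ABORTED, READ_UP_TO_SIDE_FILE_LENGTH }

    /**
     * Hypothetical summary of one delta bucket file. The booleans stand in for checks Hive
     * would perform itself: creating an ORC reader, looking up whether the (single)
     * transaction was aborted, and validating the footer that the side file points at.
     */
    record DeltaFile(String path, long length, long minTxn, long maxTxn,
                     boolean readerCreatable, boolean txnAborted,
                     OptionalLong sideFileLength, boolean sideFileFooterValid) {}

    static Action decide(DeltaFile f) {
        if (f.readerCreatable()) {
            return Action.READ; // normal case: a readable ORC file
        }
        if (f.length() == 0) {
            return Action.SKIP_EMPTY; // e.g. left behind by an older Streaming API version
        }
        boolean singleTxn = f.minTxn() == f.maxTxn();
        if (singleTxn && f.txnAborted()) {
            return Action.SKIP_ABORTED; // case 1: client died, txn aborted, never read
        }
        if (!singleTxn && f.sideFileLength().isPresent()) {
            // case 2: streaming delta_x_y (x < y) that was not closed properly
            if (f.sideFileLength().getAsLong() == 0) {
                return Action.SKIP_EMPTY; // side file reports nothing was flushed
            }
            if (f.sideFileFooterValid()) {
                return Action.READ_UP_TO_SIDE_FILE_LENGTH;
            }
        }
        // The system believes this file holds committed data, but it cannot be read.
        throw new IllegalStateException(
                "Unreadable delta file, manual intervention needed: " + f.path());
    }

    public static void main(String[] args) {
        DeltaFile empty = new DeltaFile("delta_5_9/bucket_00000", 0, 5, 9,
                false, false, OptionalLong.empty(), false);
        DeltaFile streaming = new DeltaFile("delta_10_19/bucket_00000", 4096, 10, 19,
                false, false, OptionalLong.of(2048), true);
        DeltaFile corrupt = new DeltaFile("delta_7_7/bucket_00000", 2048, 7, 7,
                false, false, OptionalLong.empty(), false);

        System.out.println(decide(empty));      // prints SKIP_EMPTY
        System.out.println(decide(streaming));  // prints READ_UP_TO_SIDE_FILE_LENGTH
        try {
            decide(corrupt);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The last branch throws instead of returning a "skip" outcome, following the reasoning in the issue: silently skipping a file that the system considers committed would lose data without anyone noticing.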