Pankit Thapar created HIVE-8137:
-----------------------------------

             Summary: Empty ORC file handling
                 Key: HIVE-8137
                 URL: https://issues.apache.org/jira/browse/HIVE-8137
             Project: Hive
          Issue Type: Improvement
          Components: File Formats
    Affects Versions: 0.13.1
            Reporter: Pankit Thapar
             Fix For: 0.14.0


Hive 13 does not handle reading of a zero size Orc File properly. An Orc file 
is suposed to have a post-script
which the ReaderIml class tries to read and initialize the footer with it. But 
in case, the file is empty 
or is of zero size, then it runs into an IndexOutOfBound Exception because of 
ReaderImpl trying to read in its constructor.

Code Snippet : 
//get length of PostScript
int psLen = buffer.get(readSize - 1) & 0xff; 
In the above code, readSize for an empty file is zero.

I see that ensureOrcFooter() method performs some sanity checks for footer , 
so, either we can move the above code snippet to ensureOrcFooter() and throw a 
"Malformed ORC file exception" or we can create a dummy Reader that does not 
initialize footer and basically has hasNext() set to false so that it returns 
false on the first call.
Basically, I would like to know what might be the correct way to handle an 
empty ORC file in a mapred job?
Should we neglect it and not throw an exception or we can throw an exeption 
that the ORC file is malformed.
Please let me know your thoughts on this.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to