James Kebinger created HIVE-4175:
------------------------------------

             Summary: Injection of emptyFile into input splits for empty partitions causes Deserializer to fail
                 Key: HIVE-4175
                 URL: https://issues.apache.org/jira/browse/HIVE-4175
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.10.0
         Environment: CDH4.2, using MR1
            Reporter: James Kebinger
            Priority: Minor


My deserializer is expecting to receive one of 2 different subclasses of 
Writable, but in certain circumstances it receives an empty instance of 
org.apache.hadoop.io.Text. This only happens for task attempts where I observe 
the file called "emptyFile" in the list of input splits. 

I'm querying an external table partitioned by year/month/day, for which I have 
eagerly created partitions. So as of today, for example, a query with 
year = 2013 and month = 3 includes empty partitions.

In the course of the investigation I downloaded the sequence files and 
confirmed they were OK. Once I realized that processing of empty partitions was 
to blame, I was able to work around the issue by bounding my queries to 
populated partitions.

Can the need for the emptyFile be eliminated in the case where there's already 
a bunch of splits being processed? Failing that, can the mapper detect that the 
current input is from emptyFile and not call the deserializer?
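
To illustrate the second option, here is a minimal sketch of a guard that 
skips the empty Text placeholder before deserializing. It is not Hive's actual 
code path. The nested Writable, Text and EventRecord types are hypothetical 
stand-ins (so the sketch runs without Hadoop on the classpath) for 
org.apache.hadoop.io.Writable, org.apache.hadoop.io.Text and one of my record 
classes. It also assumes that a zero-length Text never appears as real data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptySplitGuard {
    /** Hypothetical stand-in for org.apache.hadoop.io.Writable. */
    public interface Writable {}

    /** Hypothetical stand-in for org.apache.hadoop.io.Text. */
    public static final class Text implements Writable {
        private final String value;
        public Text(String value) { this.value = value; }
        /** Length in characters; enough for an emptiness check in this sketch. */
        public int getLength() { return value.length(); }
    }

    /** Hypothetical stand-in for a record type the deserializer expects. */
    public static final class EventRecord implements Writable {
        public final String payload;
        public EventRecord(String payload) { this.payload = payload; }
    }

    /** True when the input looks like the placeholder record from emptyFile. */
    public static boolean isEmptyPlaceholder(Writable w) {
        return w instanceof Text && ((Text) w).getLength() == 0;
    }

    /** Deserializes the expected records and silently skips the placeholder. */
    public static List<String> deserializeAll(List<Writable> records) {
        List<String> rows = new ArrayList<>();
        for (Writable w : records) {
            if (isEmptyPlaceholder(w)) {
                continue; // assumption: an empty Text carries no row data
            }
            if (w instanceof EventRecord) {
                rows.add(((EventRecord) w).payload);
            } else {
                throw new IllegalArgumentException(
                        "Unexpected Writable: " + w.getClass().getName());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Writable> input = Arrays.<Writable>asList(
                new EventRecord("a"), new Text(""), new EventRecord("b"));
        System.out.println(deserializeAll(input));
    }
}
```

The same check could live in my own deserializer as a stopgap, but handling it 
in the mapper would spare every custom SerDe from needing it.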

