Yin Huai created HIVE-4868:
------------------------------

             Summary: When reading an ORC file by an MR job, some Mappers may 
not be able to process data in some cases
                 Key: HIVE-4868
                 URL: https://issues.apache.org/jira/browse/HIVE-4868
             Project: Hive
          Issue Type: Improvement
            Reporter: Yin Huai


Let's say a stripe of an ORC file is 256 MB and we set the split size for an MR 
job to 64 MB. Right now, splits are created based on byte ranges. 
Here is an example:
{code}
|<-The start of a stripe                |<-The end of a stripe
v                                       v
|---------------------------------------|
   ^                        ^ 
   |<- The start of a split |<- The end of a split
{code}

So, for some Mappers, the byte range of the split may not contain the start of 
any stripe. Those Mappers will process 0 records. We can improve how splits are 
created for ORC.
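
To make the example concrete, here is a minimal sketch in Python (not Hive code). It assumes a 512 MB file containing two 256 MB stripes that start at offsets 0 and 256 MB, and it assumes a stripe is read by the Mapper whose split contains the start of that stripe. All function names are hypothetical. The last helper shows one possible direction, aligning splits to stripe boundaries; it is an illustration only, not necessarily the fix this issue will adopt.

```python
MB = 1024 * 1024

def byte_range_splits(file_len, split_size):
    """Return half-open (start, end) byte ranges that cover the file."""
    return [(s, min(s + split_size, file_len))
            for s in range(0, file_len, split_size)]

def stripes_per_split(stripe_starts, splits):
    """For each split, count the stripes whose start offset lies inside it."""
    return [sum(1 for off in stripe_starts if start <= off < end)
            for start, end in splits]

def stripe_aligned_splits(stripe_starts, file_len):
    """Hypothetical alternative: one split per stripe."""
    ends = stripe_starts[1:] + [file_len]
    return list(zip(stripe_starts, ends))

# Assumed layout: two 256 MB stripes in a 512 MB file, 64 MB splits.
file_len = 512 * MB
stripe_starts = [0, 256 * MB]

splits = byte_range_splits(file_len, 64 * MB)
counts = stripes_per_split(stripe_starts, splits)
idle_mappers = counts.count(0)

aligned = stripe_aligned_splits(stripe_starts, file_len)
aligned_counts = stripes_per_split(stripe_starts, aligned)

print("stripes per byte-range split:", counts)
print("mappers with no stripe:", idle_mappers, "of", len(splits))
print("stripes per stripe-aligned split:", aligned_counts)
```

Under these assumptions, only the splits that begin at offsets 0 and 256 MB contain the start of a stripe, so the remaining Mappers have nothing to read.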


