Yin Huai created HIVE-4868:
------------------------------
Summary: When reading an ORC file by an MR job, some Mappers may
not be able to process data in some cases
Key: HIVE-4868
URL: https://issues.apache.org/jira/browse/HIVE-4868
Project: Hive
Issue Type: Improvement
Reporter: Yin Huai
Let's say a stripe of an ORC file is 256 MB and the split size for an MR job is
set to 64 MB. Right now, splits are created purely from byte ranges, without
regard to stripe boundaries. Here is an example:
{code}
|<- The start of a stripe                         |<- The end of a stripe
v                                                 v
|-------------------------------------------------|
        ^                             ^
        |<- The start of a split      |<- The end of a split
{code}
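A minimal Python sketch of this behavior, assuming (as the issue implies) that a Mapper reads exactly the stripes whose starting offset falls inside its split's byte range. The functions `byte_range_splits` and `stripes_per_split` are hypothetical illustrations, not Hive's actual split or reader code, and the file layout (two back-to-back 256 MB stripes, file header and footer ignored) is made up for the example.

```python
MB = 1024 * 1024

def byte_range_splits(file_length, split_size):
    """Cut [0, file_length) into fixed-size (start, length) byte ranges,
    ignoring where stripes begin or end (hypothetical helper)."""
    splits = []
    start = 0
    while start < file_length:
        length = min(split_size, file_length - start)
        splits.append((start, length))
        start += length
    return splits

def stripes_per_split(splits, stripe_offsets):
    """For each split, count the stripes whose start offset lies in its range.

    Assumption: a Mapper processes only the stripes that start inside its split.
    """
    counts = []
    for start, length in splits:
        end = start + length
        counts.append(sum(1 for off in stripe_offsets if start <= off < end))
    return counts

# Hypothetical file: two 256 MB stripes back to back, header/footer ignored.
stripe_size = 256 * MB
stripe_offsets = [0, stripe_size]
splits = byte_range_splits(2 * stripe_size, 64 * MB)
counts = stripes_per_split(splits, stripe_offsets)
empty = sum(1 for c in counts if c == 0)
print(counts)
print(f"{empty} of {len(splits)} Mappers read no stripe")
```

With 64 MB splits over two 256 MB stripes this yields eight splits, and only the two whose ranges contain a stripe start (offsets 0 and 256 MB) get any records.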
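So, for some Mappers, the byte range of the split may not contain the start of
any stripe. Because a stripe is read by the split that contains its starting
offset, those Mappers will process 0 records. In the example above, if stripe
boundaries line up with split boundaries, only one of the four 64 MB splits
covering a 256 MB stripe contains the stripe's start, so the other three
Mappers do no work. We can improve how splits are created for ORC, for example
by creating splits along stripe boundaries.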
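One possible approach, sketched here under stated assumptions and not necessarily what Hive adopts: build each split from whole, consecutive stripes so that no split is left without a stripe start. The function `stripe_aligned_splits` and its `(offset, length)` stripe list are hypothetical; a real implementation would read stripe offsets and lengths from the ORC file footer.

```python
MB = 1024 * 1024

def stripe_aligned_splits(stripes, target_size):
    """Group consecutive whole stripes into splits of roughly target_size bytes.

    stripes: list of (offset, length) pairs in file order (hypothetical input).
    A split never cuts through a stripe, so every split contains at least one
    stripe start. A stripe larger than target_size becomes a split on its own.
    """
    splits = []
    cur_start, cur_len = None, 0
    for offset, length in stripes:
        # Close the current split if adding this stripe would exceed the target.
        if cur_start is not None and cur_len + length > target_size:
            splits.append((cur_start, cur_len))
            cur_start, cur_len = None, 0
        if cur_start is None:
            cur_start = offset
        # Extend the current split to cover the end of this stripe.
        cur_len = offset + length - cur_start
    if cur_start is not None:
        splits.append((cur_start, cur_len))
    return splits

stripes = [(0, 256 * MB), (256 * MB, 256 * MB)]
print(stripe_aligned_splits(stripes, 64 * MB))
```

For the two 256 MB stripes above this produces two splits, `[(0, 268435456), (268435456, 268435456)]`. The trade-off is parallelism: two Mappers instead of eight, but none of them is idle. Smaller stripes would be packed together up to the target size.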