nicu marasoiu created MAPREDUCE-5287:
----------------------------------------

             Summary: Create a generic InputFormat wrapping any other 
InputFormat, to control the number of map tasks
                 Key: MAPREDUCE-5287
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5287
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: mrv1, performance
            Reporter: nicu marasoiu


I wrote a generic InputFormat that wraps any other InputFormat and creates 
CompositeInputSplits to reduce the number of map tasks in a controllable manner 
while preserving data locality. A corresponding CompositeRecordReader 
iterates through the underlying RecordReaders that the underlying InputFormat 
creates for each raw split.
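To make the idea concrete, here is a minimal sketch in plain Java, not the 
actual patch. It deliberately avoids the Hadoop classes: RawSplit, group, 
read and the maxPerGroup parameter are hypothetical names used only for 
illustration, and grouping strictly by a single preferred host is an 
assumption about how locality might be preserved. A real implementation 
would build on Hadoop's InputFormat, InputSplit and RecordReader types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

public class CompositeSplitSketch {

    /** Hypothetical stand-in for a raw split: its preferred host and the records it yields. */
    static final class RawSplit {
        final String host;
        final List<String> records;

        RawSplit(String host, List<String> records) {
            this.host = host;
            this.records = records;
        }
    }

    /**
     * Groups raw splits by host, with at most maxPerGroup splits per group.
     * No group mixes hosts, so each composite group stays local to one host.
     */
    static List<List<RawSplit>> group(List<RawSplit> raw, int maxPerGroup) {
        if (maxPerGroup < 1) {
            throw new IllegalArgumentException("maxPerGroup must be >= 1");
        }
        // LinkedHashMap keeps hosts in first-seen order, so the output is deterministic.
        Map<String, List<RawSplit>> byHost = new LinkedHashMap<>();
        for (RawSplit split : raw) {
            byHost.computeIfAbsent(split.host, h -> new ArrayList<>()).add(split);
        }
        List<List<RawSplit>> groups = new ArrayList<>();
        for (List<RawSplit> hostSplits : byHost.values()) {
            for (int i = 0; i < hostSplits.size(); i += maxPerGroup) {
                int end = Math.min(i + maxPerGroup, hostSplits.size());
                groups.add(new ArrayList<>(hostSplits.subList(i, end)));
            }
        }
        return groups;
    }

    /** Reads one group by draining each underlying split in turn, like a chained reader. */
    static Iterator<String> read(List<RawSplit> group) {
        final Iterator<RawSplit> splits = group.iterator();
        return new Iterator<String>() {
            private Iterator<String> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Advance to the next split that still has records, skipping empty ones.
                while (!current.hasNext() && splits.hasNext()) {
                    current = splits.next().records.iterator();
                }
                return current.hasNext();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }

    public static void main(String[] args) {
        List<RawSplit> raw = Arrays.asList(
            new RawSplit("h1", Arrays.asList("a", "b")),
            new RawSplit("h2", Arrays.asList("c")),
            new RawSplit("h1", Arrays.asList("d")),
            new RawSplit("h1", Arrays.asList("e")));
        // Five raw splits would mean five map tasks; grouping yields fewer.
        List<List<RawSplit>> groups = group(raw, 2);
        System.out.println("map tasks after grouping: " + groups.size());
        for (Iterator<String> it = read(groups.get(0)); it.hasNext(); ) {
            System.out.println(it.next());
        }
    }
}
```

In this sketch, lowering or raising maxPerGroup is the knob that controls 
how many map tasks result, at the cost of longer-running individual tasks.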

One application of this is grouping TableSplits when the raw splits come 
from multiple regions and are filtered by key ranges. We use this to 
shard/distribute time-based incremental access to an HBase table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
