Combiner that aggregates all the mappers from a machine
-------------------------------------------------------

                 Key: HADOOP-5340
                 URL: https://issues.apache.org/jira/browse/HADOOP-5340
             Project: Hadoop Core
          Issue Type: New Feature
    Affects Versions: 0.19.1
            Reporter: Nathan Marz


>From what I can tell, the Combiner only aggregates data from a single map
>task. It would be useful, especially for map-only jobs, to have a combiner
>that aggregates data from all the map tasks on a given machine. My use case
>is vertically partitioning a set of records that start out in the same
>files. Doing this in a map-only job creates far too many files (about 50
>per input split). Pumping all the data through a reducer instead adds a lot
>of unnecessary overhead. With the proposed feature, I would get
>50 * (number of machines) files rather than 50 * (number of input splits)
>files for this use case.
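
To make the file-count arithmetic above concrete, here is a minimal sketch
in Python. It is not Hadoop code and does not use any Hadoop API; the
function name count_output_files, the host names, and the figures of 1000
input splits and 20 machines are hypothetical, chosen only for illustration.
It assumes one map task per input split and that every writer emits one file
per vertical partition.

```python
def count_output_files(split_hosts, partitions, per_machine):
    """Count output files when each writer emits one file per partition.

    split_hosts: host name for each input split (one map task per split).
    partitions:  number of vertical partitions (about 50 in the report).
    per_machine: if True, all map tasks on a host share one set of files
                 (the proposed behaviour); if False, each map task writes
                 its own set (the current map-only behaviour).
    """
    if per_machine:
        writers = len(set(split_hosts))  # one writer per distinct machine
    else:
        writers = len(split_hosts)       # one writer per map task
    return writers * partitions


# Hypothetical cluster: 1000 input splits spread evenly over 20 machines.
hosts = ["host%d" % (i % 20) for i in range(1000)]

per_task_files = count_output_files(hosts, 50, per_machine=False)
per_machine_files = count_output_files(hosts, 50, per_machine=True)
print(per_task_files, per_machine_files)
```

Under these assumed numbers, the per-machine variant writes 50 times fewer
files, because the file count scales with the number of machines instead of
the number of input splits.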

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
