[ https://issues.apache.org/jira/browse/MAPREDUCE-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins moved HADOOP-5340 to MAPREDUCE-2812:
------------------------------------------------

    Affects Version/s:     (was: 0.19.1)
                  Key: MAPREDUCE-2812  (was: HADOOP-5340)
              Project: Hadoop Map/Reduce  (was: Hadoop Common)

> Combiner that aggregates all the mappers from a machine
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-2812
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2812
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Nathan Marz
>
> From what I can tell, the Combiner only aggregates data from a single map 
> task. It would be useful, especially for map-only jobs, to have a combiner 
> that aggregates data from all the map tasks on a given machine. My use case 
> is vertically partitioning a set of records that start out in the same 
> files. Doing this in a map-only job creates far too many files (about 50 
> per input split), while pumping all the data through a reducer adds a lot 
> of unnecessary overhead. With the proposed feature, this use case would 
> produce 50 * (number of machines) files rather than 50 * (number of input 
> splits) files.
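
The file-count arithmetic in the description can be illustrated with a small standalone sketch. This is plain Python, not Hadoop code, and it does not reflect any existing Hadoop API: the function names (count_output_files, combine_per_machine), the host names, and the assumption that each map task or machine writes exactly one file per vertical partition are all hypothetical, chosen only to show why merging by (machine, partition) yields fewer files than writing per (task, partition).

```python
from collections import defaultdict


def count_output_files(task_hosts, partitions_per_task):
    """Compare output file counts for the two strategies.

    task_hosts: one host name per map task (i.e. per input split).
    partitions_per_task: files each writer produces (about 50 in the report).
    Assumes one file per partition per writer, which is a simplification.
    """
    per_task = len(task_hosts) * partitions_per_task
    per_machine = len(set(task_hosts)) * partitions_per_task
    return per_task, per_machine


def combine_per_machine(task_outputs):
    """Merge map-task outputs that share a host and a partition.

    task_outputs: iterable of (host, partition, records) tuples,
    one tuple per file a map-only task would otherwise write.
    Returns a dict keyed by (host, partition), one entry per output file.
    """
    merged = defaultdict(list)
    for host, partition, records in task_outputs:
        merged[(host, partition)].extend(records)
    return dict(merged)


if __name__ == "__main__":
    # Six input splits scheduled on three hypothetical machines.
    hosts = ["node-a", "node-a", "node-b", "node-a", "node-b", "node-c"]
    per_task, per_machine = count_output_files(hosts, 50)
    print("files without machine-level combining:", per_task)
    print("files with machine-level combining:", per_machine)
```

The saving grows with the number of map tasks that run on each machine, since the per-machine count depends only on the number of distinct hosts.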

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
