[ http://issues.apache.org/jira/browse/HADOOP-570?page=all ]

Doug Cutting resolved HADOOP-570.
---------------------------------

    Resolution: Duplicate

This is a duplicate of HADOOP-331.

> Map tasks may fail due to out of memory, if the number of reducers are 
> moderately big
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-570
>                 URL: http://issues.apache.org/jira/browse/HADOOP-570
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>
> Map tasks may fail with out-of-memory errors if the number of reducers is 
> moderately large. 
> In my case, I set the child task heap size to 1GB and turned on compression 
> for the map output files. 
> The average size of input records is about 30K (I don't know the variation 
> though). 
> Many map tasks failed with out-of-memory errors when the number of reducers 
> was 400 or higher.
> The number of reducers could be somewhat higher (as high as 800) if 
> compression for the map output files was off.
> This problem will impose a hard limit on the scalability of map/reduce 
> clusters.
> One possible solution to this problem is to let the mapper write out a 
> single map output file, 
> and then to perform sort/partition as a separate phase. 
> This will also make it unnecessary for the reducers to sort the 
> individual portions from mappers. 
> Rather, the reducers should just perform merge operations on the map output 
> files directly. 
> This may even make it possible to dynamically collect statistics on 
> the map outputs and 
> use them to drive the partitioning on the mapper side and to obtain the 
> optimal merge plan on the reducer side!
>  
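
The proposal in the quoted description could be sketched roughly as follows. This is only an illustration in Python, not Hadoop code and not what HADOOP-331 implements: the names partition_of, sort_map_output and reduce_side_merge are hypothetical, the byte-sum partitioner is a stand-in for a real hash partitioner, and records are kept in memory for brevity. The idea shown is that each mapper sorts its single output by (partition, key) in a separate phase, so each reducer only needs to merge already-sorted segments.

```python
import heapq
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (key, value); an assumed record shape


def partition_of(key: str, num_reducers: int) -> int:
    # Hypothetical deterministic partitioner (stand-in for a hash partitioner).
    return sum(key.encode("utf-8")) % num_reducers


def sort_map_output(records: Iterable[Record], num_reducers: int) -> List[List[Record]]:
    """Separate sort/partition phase over one mapper's single output.

    Returns one key-sorted segment per reducer.
    """
    ordered = sorted(
        records, key=lambda kv: (partition_of(kv[0], num_reducers), kv[0])
    )
    segments: List[List[Record]] = [[] for _ in range(num_reducers)]
    for kv in ordered:
        segments[partition_of(kv[0], num_reducers)].append(kv)
    return segments


def reduce_side_merge(segments_from_maps: Iterable[List[Record]]) -> List[Record]:
    # Each segment is already sorted by key, so the reducer merges, not sorts.
    return list(heapq.merge(*segments_from_maps, key=lambda kv: kv[0]))
```

With two mappers and two reducers, reducer p would call reduce_side_merge([s1[p], s2[p]]), where s1 and s2 are the results of sort_map_output for each mapper. In this sketch the mapper holds one stream of output rather than one open writer per reducer, which is the property the description is after.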

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
