Map tasks may fail with out-of-memory errors if the number of reducers is 
moderately large
-------------------------------------------------------------------------------------

                 Key: HADOOP-570
                 URL: http://issues.apache.org/jira/browse/HADOOP-570
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
            Reporter: Runping Qi



Map tasks may fail with out-of-memory errors if the number of reducers is 
moderately large. 
In my case, I set the child task heap size to 1 GB and turned on compression 
for the map output files. 
The average size of the input records is about 30 KB (I don't know the 
variance, though). 
Many map tasks failed with out-of-memory errors when the number of reducers 
was 400 or higher.
The number of reducers could go somewhat higher (as high as 800) if 
compression for the map output files was turned off.
This problem imposes a hard limit on the scalability of map/reduce clusters.
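
A minimal sketch of the suspected failure mode (my assumption, not taken from 
the actual Hadoop source): if the map side keeps one buffered, compressed 
output stream per partition, then per-task memory grows linearly with the 
number of reducers. The class and buffer sizes below are illustrative only.

    import java.io.ByteArrayOutputStream;
    import java.util.zip.DeflaterOutputStream;

    public class PerPartitionBuffers {
        public static void main(String[] args) {
            int numReducers = 400; // the reducer count where failures began
            // Hypothetical collector state: one buffer plus one compressor
            // per partition. Each DeflaterOutputStream carries its own
            // internal state, so the footprint is per-stream, not shared.
            ByteArrayOutputStream[] buffers =
                new ByteArrayOutputStream[numReducers];
            DeflaterOutputStream[] streams =
                new DeflaterOutputStream[numReducers];
            for (int p = 0; p < numReducers; p++) {
                buffers[p] = new ByteArrayOutputStream(64 * 1024); // 64 KB each
                streams[p] = new DeflaterOutputStream(buffers[p]);
            }
            // 400 partitions x (64 KB buffer + compressor state) is ~25 MB
            // before a single record is written; ~30 KB records and sort
            // buffers multiply that further.
            System.out.printf("Allocated %d per-partition streams%n",
                numReducers);
        }
    }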

One possible solution to this problem is to let the mapper write out a single 
map output file, and then perform the sort/partition as a separate phase. 
This would also make it unnecessary for the reducers to sort the individual 
portions from the mappers. 
Rather, the reducers would just perform merge operations on the map output 
files directly. 
This may even open up the possibility of dynamically collecting statistics 
about the map outputs, using those stats to drive the partitioning on the 
mapper side, and obtaining an optimal merge plan on the reducer side!
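
To make the proposed flow concrete, here is a minimal, self-contained sketch 
(not Hadoop code; the in-memory sort stands in for what would really be an 
external sort, and partition() is an illustrative stand-in for the job's 
partitioner):

    import java.util.*;

    public class SeparateSortPhase {
        record KV(String key, String value) {}

        // Illustrative partitioner: hash the key into numReducers buckets.
        static int partition(String key, int numReducers) {
            return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
        }

        public static void main(String[] args) {
            int numReducers = 4;
            // Phase 1: the mapper emits a single unsorted stream of records.
            List<KV> mapOutput = List.of(
                new KV("b", "2"), new KV("a", "1"),
                new KV("c", "3"), new KV("a", "4"));

            // Phase 2 (separate pass): sort once, then split into one
            // sorted run per partition.
            List<KV> sorted = new ArrayList<>(mapOutput);
            sorted.sort(Comparator.comparing(KV::key));
            Map<Integer, List<KV>> runs = new TreeMap<>();
            for (KV kv : sorted) {
                runs.computeIfAbsent(partition(kv.key(), numReducers),
                    p -> new ArrayList<>()).add(kv);
            }

            // Phase 3 (reducer side): each reducer merges already-sorted
            // runs from all mappers; no per-portion re-sort is needed.
            runs.forEach((p, run) ->
                System.out.println("partition " + p + " -> " + run));
        }
    }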
 
