[ http://issues.apache.org/jira/browse/HADOOP-570?page=all ]
Doug Cutting resolved HADOOP-570.
---------------------------------

    Resolution: Duplicate

This is a duplicate of HADOOP-331.

> Map tasks may fail due to out of memory if the number of reducers is moderately big
> ------------------------------------------------------------------------------------
>
>          Key: HADOOP-570
>          URL: http://issues.apache.org/jira/browse/HADOOP-570
>      Project: Hadoop
>   Issue Type: Bug
>   Components: mapred
>     Reporter: Runping Qi
>
> Map tasks may fail due to out of memory if the number of reducers is moderately big.
> In my case, I set the child task heap size to 1GB and turned on compression for the map output files.
> The average size of the input records is about 30K (I don't know the variation, though).
> A lot of map tasks failed due to out of memory when the number of reducers was 400 or higher.
> The number of reducers can be somewhat higher (as high as 800) if compression for the map output files is turned off.
> This problem imposes a hard limit on the scalability of map/reduce clusters.
> One possible solution is to let the mapper write out a single map output file, and then perform the sort/partition as a separate phase.
> This would also make it unnecessary for the reducers to sort the individual portions from the mappers;
> instead, the reducers would just perform merge operations on the map output files directly.
> This may even open up the possibility of dynamically collecting statistics on the map outputs,
> using those stats to drive the partitioning on the mapper side, and obtaining an optimal merge plan on the reducer side!
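As a rough illustration of the proposal quoted above (this is not Hadoop's actual spill/sort code; the class and field names below are hypothetical), here is a minimal in-memory sketch in Java: the mapper appends every record to one output regardless of partition, and sorting by (partition, key) happens in a separate pass, so per-reducer buffers are not held during collection.

// Toy, in-memory model of the idea; real map output would be spilled to disk.
import java.util.ArrayList;
import java.util.List;

public class SingleFileMapOutput {

    // One map-output record, tagged with the reduce partition it belongs to.
    static class Record {
        final int partition;
        final String key;
        final String value;
        Record(int partition, String key, String value) {
            this.partition = partition;
            this.key = key;
            this.value = value;
        }
    }

    private final List<Record> records = new ArrayList<>();
    private final int numReducers;

    SingleFileMapOutput(int numReducers) {
        this.numReducers = numReducers;
    }

    // The mapper appends each record to a single stream, so memory use at
    // collection time does not grow with the number of reducers.
    void collect(String key, String value) {
        int partition = (key.hashCode() & Integer.MAX_VALUE) % numReducers;
        records.add(new Record(partition, key, value));
    }

    // Separate phase: sort by (partition, key). Reducers could then merge
    // their partition's already-sorted runs directly instead of re-sorting
    // each mapper's piece.
    List<Record> sortedOutput() {
        records.sort((a, b) -> {
            if (a.partition != b.partition) {
                return Integer.compare(a.partition, b.partition);
            }
            return a.key.compareTo(b.key);
        });
        return records;
    }

    public static void main(String[] args) {
        SingleFileMapOutput out = new SingleFileMapOutput(400);
        out.collect("banana", "1");
        out.collect("apple", "1");
        out.collect("cherry", "1");
        for (Record r : out.sortedOutput()) {
            System.out.println(r.partition + "\t" + r.key + "\t" + r.value);
        }
    }
}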