[ http://issues.apache.org/jira/browse/HADOOP-570?page=all ]
Doug Cutting resolved HADOOP-570.
---------------------------------
Resolution: Duplicate
This is a duplicate of HADOOP-331.
> Map tasks may fail with out-of-memory errors if the number of reducers is moderately large
> -------------------------------------------------------------------------------------
>
> Key: HADOOP-570
> URL: http://issues.apache.org/jira/browse/HADOOP-570
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Reporter: Runping Qi
>
> Map tasks may fail with out-of-memory errors if the number of reducers is
> moderately large.
> In my case, I set the child task heap size to 1GB and turned on compression
> for the map output files.
> The average size of the input records is about 30K (though I don't know the
> variance).
> A lot of map tasks failed with out-of-memory errors once the number of
> reducers reached 400 or more.
> The number of reducers can go somewhat higher (as high as 800) if compression
> for the map output files is turned off.
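> For concreteness, the setup above corresponds to roughly the following job
> configuration (a minimal sketch against the mapred JobConf API; the job class
> name is a placeholder, and the exact child-heap property name varies across
> Hadoop versions, so "mapred.child.java.opts" is an assumption here):
>
>     import org.apache.hadoop.mapred.JobConf;
>
>     JobConf conf = new JobConf(MyJob.class); // MyJob is a placeholder
>     // 1GB heap for each child task (property name assumed; older
>     // releases used a different key for the child heap size).
>     conf.set("mapred.child.java.opts", "-Xmx1024m");
>     // Compress the intermediate map output files.
>     conf.setCompressMapOutput(true);
>     // Failures start at roughly this reducer count.
>     conf.setNumReduceTasks(400);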
> This problem will impose a hard limit on the scalability of map/reduce
> clusters.
> One possible solution to this problem is to let the mapper write out a single
> map output file,
> and then to perform the sort/partition as a separate phase.
> This will also make it unnecessary for the reducers to sort the individual
> portions received from the mappers.
> Rather, the reducers could just perform merge operations on the sorted map
> output files directly, as sketched below.
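> To make that merge concrete, here is a minimal sketch of the kind of k-way
> merge a reducer could run over already-sorted map output segments
> (illustrative plain Java, not Hadoop's actual implementation; the class and
> method names are made up):
>
>     import java.util.*;
>
>     class SegmentMerger {
>         // One already-sorted segment with a buffered head record.
>         static class Segment {
>             final Iterator<String> it;
>             String head;
>             Segment(Iterator<String> it) { this.it = it; this.head = it.next(); }
>             boolean advance() {
>                 if (!it.hasNext()) return false;
>                 head = it.next();
>                 return true;
>             }
>         }
>
>         // Merge sorted segments into one sorted stream without
>         // re-sorting any individual segment.
>         static List<String> merge(List<Iterator<String>> segments) {
>             PriorityQueue<Segment> heap =
>                 new PriorityQueue<>(Comparator.comparing((Segment s) -> s.head));
>             for (Iterator<String> it : segments)
>                 if (it.hasNext()) heap.add(new Segment(it));
>             List<String> out = new ArrayList<>();
>             while (!heap.isEmpty()) {
>                 Segment s = heap.poll();
>                 out.add(s.head);              // emit the smallest head
>                 if (s.advance()) heap.add(s); // refill from that segment
>             }
>             return out;
>         }
>     }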
> This may even open up the possibility of dynamically collecting statistics
> about the map outputs,
> using those stats to drive the partitioning on the mapper side, and obtaining
> an optimal merge plan on the reducer side!
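> As a sketch of what statistics-driven partitioning could look like, consider
> sampling the map output keys, deriving quantile cut points from the sorted
> sample, and routing each key by binary search (illustrative code only; the
> class name and API are made up, not part of Hadoop):
>
>     import java.util.*;
>
>     class SampledPartitioner {
>         private final String[] cuts; // numPartitions - 1 boundary keys
>
>         SampledPartitioner(List<String> sampleKeys, int numPartitions) {
>             Collections.sort(sampleKeys);
>             cuts = new String[numPartitions - 1];
>             for (int i = 1; i < numPartitions; i++)
>                 // Evenly spaced quantiles of the sorted sample become the
>                 // partition boundaries, balancing partition sizes.
>                 cuts[i - 1] = sampleKeys.get(i * sampleKeys.size() / numPartitions);
>         }
>
>         int getPartition(String key) {
>             int pos = Arrays.binarySearch(cuts, key);
>             // On a miss, binarySearch returns (-insertionPoint - 1).
>             return pos >= 0 ? pos + 1 : -(pos + 1);
>         }
>     }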
>