Protection against incorrectly configured reduces
-------------------------------------------------
Key: MAPREDUCE-1521
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1521
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: jobtracker
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Fix For: 0.22.0
We've seen a fair number of instances where naive users process huge data-sets
(>10TB) with a badly misconfigured number of reduces, e.g. a single reduce.
This is a significant problem on large clusters: each attempt of the reduce
takes a long time to shuffle and then runs into problems such as running out
of local disk space. The job only fails after 4 such attempts, so the wasted
time and resources are multiplied accordingly.
Proposal: Come up with heuristics/configs to fail such jobs early.
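As a starting point for discussion, one possible heuristic is to estimate the
data each reduce would have to shuffle and reject the job at submission time
if that exceeds a configurable limit. The sketch below is illustrative only:
the class, the method names and the 1 TiB default are hypothetical and not
part of any Hadoop API, and it assumes map output is roughly the size of the
job input and is partitioned evenly across reduces.

```java
/** Hypothetical pre-submission check; not an existing Hadoop class. */
final class ReduceSanityCheck {

    /** Illustrative default limit per reduce (1 TiB); an assumption, not a Hadoop setting. */
    static final long DEFAULT_MAX_BYTES_PER_REDUCE = 1L << 40;

    private ReduceSanityCheck() {}

    /**
     * Estimates the bytes a single reduce would shuffle, assuming map output
     * is about as large as the input and is spread evenly over all reduces.
     */
    static long estimatedBytesPerReduce(long totalInputBytes, int numReduces) {
        if (totalInputBytes < 0) {
            throw new IllegalArgumentException("totalInputBytes must be >= 0");
        }
        if (numReduces <= 0) {
            throw new IllegalArgumentException("numReduces must be > 0");
        }
        // Ceiling division, written to avoid overflow on very large inputs.
        long quotient = totalInputBytes / numReduces;
        return (totalInputBytes % numReduces == 0) ? quotient : quotient + 1;
    }

    /** Returns true if the job looks misconfigured and should be failed early. */
    static boolean shouldFailEarly(long totalInputBytes, int numReduces,
                                   long maxBytesPerReduce) {
        if (numReduces == 0) {
            // Map-only job: there is no shuffle, so nothing to check.
            return false;
        }
        return estimatedBytesPerReduce(totalInputBytes, numReduces) > maxBytesPerReduce;
    }
}
```

With these assumptions, a 10TB input with 1 reduce would be rejected under the
illustrative limit, while the same input with 100 reduces would be accepted.
The limit would need to be a per-cluster config, since reasonable values
depend on local disk capacity.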
Thoughts?