[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064923#comment-13064923
 ] 

Aaron T. Myers commented on MAPREDUCE-2684:
-------------------------------------------

Hey Robert, is this not a duplicate of 
https://issues.apache.org/jira/browse/MAPREDUCE-2324 ?

> Job Tracker can starve reduces with very large input.
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2684
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2684
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.204.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>
> If mapreduce.reduce.input.limit is misconfigured, or if a cluster is simply 
> running low on disk space, then reduces with a large input may never get 
> scheduled.  The job then neither fails nor succeeds; it just starves until 
> it is killed.
> The JobInProgress tries to guess at the size of the input to all reducers in 
> a job.  If the size is over mapreduce.reduce.input.limit then the job is 
> killed.  If it is not then findNewReduceTask() checks to see if the estimated 
> size is too big to fit on the node currently looking for work.  If it is too 
> big then it will let some other task have a chance at the slot.
> The idea is to keep track of how often it happens that a Reduce Slot is 
> rejected because of the lack of space vs how often it succeeds and then guess 
> if the reduce tasks will ever be scheduled.
> So I would like some feedback on this.
> 1) How should we guess?  Someone who found the bug here suggested P1 + (P2 * 
> S), where S is the number of successful assignments.  Possibly P1 = 20 and P2 
> = 2.0.  I am not really sure.
> 2) What should we do when we guess that it will never get a slot?  Should we 
> fail the job, or do we say, even though it might fail, let's just schedule 
> it and see if it really will fail?
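
For reference, here is a rough sketch of how the suggested P1 + (P2 * S)
guess might look. It assumes the expression is meant as an upper bound on the
number of slot rejections tolerated before we conclude the reduces will never
be scheduled; the description does not state that explicitly. The class and
method names are hypothetical and are not part of the JobTracker code.

```java
// Hypothetical helper, not an existing Hadoop class.
final class ReduceStarvationGuess {
    private final int p1;       // base number of rejections always tolerated
    private final double p2;    // extra rejections tolerated per success
    private long rejected;      // slots turned down for lack of space
    private long succeeded;     // reduce assignments that went through (S)

    ReduceStarvationGuess(int p1, double p2) {
        this.p1 = p1;
        this.p2 = p2;
    }

    void recordRejection() { rejected++; }

    void recordSuccess() { succeeded++; }

    // Assumption: the job is guessed to be starved once the rejection
    // count exceeds P1 + (P2 * S).
    boolean looksStarved() {
        return rejected > p1 + p2 * succeeded;
    }
}
```

With P1 = 20 and P2 = 2.0, a job with no successful assignments would be
flagged on its 21st rejection, and every success buys two more rejections.
What to do once it is flagged is question 2 and is left open here.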

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira