[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648540#comment-13648540
 ] 

Sandy Ryza commented on MAPREDUCE-4366:
---------------------------------------

Thanks for looking it over, Arun.  I'll upload a new patch that renames that 
local variable.

I was coming up against the following situation: an attempt completes while a 
speculative attempt for the same task is still running, causing the job to 
complete.  speculative(Map|Reduce)Tasks would be decremented because there were 
other running attempts.  When a job completes, the number of waiting tasks is 
decremented by pendingMaps (initialTasks=10 + speculativeTasks=0 - 
runningTasks=1 - finishedTasks=10 - failedTasks=0) = -1.  Decrementing by -1 
adds 1 to the count, so after the job had completed, there would still be 1 
task counted as waiting for it.
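
To make the arithmetic concrete, here is a minimal sketch of the calculation. 
The class and method names are hypothetical and are not taken from the patch or 
from JobInProgress; only the formula and the numbers come from the scenario 
above.  The starting waiting count of 0 is an assumption for illustration.

```java
public class PendingMapsExample {
    // Hypothetical helper that mirrors the pendingMaps formula quoted above.
    public static int pendingMaps(int initialTasks, int speculativeTasks,
                                  int runningTasks, int finishedTasks,
                                  int failedTasks) {
        return initialTasks + speculativeTasks
                - runningTasks - finishedTasks - failedTasks;
    }

    public static void main(String[] args) {
        // Assumption: nothing is counted as waiting just before completion.
        int waiting = 0;

        // Values from the scenario: the speculative counter was already
        // decremented to 0, but one attempt is still running.
        int pending = pendingMaps(10, 0, 1, 10, 0);

        // On job completion the waiting count is decremented by pendingMaps.
        waiting -= pending;

        System.out.println("pendingMaps = " + pending);
        System.out.println("waiting after job completion = " + waiting);
    }
}
```

With these inputs pendingMaps evaluates to -1, so the decrement leaves the 
waiting count one higher than it was before the job completed.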

There didn't seem to be a clear definition of speculative(Map|Reduce)Tasks, so 
the one I came up with is that the number of speculative(Map|Reduce)Tasks is 
the number of attempts running that are not on the critical path of the job 
completing.  This makes sense in the context of computing pending(Map|Reduce)s, 
which is the only place the variable is used.

I removed the decrement of speculative(Map|Reduce)Tasks on task completion 
because, by definition, an attempt that completes a task is on the critical 
path.  Any remaining running attempts will be failed and counted as 
speculative.  Similarly, the decrement is added where an attempt has failed, 
because an attempt that fails is only on the critical path if the task hasn't 
completed and there are no other speculative attempts.
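
The accounting rule described above can be sketched as a toy model.  This is 
not the patch and not the JobInProgress code; the class, its fields, and its 
methods are hypothetical, and it assumes the speculative counter is incremented 
when a speculative attempt is launched.  It also leaves failedTasks untouched, 
since per-task failure handling is outside what the comment describes.

```java
public class SpeculativeAccountingSketch {
    // Hypothetical counters named after the terms in the pending formula.
    int initialTasks = 10;
    int speculativeTasks;
    int runningTasks;
    int finishedTasks;
    int failedTasks;

    int pending() {
        return initialTasks + speculativeTasks
                - runningTasks - finishedTasks - failedTasks;
    }

    // Assumption: launching a speculative attempt increments the counter.
    void launchAttempt(boolean speculative) {
        runningTasks++;
        if (speculative) {
            speculativeTasks++;
        }
    }

    // The completing attempt is on the critical path by definition,
    // so the speculative counter is left alone here.
    void attemptCompletesTask() {
        runningTasks--;
        finishedTasks++;
    }

    // A failing attempt was off the critical path if the task is already
    // complete or another attempt is still running for it.
    void attemptFails(boolean taskCompleted, boolean otherAttemptsRunning) {
        runningTasks--;
        if (taskCompleted || otherAttemptsRunning) {
            speculativeTasks--;
        }
    }

    public static void main(String[] args) {
        SpeculativeAccountingSketch job = new SpeculativeAccountingSketch();
        job.finishedTasks = 9;         // nine of ten tasks already done
        job.launchAttempt(false);      // original attempt for the last task
        job.launchAttempt(true);       // speculative attempt for the same task
        job.attemptCompletesTask();    // original attempt finishes the job
        System.out.println("pending at job completion = " + job.pending());
        job.attemptFails(true, false); // leftover attempt is failed
        System.out.println("pending after cleanup = " + job.pending());
    }
}
```

In this model the pending count stays at 0 both at job completion and after 
the leftover attempt is failed, rather than dipping to -1.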

Does that make sense?
                
> mapred metrics shows negative count of waiting maps and reduces
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4366
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4366
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 1.0.2
>            Reporter: Thomas Graves
>            Assignee: Sandy Ryza
>         Attachments: MAPREDUCE-4366-branch-1.patch
>
>
> Negative waiting_maps and waiting_reduces counts are observed in the mapred 
> metrics.  MAPREDUCE-1238 partially fixed this, but it appears there are still 
> issues: we are still seeing negative counts, though not as often.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
