[ https://issues.apache.org/jira/browse/MAPREDUCE-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659714#comment-13659714 ]

Sandy Ryza commented on MAPREDUCE-4366:
---------------------------------------

Thanks for delving into this with me, Arun.  First, please excuse in advance any 
errors I'm about to make here.  I'm trying to be careful, but the counting code 
is subtle and has been hard to think about.

bq. An option is to just call decWaiting(Maps|Reduces) in JIP.garbageCollect 
with JIP.num(Maps|Reduces)... currently if you follow the opposite side i.e 
addWaiting(Maps|Reduces), they are just static and are done at JIP.initTasks 
with num(Maps|Reduces). That would solve the immediate problem at hand?

Waiting maps and reduces are updated in the job tracker metrics every time a 
task is launched, fails, or completes, so this would not work unless I am 
missing something.
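To illustrate why a one-time decrement would not work, here is a deliberately simplified, hypothetical model of the waiting-maps gauge. The class and method names are invented for illustration; this is not Hadoop's JobTrackerInstrumentation API. In this model the gauge is already adjusted on every launch and failure, so it is back at zero when the job finishes, and an extra decrement by num(Maps) in garbageCollect drives it negative.

```java
// Hypothetical, simplified model of the waiting-maps gauge.
// Not Hadoop's real classes; names are invented for illustration only.
class WaitingMapsModel {
    private int waitingMaps = 0;

    // At job init (cf. JIP.initTasks): every map starts out waiting.
    void initTasks(int numMaps) { waitingMaps += numMaps; }

    // Launching a map attempt removes its task from the waiting count.
    void launch() { waitingMaps -= 1; }

    // A failed attempt puts its task back into the waiting count.
    void fail() { waitingMaps += 1; }

    // The suggested alternative: a static decrement by num(Maps) at garbage
    // collection, on top of the incremental updates above.
    void garbageCollectStatic(int numMaps) { waitingMaps -= numMaps; }

    int waitingMaps() { return waitingMaps; }
}
```

For a two-map job where one attempt fails and is relaunched (initTasks(2), launch, launch, fail, launch), the gauge is 0 when the job ends; garbageCollectStatic(2) then leaves it at -2.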

bq. The definition of speculative(Map|Reduce)Tasks, at least in my head, has 
been the number of task-attempts have an alternate...

This definition can lead to thinking there are fewer pending tasks than there 
actually are.  Consider the following situation:
My job has two maps.  Attempts are run for both of them.  One map gets a 
speculative attempt because it's running slowly.  The other map's attempt 
fails.  The speculative attempt completes.  Now pendingMaps = initialMaps (2) 
+ speculativeMaps (0) - runningMaps (1) - finishedMaps (1) - failedMaps (0) 
= 0, even though we still have a pending map task.  The only reason this has 
not caused jobs to starve is that the still-running attempt of the speculated 
map will fail later on and bring pendingMaps back up to 1.
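The arithmetic in the two-map scenario can be replayed with plain counters. This is a hypothetical sketch: the field names mirror the ones used above, but the class is not JobInProgress. Incrementing speculativeMaps when an alternate attempt launches and decrementing it when that task completes is an assumption based on the definition quoted above.

```java
// Hypothetical sketch of the counter arithmetic in the two-map scenario.
// Field names mirror the comment; this is NOT Hadoop's JobInProgress.
class PendingMapsScenario {
    int initialMaps, speculativeMaps, runningMaps, finishedMaps, failedMaps;

    // pendingMaps as computed from the counters in the comment.
    int pendingMaps() {
        return initialMaps + speculativeMaps - runningMaps - finishedMaps - failedMaps;
    }

    // Replays the scenario up to the point where the speculative attempt completes.
    static PendingMapsScenario replay() {
        PendingMapsScenario s = new PendingMapsScenario();
        s.initialMaps = 2;

        // Attempts are launched for map A and map B.
        s.runningMaps += 2;

        // Map A runs slowly and gets a speculative attempt (assumed: +1 speculative).
        s.runningMaps += 1;
        s.speculativeMaps += 1;

        // Map B's attempt fails; the task will be retried, so failedMaps is unchanged.
        s.runningMaps -= 1;

        // Map A's speculative attempt completes: A is finished and no longer has an
        // alternate, but A's other attempt is still running.
        s.runningMaps -= 1;
        s.finishedMaps += 1;
        s.speculativeMaps -= 1;
        return s;
    }

    // The leftover running attempt of map A ends later on.
    void leftoverAttemptEnds() { runningMaps -= 1; }
}
```

After replay(), pendingMaps() is 2 + 0 - 1 - 1 - 0 = 0 although map B still needs to run; only after leftoverAttemptEnds() does it return to 1.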

I wanted to make sure it was clear that the current behavior is objectively 
wrong.  If your stance is still that the code has been working so far and 
messing with it is just a bad idea, I trust your experience.  In that case, 
we could keep speculativeMapTasks as it is and add a separate variable, 
nonCriticalRunningTasks, that is used for updating the metrics?
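One possible reading of the nonCriticalRunningTasks idea, sketched under explicit assumptions: a running attempt is treated as non-critical if its task has already finished, or if another attempt of the same task is also running, and only the remaining (critical) running attempts are subtracted when computing the waiting count for the metrics. The class, its methods, and this formula are hypothetical; the comment does not spell out how the variable would be maintained.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of one reading of the nonCriticalRunningTasks proposal.
// Not Hadoop code; task ids and methods are invented for illustration.
class NonCriticalCounting {
    private final int initialMaps;
    private int failedMaps = 0; // tasks that exhausted their retries (not exercised here)
    private final Map<String, Integer> runningAttempts = new HashMap<>();
    private final Set<String> finishedTasks = new HashSet<>();

    NonCriticalCounting(int initialMaps) { this.initialMaps = initialMaps; }

    void launchAttempt(String task) { runningAttempts.merge(task, 1, Integer::sum); }

    // An attempt stops running (failed or killed); the task may be retried.
    void attemptEnded(String task) { runningAttempts.merge(task, -1, Integer::sum); }

    void attemptSucceeded(String task) {
        attemptEnded(task);
        finishedTasks.add(task);
    }

    int runningMaps() {
        return runningAttempts.values().stream().mapToInt(Integer::intValue).sum();
    }

    // Running attempts that do not advance any unfinished task: every attempt of a
    // finished task, plus all but one attempt of a task with several running attempts.
    int nonCriticalRunningTasks() {
        int n = 0;
        for (Map.Entry<String, Integer> e : runningAttempts.entrySet()) {
            int running = e.getValue();
            n += finishedTasks.contains(e.getKey()) ? running : Math.max(0, running - 1);
        }
        return n;
    }

    // Waiting count for the metrics: only critical running attempts are subtracted.
    int waitingMaps() {
        return initialMaps - (runningMaps() - nonCriticalRunningTasks())
                - finishedTasks.size() - failedMaps;
    }
}
```

Replaying the two-map scenario with this model (launch A, launch B, launch a speculative attempt of A, end B's attempt as failed, succeed one attempt of A) leaves the waiting count at 1 rather than 0, while speculativeMapTasks itself is left untouched.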
                
> mapred metrics shows negative count of waiting maps and reduces
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-4366
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4366
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 1.0.2
>            Reporter: Thomas Graves
>            Assignee: Sandy Ryza
>         Attachments: MAPREDUCE-4366-branch-1-1.patch, 
> MAPREDUCE-4366-branch-1.patch
>
>
> Negative waiting_maps and waiting_reduces counts are observed in the mapred 
> metrics.  MAPREDUCE-1238 partially fixed this, but it appears there are still 
> issues: we are still seeing it, though not as badly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
