[ 
https://issues.apache.org/jira/browse/MAPREDUCE-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759369#action_12759369
 ] 

Hemanth Yamijala commented on MAPREDUCE-964:
--------------------------------------------

I looked at this patch mainly from the point of view of verifying two 
invariants:

- Whenever we set a startTime in a TaskStatus, we need to set the finishTime as 
well.
- A finishTime must be set only if the startTime is > 0.

As far as I can tell, this patch ensures the first invariant. For the second, I 
tried tracing different code paths and found other places where the 
invariant is broken, though the cases identified in the bug seem to have been 
addressed. Rather than adding a fix in every place (which does not guarantee 
that the invariant will continue to hold in the face of future changes), I think 
it is sensible to add a check in the setFinishTime and statusUpdate methods of 
TaskStatus itself, and ensure the invariant holds at the root. Let's log an INFO 
message whenever the invariant is not met, to help us debug. Is there a good way 
to get the stack trace as well, which would be more useful? Maybe 
Thread.currentThread().getStackTrace() will do the trick?
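
To make the suggestion concrete, here is a minimal sketch of what such a 
root-level check could look like. TaskTimes is a hypothetical stand-in for 
TaskStatus that models only the two timestamps; it is not the real Hadoop 
class, and java.util.logging is used only to keep the example self-contained. 
Rejecting the update outright is one possible policy, assumed here for 
illustration.

```java
import java.util.logging.Logger;

/** Hypothetical stand-in for TaskStatus; only the two timestamps are modelled. */
class TaskTimes {
    private static final Logger LOG = Logger.getLogger(TaskTimes.class.getName());

    private long startTime;   // 0 means "not started yet"
    private long finishTime;  // 0 means "not finished yet"

    void setStartTime(long startTime) {
        if (startTime > 0) {
            this.startTime = startTime;
        } else {
            logViolation("Ignoring non-positive startTime " + startTime);
        }
    }

    /** Accepts a finishTime only if a positive startTime has already been set. */
    void setFinishTime(long finishTime) {
        if (startTime > 0 && finishTime > 0) {
            this.finishTime = finishTime;
        } else {
            logViolation("Ignoring finishTime " + finishTime
                + " because startTime is " + startTime);
        }
    }

    long getStartTime() { return startTime; }

    long getFinishTime() { return finishTime; }

    /** Logs the message at INFO level together with the caller's stack trace. */
    private static void logViolation(String message) {
        StringBuilder sb = new StringBuilder(message);
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            sb.append(System.lineSeparator()).append("\tat ").append(frame);
        }
        LOG.info(sb.toString());
    }

    public static void main(String[] args) {
        TaskTimes t = new TaskTimes();
        t.setFinishTime(2000L);   // rejected and logged: no startTime yet
        t.setStartTime(1000L);
        t.setFinishTime(2000L);   // accepted
        System.out.println(t.getStartTime() + " " + t.getFinishTime());
    }
}
```

Thread.currentThread().getStackTrace() returns the frames of the calling 
thread, so the log entry shows which code path tried to set the finishTime 
without a startTime. Passing a new Throwable to the logger would be another 
way to get the same information.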

Also, if we make this change, it would be simple to write a fast unit test that 
ensures the invariant is being satisfied.

Let's run the MR-reliability tests on this a couple more times as a sanity 
check that it's as good as the original fix.

> Inaccurate values in jobSummary logs
> ------------------------------------
>
>                 Key: MAPREDUCE-964
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-964
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: Rajiv Chittajallu
>            Assignee: Sreekanth Ramakrishnan
>            Priority: Critical
>         Attachments: mapreduce-964-1.patch
>
>
> For some jobs the mapSlotSeconds is incorrect.
> negative value
> 09/09/01 18:31:44 INFO mapred.JobInProgress$JobSummary: 
> jobId=job_200908270718_4568,submitTime=1251823543976,launchTime=1251823554310,finishTime=1251829904565,
>             
> numMaps=7965,numSlotsPerMap=1,numReduces=40,numSlotsPerReduce=1,user=wile,queue=runner,status=SUCCEEDED,
>          
> mapSlotSeconds=-2503133523,reduceSlotsSeconds=186536,clusterMapCapacity=11262,clusterReduceCapacity=3754
> or too high
> 09/09/02 23:59:57 INFO mapred.JobInProgress$JobSummary: 
> jobId=job_200908270718_5861,submitTime=1251935672924,launchTime=1251935687698,finishTime=1251935997949,
>             
> numMaps=1026,numSlotsPerMap=1,numReduces=10,numSlotsPerReduce=1,user=dfsload,queue=gridops,status=SUCCEEDED,
>          
> mapSlotSeconds=1251949742,reduceSlotsSeconds=537,clusterMapCapacity=11262,clusterReduceCapacity=3754

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
