[ https://issues.apache.org/jira/browse/MAPREDUCE-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759369#action_12759369 ]
Hemanth Yamijala commented on MAPREDUCE-964:
--------------------------------------------

I looked at this patch mainly from the point of view of verifying two invariants:
- Whenever we set a startTime in a TaskStatus, we need to set the finishTime as well.
- A finishTime must be set only if the startTime is > 0.

As far as I can tell, the first invariant is ensured by this patch. For the second, I tried tracing different code paths and could see other places where the invariant was broken, though the cases identified in the bug seem to have been addressed.

Rather than adding a fix in every place (which does not guarantee that the invariant will continue to hold in the face of future changes), I think it is sensible to add a check in setFinishTime and statusUpdate of TaskStatus itself, and ensure the invariant holds at the root. Let's log an INFO message when we see that the invariant is not met, to help us debug. Is there a good way to get the stack trace as well, which would be more useful? Maybe Thread.currentThread().getStackTrace() will do the trick?

Also, if we make this change, it would be simple to write a fast unit test that ensures the invariant is being satisfied.

Let's run the MR-reliability tests on this a couple more times, as a sanity check that it is as good as the original fix.

> Inaccurate values in jobSummary logs
> ------------------------------------
>
>                 Key: MAPREDUCE-964
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-964
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: Rajiv Chittajallu
>            Assignee: Sreekanth Ramakrishnan
>            Priority: Critical
>         Attachments: mapreduce-964-1.patch
>
>
> For some jobs the mapSlotSeconds is incorrect.
> negative value
> 09/09/01 18:31:44 INFO mapred.JobInProgress$JobSummary:
> jobId=job_200908270718_4568,submitTime=1251823543976,launchTime=1251823554310,finishTime=1251829904565,
> numMaps=7965,numSlotsPerMap=1,numReduces=40,numSlotsPerReduce=1,user=wile,queue=runner,status=SUCCEEDED,
> mapSlotSeconds=-2503133523,reduceSlotsSeconds=186536,clusterMapCapacity=11262,clusterReduceCapacity=3754
> or too high
> 09/09/02 23:59:57 INFO mapred.JobInProgress$JobSummary:
> jobId=job_200908270718_5861,submitTime=1251935672924,launchTime=1251935687698,finishTime=1251935997949,
> numMaps=1026,numSlotsPerMap=1,numReduces=10,numSlotsPerReduce=1,user=dfsload,queue=gridops,status=SUCCEEDED,
> mapSlotSeconds=1251949742,reduceSlotsSeconds=537,clusterMapCapacity=11262,clusterReduceCapacity=3754
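
For reference, the check proposed in the comment (enforce the invariant inside the setter and log the caller's stack trace when it is violated) could look roughly like the sketch below. This is only an illustration under stated assumptions: TaskTimes is a hypothetical stand-in for TaskStatus, not the actual Hadoop class; the boolean return value, the callerTrace helper, and the use of java.util.logging are choices made to keep the example self-contained; and treating 0 as "not set" follows the "startTime is > 0" wording of the second invariant.

```java
import java.util.logging.Logger;

// Hypothetical stand-in for TaskStatus; NOT the actual Hadoop class.
class TaskTimes {
    // java.util.logging is an assumption made to keep the sketch self-contained.
    private static final Logger LOG = Logger.getLogger(TaskTimes.class.getName());

    private long startTime;   // 0 means "not set"
    private long finishTime;  // 0 means "not set"

    void setStartTime(long startTime) {
        this.startTime = startTime;
    }

    // Accepts the finish time only if a start time is already set.
    // Returns true if the value was stored, false if it was rejected.
    boolean setFinishTime(long finishTime) {
        if (startTime > 0 && finishTime > 0) {
            this.finishTime = finishTime;
            return true;
        }
        // Invariant violated: leave finishTime untouched and log who called us.
        LOG.info("Ignoring finishTime=" + finishTime + " because startTime="
            + startTime + "; called from:\n" + callerTrace());
        return false;
    }

    long getStartTime() {
        return startTime;
    }

    long getFinishTime() {
        return finishTime;
    }

    // Formats the current thread's stack trace, one frame per line.
    private static String callerTrace() {
        StringBuilder sb = new StringBuilder();
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            sb.append("\tat ").append(frame).append('\n');
        }
        return sb.toString();
    }
}
```

With the check in one place, the fast unit test mentioned in the comment reduces to two cases: a finish time offered before any start time must be rejected, and the same value must be accepted once a start time is set.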