[
https://issues.apache.org/jira/browse/MAPREDUCE-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759369#action_12759369
]
Hemanth Yamijala commented on MAPREDUCE-964:
--------------------------------------------
I looked at this patch mainly from the point of view of verifying two
invariants:
- Whenever we set a startTime in a TaskStatus, we need to set the finishTime as
well.
- A finishTime must be set only if the startTime is > 0.
AFAIK, the first invariant is ensured by this patch. For the second, I
tried tracing different code paths, and could see other places where the
invariant was broken, though the cases identified in the bug seem to have been
addressed. Rather than adding a fix in every place (which does not guarantee
that the invariant will continue to hold in the face of future changes), I
think it is sensible to add a check in the setFinishTime and statusUpdate of
TaskStatus itself, and ensure the invariant holds at the root. Let's log an
INFO message when we see the invariant is not met to help us debug. Is there a
good way to get the stack trace as well, which would be more useful? Maybe
Thread.currentThread().getStackTrace() will do the trick?
Also, if we make this change, it would be simple to write a fast unit test that
ensures the invariant is being satisfied.
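A minimal sketch of what such a root-level check could look like is below. It
is not the actual TaskStatus code: the class name TaskTimesSketch, the boolean
return value, and the use of java.util.logging are assumptions made only to
keep the example self-contained. It also assumes that a violating finishTime
should be ignored rather than stored.

```java
import java.util.logging.Logger;

// Hypothetical stand-in for the timing fields of TaskStatus (not Hadoop code).
public class TaskTimesSketch {
    private static final Logger LOG =
        Logger.getLogger(TaskTimesSketch.class.getName());

    private long startTime;   // 0 means the task has not started
    private long finishTime;  // 0 means the task has not finished

    public void setStartTime(long startTime) {
        if (startTime > 0) {
            this.startTime = startTime;
        } else {
            logViolation("Ignoring non-positive startTime=" + startTime);
        }
    }

    // Stores finishTime only when the invariant holds (startTime > 0).
    // Returns true if the value was stored (assumed convention for testing).
    public boolean setFinishTime(long finishTime) {
        if (startTime > 0 && finishTime > 0) {
            this.finishTime = finishTime;
            return true;
        }
        logViolation("Ignoring finishTime=" + finishTime
            + " because startTime=" + startTime);
        return false;
    }

    public long getStartTime() { return startTime; }

    public long getFinishTime() { return finishTime; }

    // Logs at INFO level and appends the caller's stack trace so the
    // offending code path can be identified from the log.
    private static void logViolation(String message) {
        StringBuilder sb = new StringBuilder(message);
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            sb.append(System.lineSeparator()).append("\tat ").append(frame);
        }
        LOG.info(sb.toString());
    }
}
```

With the check in one place, a fast unit test only needs to call
setFinishTime before and after setStartTime and compare getFinishTime.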
Let's run the MR-reliability tests on this a couple more times to sanity-check
that it's as good as the original fix.
> Inaccurate values in jobSummary logs
> ------------------------------------
>
> Key: MAPREDUCE-964
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-964
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 0.20.1
> Reporter: Rajiv Chittajallu
> Assignee: Sreekanth Ramakrishnan
> Priority: Critical
> Attachments: mapreduce-964-1.patch
>
>
> For some jobs the mapSlotSeconds is incorrect.
> negative value
> 09/09/01 18:31:44 INFO mapred.JobInProgress$JobSummary:
> jobId=job_200908270718_4568,submitTime=1251823543976,launchTime=1251823554310,finishTime=1251829904565,
>
> numMaps=7965,numSlotsPerMap=1,numReduces=40,numSlotsPerReduce=1,user=wile,queue=runner,status=SUCCEEDED,
>
> mapSlotSeconds=-2503133523,reduceSlotsSeconds=186536,clusterMapCapacity=11262,clusterReduceCapacity=3754
> or too high
> 09/09/02 23:59:57 INFO mapred.JobInProgress$JobSummary:
> jobId=job_200908270718_5861,submitTime=1251935672924,launchTime=1251935687698,finishTime=1251935997949,
>
> numMaps=1026,numSlotsPerMap=1,numReduces=10,numSlotsPerReduce=1,user=dfsload,queue=gridops,status=SUCCEEDED,
>
> mapSlotSeconds=1251949742,reduceSlotsSeconds=537,clusterMapCapacity=11262,clusterReduceCapacity=3754