[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963278#comment-13963278 ]
Jason Lowe commented on MAPREDUCE-5465: --------------------------------------- Thanks for picking this up, Ming. Some comments on the patch: - The new interface for TaskAttemptListener seems more complicated than necessary. IIUC it's not valid to call registerFinishingTask without having called unregister essentially at the same time, so wondering if we should just add a single method, something like setTaskFinishing(TaskAttemptId, WrappedJvmID) to denote when the task JVM has finished and we're simply waiting for the container to complete. A typical lifecycle could pass a null JVM to the unregister method if the JVM already finished, and we can still skip the finishing method and go straight from registerLauchedTask -> unregister if the task container exits unexpectedly. Either that or it seems the TaskAttemptExitHandler should not be part of TaskAttemptListener and instead conveyed separately to the TaskAttemptImpl (possibly via AppContext along with other things), since it seems tacked on and not related to the other workings of TaskAttemptListener. - It looks like we're only handling the successful task case. We don't want to proactively kill tasks that have reported failure just like we don't want to proactively kill tasks that have reported success. - Why are we transitioning from FINISHING_CONTAINER to SUCCESS_CONTAINER_CLEANUP rather than to SUCCEEDED when we receive a container completed event? The SUCCESS_CONTAINER_CLEANUP state is for waiting for a container completed event to arrive, but we are leaving the FINISHING_CONTAINER state due to the arrival of that very event. - The new properties should be added to mapred-default.xml for documentation purposes. - Suggestion: TaskAttemptFinishingMonitor may be a more accurate name than TaskAttemptExitHandler - Nit: The default values for the new properties should be named in MRJobConfig - Nit: comments should be formatted for 80 columns - Nit: In the comment for the new FINISHING_CONTAINER state: "That will a chance" should be "That will give a chance" - Nit: testFinshingAttemptTimedout -> testFinishingAttemptTimeout - Nit: testTaskAttemptDiognosticEventOnFinishing -> testTaskAttemptDiagnosticEventOnFinishing > Container killed before hprof dumps profile.out > ----------------------------------------------- > > Key: MAPREDUCE-5465 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mr-am, mrv2 > Affects Versions: trunk, 2.0.3-alpha > Reporter: Radim Kolar > Assignee: Ming Ma > Attachments: MAPREDUCE-5465-2.patch, MAPREDUCE-5465-3.patch, > MAPREDUCE-5465.patch > > > If there is profiling enabled for mapper or reducer then hprof dumps > profile.out at process exit. It is dumped after task signaled to AM that work > is finished. > AM kills container with finished work without waiting for hprof to finish > dumps. If hprof is dumping larger outputs (such as with depth=4 while depth=3 > works) , it could not finish dump in time before being killed making entire > dump unusable because cpu and heap stats are missing. > There needs to be better delay before container is killed if profiling is > enabled. -- This message was sent by Atlassian JIRA (v6.2#6252)