[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out

Jason Lowe (JIRA) Tue, 08 Apr 2014 11:29:42 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963278#comment-13963278
 ]


Jason Lowe commented on MAPREDUCE-5465:
---------------------------------------

Thanks for picking this up, Ming.  Some comments on the patch:

- The new interface for TaskAttemptListener seems more complicated than 
necessary.  IIUC it's not valid to call registerFinishingTask without having 
called unregister essentially at the same time, so wondering if we should just 
add a single method, something like setTaskFinishing(TaskAttemptId, 
WrappedJvmID) to denote when the task JVM has finished and we're simply waiting 
for the container to complete.  A typical lifecycle could pass a null JVM to 
the unregister method if the JVM already finished, and we can still skip the 
finishing method and go straight from registerLauchedTask -> unregister if the 
task container exits unexpectedly.  Either that or it seems the 
TaskAttemptExitHandler should not be part of TaskAttemptListener and instead 
conveyed separately to the TaskAttemptImpl (possibly via AppContext along with 
other things), since it seems tacked on and not related to the other workings 
of TaskAttemptListener.
- It looks like we're only handling the successful task case.  We don't want to 
proactively kill tasks that have reported failure just like we don't want to 
proactively kill tasks that have reported success.
- Why are we transitioning from FINISHING_CONTAINER to 
SUCCESS_CONTAINER_CLEANUP rather than to SUCCEEDED when we receive a container 
completed event?  The SUCCESS_CONTAINER_CLEANUP state is for waiting for a 
container completed event to arrive, but we are leaving the FINISHING_CONTAINER 
state due to the arrival of that very event.
- The new properties should be added to mapred-default.xml for documentation 
purposes.
- Suggestion: TaskAttemptFinishingMonitor may be a more accurate name than 
TaskAttemptExitHandler
- Nit: The default values for the new properties should be named in MRJobConfig
- Nit: comments should be formatted for 80 columns
- Nit: In the comment for the new FINISHING_CONTAINER state: "That will a 
chance" should be "That will give a chance"
- Nit: testFinshingAttemptTimedout -> testFinishingAttemptTimeout
- Nit: testTaskAttemptDiognosticEventOnFinishing -> 
testTaskAttemptDiagnosticEventOnFinishing


> Container killed before hprof dumps profile.out
> -----------------------------------------------
>
>                 Key: MAPREDUCE-5465
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am, mrv2
>    Affects Versions: trunk, 2.0.3-alpha
>            Reporter: Radim Kolar
>            Assignee: Ming Ma
>         Attachments: MAPREDUCE-5465-2.patch, MAPREDUCE-5465-3.patch, 
> MAPREDUCE-5465.patch
>
>
> If there is profiling enabled for mapper or reducer then hprof dumps 
> profile.out at process exit. It is dumped after task signaled to AM that work 
> is finished.
> AM kills container with finished work without waiting for hprof to finish 
> dumps. If hprof is dumping larger outputs (such as with depth=4 while depth=3 
> works) , it could not finish dump in time before being killed making entire 
> dump unusable because cpu and heap stats are missing.
> There needs to be better delay before container is killed if profiling is 
> enabled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (MAPREDUCE-5465) Container killed before hprof dumps profile.out

Reply via email to