[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964753#comment-13964753 ]

Ming Ma commented on MAPREDUCE-5465:
------------------------------------

Thanks Jason for the review. I will upload the updated patch soon. I want to
comment on the couple of points you mentioned.

1. Yes, putting finishTaskMonitor under TaskAttemptListenerImpl isn't clean,
given that TaskAttemptListenerImpl should only deal with
TaskUmbilicalProtocol-related logic. I will move it out to the AppContext layer.
2. Handling of the TA_FAILMSG event. TA_FAILMSG can be triggered by the task
JVM as well as by a user via the "hadoop job -fail-task" command. For the case
where the task JVM reports the failure, yes, it can wait for the container to
exit. For the case where an end user sends the command, it will need to clean
up the container right away. I skipped that for simplicity. If we want to
support it, it seems we will need a new event such as TA_FAILMSG_BY_USER.
3. Why are we transitioning from FINISHING_CONTAINER to
SUCCESS_CONTAINER_CLEANUP rather than to SUCCEEDED when we receive a container
completed event? It was done for simplicity, so that all successful attempts
go through SUCCESS_CONTAINER_CLEANUP first. But I agree it can go directly to
SUCCEEDED when we receive a container completed event.
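The transitions discussed in points 2 and 3 can be sketched as a small state
machine. This is a simplified, hypothetical model for illustration only, not
the actual TaskAttemptImpl transition table: TA_FAILMSG_BY_USER is the event
proposed in point 2, FAIL_FINISHING_CONTAINER is an illustrative state name,
and TA_TIMED_OUT stands in for whatever mechanism fires when the JVM does not
exit in time. It uses the direct FINISHING_CONTAINER to SUCCEEDED route agreed
on in point 3.

```java
// Hypothetical, simplified model of the task-attempt transitions discussed above.
// Names mirror the comment where possible; TA_FAILMSG_BY_USER is the proposed
// (not existing) event and FAIL_FINISHING_CONTAINER is an illustrative state.
enum AttemptState {
    RUNNING, FINISHING_CONTAINER, FAIL_FINISHING_CONTAINER,
    SUCCESS_CONTAINER_CLEANUP, FAIL_CONTAINER_CLEANUP, SUCCEEDED, FAILED
}

enum AttemptEvent {
    TA_DONE, TA_FAILMSG, TA_FAILMSG_BY_USER,
    TA_CONTAINER_COMPLETED, TA_CONTAINER_CLEANED, TA_TIMED_OUT
}

final class AttemptStateMachine {
    private AttemptState state = AttemptState.RUNNING;

    AttemptState state() { return state; }

    AttemptState handle(AttemptEvent event) {
        switch (state) {
            case RUNNING:
                if (event == AttemptEvent.TA_DONE) {
                    // Task reported success: give the JVM time to exit (e.g. hprof dump).
                    state = AttemptState.FINISHING_CONTAINER;
                } else if (event == AttemptEvent.TA_FAILMSG) {
                    // Failure reported by the task JVM itself: it will exit on its own.
                    state = AttemptState.FAIL_FINISHING_CONTAINER;
                } else if (event == AttemptEvent.TA_FAILMSG_BY_USER) {
                    // Failure requested by a user: clean up the container right away.
                    state = AttemptState.FAIL_CONTAINER_CLEANUP;
                }
                break;
            case FINISHING_CONTAINER:
                if (event == AttemptEvent.TA_CONTAINER_COMPLETED) {
                    // Container exited by itself: go straight to SUCCEEDED (point 3).
                    state = AttemptState.SUCCEEDED;
                } else if (event == AttemptEvent.TA_TIMED_OUT) {
                    // JVM did not exit in time: kill the container.
                    state = AttemptState.SUCCESS_CONTAINER_CLEANUP;
                }
                break;
            case FAIL_FINISHING_CONTAINER:
                if (event == AttemptEvent.TA_CONTAINER_COMPLETED) {
                    state = AttemptState.FAILED;
                } else if (event == AttemptEvent.TA_TIMED_OUT) {
                    state = AttemptState.FAIL_CONTAINER_CLEANUP;
                }
                break;
            case SUCCESS_CONTAINER_CLEANUP:
                // The killed container has been cleaned up.
                if (event == AttemptEvent.TA_CONTAINER_CLEANED) state = AttemptState.SUCCEEDED;
                break;
            case FAIL_CONTAINER_CLEANUP:
                if (event == AttemptEvent.TA_CONTAINER_CLEANED) state = AttemptState.FAILED;
                break;
            default:
                // Terminal states ignore further events in this sketch.
                break;
        }
        return state;
    }
}
```

Keeping the task-initiated and user-initiated failures as separate events is
what lets the first wait for the JVM to exit while the second kills the
container immediately.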

  

> Container killed before hprof dumps profile.out
> -----------------------------------------------
>
>                 Key: MAPREDUCE-5465
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am, mrv2
>    Affects Versions: trunk, 2.0.3-alpha
>            Reporter: Radim Kolar
>            Assignee: Ming Ma
>         Attachments: MAPREDUCE-5465-2.patch, MAPREDUCE-5465-3.patch, 
> MAPREDUCE-5465.patch
>
>
> If profiling is enabled for a mapper or reducer, hprof dumps profile.out
> at process exit. It is dumped after the task has signaled to the AM that
> its work is finished.
> The AM kills the container with finished work without waiting for hprof to
> finish dumping. If hprof is dumping larger outputs (such as with depth=4,
> while depth=3 works), it cannot finish the dump before being killed, making
> the entire dump unusable because the CPU and heap stats are missing.
> There needs to be a sufficient delay before the container is killed if
> profiling is enabled.
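One way to provide the delay the description asks for is to let the AM track
attempts that have reported completion and kill a container only if it has not
exited within a grace period. The Java sketch below is a minimal illustration
under stated assumptions: the class and method names are hypothetical, the
caller supplies the current time and is assumed to poll expired()
periodically, and the timeout value is chosen by the caller rather than tied to
any real configuration key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical monitor: tracks attempts that have reported "done" but whose
// containers have not exited yet, and reports those whose grace period expired.
// Times are passed in explicitly (milliseconds) so the logic is deterministic.
final class FinishingTaskMonitor {
    private final long timeoutMs;
    private final Map<String, Long> finishingSince = new HashMap<>();

    FinishingTaskMonitor(long timeoutMs) {
        if (timeoutMs <= 0) throw new IllegalArgumentException("timeoutMs must be positive");
        this.timeoutMs = timeoutMs;
    }

    // Called when the task signals the AM that its work is finished.
    void register(String attemptId, long nowMs) {
        finishingSince.put(attemptId, nowMs);
    }

    // Called when the container exits on its own (e.g. after hprof wrote profile.out).
    void unregister(String attemptId) {
        finishingSince.remove(attemptId);
    }

    // Returns, and stops tracking, attempts whose containers should now be killed.
    List<String> expired(long nowMs) {
        List<String> toKill = new ArrayList<>();
        finishingSince.entrySet().removeIf(e -> {
            boolean timedOut = nowMs - e.getValue() >= timeoutMs;
            if (timedOut) toKill.add(e.getKey());
            return timedOut;
        });
        return toKill;
    }
}
```

A container that exits on its own within the grace period is simply
unregistered and never killed, so a slow hprof dump only costs the AM a bounded
wait instead of a truncated profile.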



--
This message was sent by Atlassian JIRA
(v6.2#6252)
