[ 
https://issues.apache.org/jira/browse/TEZ-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159263#comment-15159263
 ] 

Hitesh Shah edited comment on TEZ-3128 at 2/23/16 5:49 PM:
-----------------------------------------------------------

We do need to release/stop them before shutdown, as there is no guarantee on 
when the AM will be killed after unregistering (I think the default is less 
than a few seconds) if the AM still has pending work (flushing events, etc.). 
We will lose out on history data if we go with that approach. 

My point was whether we can get away with releasing running containers to YARN 
instead of calling stop on each of them via the NM proxy. If we cannot release 
them, then we need to reduce the timeout and use a new NM client proxy with the 
modified timeouts to stop the containers. 
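
To make the "reduced timeout" option concrete, here is a minimal sketch of 
stopping containers under one overall time budget, so the shutdown path cannot 
block indefinitely on an unreachable NM. It is not the TEZ-3128 patch and uses 
no Tez or YARN classes: ContainerStopper, stopAll and the 300 ms budget are 
hypothetical names and values chosen for illustration, and the per-container 
call stands in for whatever the NM client proxy would do. 

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedContainerStop {

    /** Hypothetical stand-in for a per-container stop call made via an NM proxy. */
    public interface ContainerStopper {
        void stop(String containerId) throws Exception;
    }

    /**
     * Issues all stop calls in parallel and waits at most timeoutMs in total.
     * Returns how many stop calls completed successfully within that budget.
     */
    public static int stopAll(List<String> containerIds, ContainerStopper stopper,
                              long timeoutMs) {
        ExecutorService pool = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "container-stopper");
            t.setDaemon(true); // a hung call must never keep the JVM alive
            return t;
        });
        List<Future<?>> futures = new ArrayList<>();
        for (String id : containerIds) {
            futures.add(pool.submit(() -> {
                stopper.stop(id);
                return null;
            }));
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        int stopped = 0;
        for (Future<?> f : futures) {
            // Each wait only consumes what is left of the shared budget.
            long remaining = Math.max(0L, deadline - System.nanoTime());
            try {
                f.get(remaining, TimeUnit.NANOSECONDS);
                stopped++;
            } catch (TimeoutException | ExecutionException e) {
                f.cancel(true); // give up on this container and move on
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        pool.shutdownNow(); // interrupt anything still in flight
        return stopped;
    }

    public static void main(String[] args) {
        // Simulated NM: ids starting with "hung" never answer in time.
        ContainerStopper simulated = id -> {
            if (id.startsWith("hung")) {
                Thread.sleep(5000);
            }
        };
        List<String> ids = Arrays.asList("c1", "hung1", "c2");
        int stopped = stopAll(ids, simulated, 300);
        System.out.println("stopped " + stopped + " of " + ids.size());
    }
}
```

The point of the sketch is only that the waiting thread is bounded by the 
budget regardless of how many NMs are unreachable. Whether releasing the 
containers to YARN makes this unnecessary is the open question above. 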

  


was (Author: hitesh):
We do need to release/stop them before shutdown as there is no guarantee on 
when the AM will be killed after unregistering if the AM still has pending work 
( flushing events, etc).

My point was whether we can get away with releasing running containers to YARN 
instead of calling stop on each of them via the NM proxy. If we cannot release 
them, then we need to reduce the timeout and use a new NM client proxy with the 
modified timeouts to stop the containers. 

  

> Avoid stopping containers on the AM shutdown thread
> ---------------------------------------------------
>
>                 Key: TEZ-3128
>                 URL: https://issues.apache.org/jira/browse/TEZ-3128
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.8.0-alpha
>            Reporter: Siddharth Seth
>            Assignee: Tsuyoshi Ozawa
>              Labels: newbie
>         Attachments: TEZ-3128.001.patch, amJstack
>
>
> During an AM shutdown, the TaskCommunicator is also shut down and it tries to 
> stop containers in the shutdown thread itself. This can cause the AM shutdown 
> to block if NMs are not available.
> This likely affects 0.7 as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
