[jira] [Updated] (APEXCORE-426) Support work preserving AM recovery

Sandesh (JIRA) Thu, 05 Jan 2017 16:48:22 -0800

     [ https://issues.apache.org/jira/browse/APEXCORE-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sandesh updated APEXCORE-426:
-----------------------------
    Description: 
On app master failure, the streaming containers should continue running. 

As of Hadoop 2.2, YARN automatically terminates all containers and the 
replacement app master relaunches them. Once we move to a newer minimum Hadoop 
version, we should leverage work-preserving restart.

The mechanism in Apex containers to locate the new master process is already 
in place.
 
Test Cases:
1. Kill the app master: only the app master's container ID should change; all 
other container IDs should remain the same.
2. Kill the app master and a few other containers; verify that the killed 
containers are recovered.
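
The two test cases above could be checked by comparing container snapshots 
taken before and after the app master is killed. The following is only a 
minimal sketch: the class AmRecoveryCheck, the Snapshot type, and both helper 
methods are hypothetical names and not part of Apex or YARN, and the container 
IDs are made-up sample data. It also assumes that a recovered container gets a 
new container ID and that the number of streaming containers is restored.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class AmRecoveryCheck {

    /** Hypothetical snapshot of one application's containers at a point in time. */
    static final class Snapshot {
        final String amContainerId;
        final Set<String> workerContainerIds;

        Snapshot(String amContainerId, Set<String> workerContainerIds) {
            this.amContainerId = amContainerId;
            this.workerContainerIds = new HashSet<>(workerContainerIds);
        }
    }

    /** Test case 1: the app master container ID changed, all other IDs are identical. */
    static boolean onlyAppMasterChanged(Snapshot before, Snapshot after) {
        return !before.amContainerId.equals(after.amContainerId)
            && before.workerContainerIds.equals(after.workerContainerIds);
    }

    /**
     * Test case 2: the app master changed, surviving containers kept their IDs,
     * the killed IDs are gone, and the container count is back to its old value
     * (assumption: each killed container is replaced by exactly one new one).
     */
    static boolean killedContainersRecovered(Snapshot before, Snapshot after, Set<String> killed) {
        Set<String> survivors = new HashSet<>(before.workerContainerIds);
        survivors.removeAll(killed);
        boolean amChanged = !before.amContainerId.equals(after.amContainerId);
        boolean survivorsKept = after.workerContainerIds.containsAll(survivors);
        boolean killedGone = Collections.disjoint(after.workerContainerIds, killed);
        boolean countRestored =
            after.workerContainerIds.size() == before.workerContainerIds.size();
        return amChanged && survivorsKept && killedGone && countRestored;
    }

    public static void main(String[] args) {
        // Sample IDs for illustration only.
        Snapshot before = new Snapshot("container_1483660000000_0001_01_000001",
            new HashSet<>(Arrays.asList(
                "container_1483660000000_0001_01_000002",
                "container_1483660000000_0001_01_000003")));
        Snapshot after = new Snapshot("container_1483660000000_0001_02_000001",
            before.workerContainerIds);
        System.out.println("only app master changed: " + onlyAppMasterChanged(before, after));
    }
}
```

In a real test the snapshots would come from the cluster, for example from the 
YARN container list of the running application, rather than from literals.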



  was:
On app master failure, the streaming containers should continue running. 

As of 2.2, YARN will automatically terminate all containers and the replacement 
app master will relaunch them. Once we move to a newer minimum Hadoop version, 
we should leverage work preserving restart.

The mechanism in Apex containers to locate the new master process are already 
in place.
 
Test Cases:
1. Kill the app-master - only app-master container id should change, all the 
other containers id should remain same.
2. Kill the app-master and other few other containers, make sure that kill 
containers are recovered.




> Support work preserving AM recovery
> -----------------------------------
>
>                 Key: APEXCORE-426
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-426
>             Project: Apache Apex Core
>          Issue Type: Improvement
>            Reporter: Thomas Weise
>            Assignee: Sandesh
>              Labels: apex-hadoop-version
>
> On app master failure, the streaming containers should continue running. 
> As of Hadoop 2.2, YARN automatically terminates all containers and the 
> replacement app master relaunches them. Once we move to a newer minimum 
> Hadoop version, we should leverage work-preserving restart.
> The mechanism in Apex containers to locate the new master process is already 
> in place.
>  
> Test Cases:
> 1. Kill the app master: only the app master's container ID should change; 
> all other container IDs should remain the same.
> 2. Kill the app master and a few other containers; verify that the killed 
> containers are recovered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
