[ https://issues.apache.org/jira/browse/APEXCORE-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sandesh updated APEXCORE-426:
-----------------------------
    Description:
On app master failure, the streaming containers should continue running. As of Hadoop 2.2, YARN automatically terminates all containers and the replacement app master relaunches them. Once we move to a newer minimum Hadoop version, we should leverage work-preserving restart.
The mechanism in Apex containers to locate the new master process is already in place.

Test Cases:
1. Kill the app master: only the app master's container ID should change; all other container IDs should remain the same.
2. Kill the app master and a few other containers; make sure that the killed containers are recovered.

  was:
On app master failure, the streaming containers should continue running. As of Hadoop 2.2, YARN automatically terminates all containers and the replacement app master relaunches them. Once we move to a newer minimum Hadoop version, we should leverage work-preserving restart.
The mechanism in Apex containers to locate the new master process is already in place.

Test Cases:
1. Kill the app master: only the app master's container ID should change; all other container IDs should remain the same.
2. Kill the app master and other few other containers; make sure that the killed containers are recovered.

> Support work preserving AM recovery
> -----------------------------------
>
>                 Key: APEXCORE-426
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-426
>             Project: Apache Apex Core
>          Issue Type: Improvement
>            Reporter: Thomas Weise
>            Assignee: Sandesh
>              Labels: apex-hadoop-version
>
> On app master failure, the streaming containers should continue running.
> As of Hadoop 2.2, YARN automatically terminates all containers and the
> replacement app master relaunches them. Once we move to a newer minimum
> Hadoop version, we should leverage work-preserving restart.
> The mechanism in Apex containers to locate the new master process is already
> in place.
>
> Test Cases:
> 1. Kill the app master: only the app master's container ID should change;
> all other container IDs should remain the same.
> 2. Kill the app master and a few other containers; make sure that the killed
> containers are recovered.
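
The two test cases could be checked along the following lines. This is only a rough sketch in plain Java, not code from Apex or YARN: the class name RecoveryCheck, its methods, and the container IDs are all hypothetical, and it assumes the container IDs have been collected as strings before and after the app master is killed. For test case 2 it further assumes that "recovered" means each killed container is replaced by a new one with a different ID while the surviving containers keep theirs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Apex or YARN.
final class RecoveryCheck {

    // Test case 1: the app master's container ID changed and every
    // other container ID is exactly the same as before the kill.
    static boolean onlyMasterChanged(String amBefore, Set<String> workersBefore,
                                     String amAfter, Set<String> workersAfter) {
        return !amBefore.equals(amAfter) && workersBefore.equals(workersAfter);
    }

    // Test case 2 (under the assumption stated above): survivors keep
    // their IDs, killed IDs are gone, and the container count is restored.
    static boolean killedWereReplaced(Set<String> workersBefore, Set<String> killed,
                                      Set<String> workersAfter) {
        Set<String> survivors = new HashSet<>(workersBefore);
        survivors.removeAll(killed);
        boolean survivorsKept = workersAfter.containsAll(survivors);

        Set<String> stillPresent = new HashSet<>(workersAfter);
        stillPresent.retainAll(killed);
        boolean killedGone = stillPresent.isEmpty();

        return survivorsKept && killedGone && workersAfter.size() == workersBefore.size();
    }

    public static void main(String[] args) {
        // Made-up container IDs, used only to exercise the checks.
        Set<String> before = new HashSet<>(Arrays.asList("c2", "c3", "c4"));

        // Case 1: app master moved from c1 to c9, other containers untouched.
        System.out.println(onlyMasterChanged("c1", before, "c9", before));

        // Case 2: c3 was killed along with the app master and came back as c10.
        Set<String> killed = new HashSet<>(Arrays.asList("c3"));
        Set<String> after = new HashSet<>(Arrays.asList("c2", "c4", "c10"));
        System.out.println(killedWereReplaced(before, killed, after));
    }
}
```

In a real test the ID sets would come from the cluster rather than literals; how they are gathered is left open here.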