[ https://issues.apache.org/jira/browse/AMBARI-24380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hari Sekhon updated AMBARI-24380:
---------------------------------
    Description:
Ambari rolling-restart of HBase RegionServers failed to detect that RegionServers were not coming back online, continued to take down the rest of the RegionServers in the cluster.

Adding the following JVM tuning setting to HBASE_OPTS in hbase-env.sh template near the start of the options:
{code:java}
-XX:G1NewSizePercent=3{code}
before the following option (which was set a couple options further along, it needs to go after this option):
{code:java}
-XX:+UnlockExperimentalVMOptions{code}
This resulted in both HMaster and RegionServer startup failures, but Ambari did not detect that the RegionServers were not coming back online, and proceeded to take down the rest of the RegionServers.

Ambari should have checked that the first RegionServer restarted successfully and stayed up for the default 120 second rolling window via API checks on the RegionServer and that it is properly re-registered with active HMaster before moving on to the second RegionServer.

Also, Ambari should refuse to continue with any rolling restart if no HMasters are online, see linked ticket AMBARI-24699.

  was:
Ambari rolling-restart of HBase RegionServers failed to detect that RegionServers were not coming back online, continued to take down the rest of the RegionServers in the cluster.

Adding the following JVM tuning setting to HBASE_OPTS in hbase-env.sh template near the start of the options:
{code:java}
-XX:G1NewSizePercent=3{code}
before the following option (which was set a couple options further along, it needs to go after this option):
{code:java}
-XX:+UnlockExperimentalVMOptions{code}
This resulted in both HMaster and RegionServer startup failures, but Ambari did not detect that the RegionServers were not coming back online, and proceeded to take down the rest of the RegionServers.

Ambari should have checked that the first RegionServer restarted successfully and stayed up for the default 120 second rolling window via API checks on the RegionServer and that it is properly re-registered with active HMaster before moving on to the second RegionServer.

Also, Ambari should refuse to continue with any rolling restart if no HMasters are online, see


> Ambari HBase Rolling Restart failed to check RegionServers restarted
> successfully, continued to take down rest of RegionServers!
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-24380
>                 URL: https://issues.apache.org/jira/browse/AMBARI-24380
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.5.2
>            Reporter: Hari Sekhon
>            Priority: Critical
>
> Ambari rolling-restart of HBase RegionServers failed to detect that
> RegionServers were not coming back online, continued to take down the rest of
> the RegionServers in the cluster.
> Adding the following JVM tuning setting to HBASE_OPTS in hbase-env.sh
> template near the start of the options:
> {code:java}
> -XX:G1NewSizePercent=3{code}
> before the following option (which was set a couple options further along, it
> needs to go after this option):
> {code:java}
> -XX:+UnlockExperimentalVMOptions{code}
> This resulted in both HMaster and RegionServer startup failures, but Ambari
> did not detect that the RegionServers were not coming back online, and
> proceeded to take down the rest of the RegionServers.
> Ambari should have checked that the first RegionServer restarted successfully
> and stayed up for the default 120 second rolling window via API checks on the
> RegionServer and that it is properly re-registered with active HMaster before
> moving on to the second RegionServer.
> Also, Ambari should refuse to continue with any rolling restart if no
> HMasters are online, see linked ticket AMBARI-24699.

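
The behaviour requested above (confirm each RegionServer is back and stays up for the rolling window, and refuse to proceed without an HMaster) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Ambari's implementation: `rolling_restart`, `RollingRestartAborted`, the `restart`, `is_registered` and `master_online` callbacks, and the `startup_timeout` and `poll_seconds` values are all hypothetical. Only the 120 second window comes from the ticket.

```python
import time
from typing import Callable, Iterable, List


class RollingRestartAborted(Exception):
    """Raised when the restart stops early to protect the remaining servers."""


def rolling_restart(
    servers: Iterable[str],
    restart: Callable[[str], None],         # hypothetical: restarts one RegionServer
    is_registered: Callable[[str], bool],   # hypothetical: is it registered with the active HMaster?
    master_online: Callable[[], bool],      # hypothetical: is any HMaster online?
    stable_seconds: float = 120.0,          # rolling window named in the ticket
    startup_timeout: float = 300.0,         # assumed grace period for the server to come back
    poll_seconds: float = 5.0,              # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Restart servers one at a time, moving on only after each one is stable."""
    restarted: List[str] = []
    for server in servers:
        # Refuse to take anything down while no HMaster is online.
        if not master_online():
            raise RollingRestartAborted(
                f"no HMaster online, refusing to restart {server}"
            )
        restart(server)
        deadline = clock() + startup_timeout
        healthy_since = None
        while True:
            now = clock()
            if is_registered(server):
                if healthy_since is None:
                    healthy_since = now
                # Proceed only once the server has stayed up for the whole window.
                if now - healthy_since >= stable_seconds:
                    break
            else:
                # Any drop-out resets the stability window.
                healthy_since = None
            if healthy_since is None and now >= deadline:
                raise RollingRestartAborted(
                    f"{server} did not come back; servers left untouched after {restarted}"
                )
            sleep(poll_seconds)
        restarted.append(server)
    return restarted
```

With a gate like this, the failure described in the ticket would stop after the first RegionServer: the exception is raised before `restart` is called on the second one, so the rest of the cluster stays up. The callbacks are deliberately left abstract; how they query RegionServer and HMaster state is outside what the ticket specifies.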
-- This message was sent by Atlassian JIRA (v7.6.3#76005)