[ 
https://issues.apache.org/jira/browse/AMBARI-24380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated AMBARI-24380:
---------------------------------
    Description: 
Ambari rolling-restart of HBase RegionServers failed to detect that 
RegionServers were not coming back online, and continued to take down the rest 
of the RegionServers in the cluster.

The following JVM tuning setting was added to HBASE_OPTS in the hbase-env.sh 
template, near the start of the options:
{code:java}
-XX:G1NewSizePercent=3{code}
That placed it before the following option, which was set a couple of options 
further along (G1NewSizePercent needs to come after this option):
{code:java}
-XX:+UnlockExperimentalVMOptions{code}
This resulted in both HMaster and RegionServer startup failures, but Ambari did 
not detect that the RegionServers were not coming back online, and proceeded to 
take down the rest of the RegionServers.

Ambari should have checked, via API checks on the RegionServer, that the first 
RegionServer restarted successfully and stayed up for the default 120-second 
rolling window, and that it had properly re-registered with the active HMaster, 
before moving on to the second RegionServer.
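
The behaviour requested above could look roughly like the following Python 
sketch. It is not Ambari code: rolling_restart, RollingRestartAborted and the 
three callables (restart, is_registered, active_master_count) are hypothetical 
names standing in for whatever Ambari and HBase API calls would actually be 
used. As a simplification it uses one 120 second window for both startup and 
the stay-up check.

```python
import time
from typing import Callable, Iterable, List


class RollingRestartAborted(RuntimeError):
    """Raised instead of taking down another RegionServer."""


def rolling_restart(
    hosts: Iterable[str],
    restart: Callable[[str], None],            # hypothetical: restarts one RegionServer
    is_registered: Callable[[str], bool],      # hypothetical: is it registered with the active HMaster?
    active_master_count: Callable[[], int],    # hypothetical: number of active HMasters
    window_seconds: float = 120.0,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Restart hosts one at a time, aborting at the first unhealthy one."""
    restarted: List[str] = []
    for host in hosts:
        # Refuse to touch another RegionServer if no HMaster is online.
        if active_master_count() < 1:
            raise RollingRestartAborted(f"no active HMaster; not restarting {host}")
        restart(host)
        deadline = clock() + window_seconds
        seen_up = False
        while clock() < deadline:
            up = is_registered(host)
            # Once the server has come up it must stay up for the rest of the window.
            if seen_up and not up:
                raise RollingRestartAborted(f"{host} went down inside the window")
            seen_up = seen_up or up
            sleep(poll_seconds)
        # The server must be registered when the window closes.
        if not (seen_up and is_registered(host)):
            raise RollingRestartAborted(f"{host} not registered after {window_seconds}s")
        restarted.append(host)
    return restarted
```

With a check like this, a RegionServer that never re-registers raises 
RollingRestartAborted after the first window, so at most one RegionServer is 
down when the restart stops.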

Also, Ambari should refuse to continue with any rolling restart if no HMasters 
are online, see 

  was:
Ambari rolling-restart of HBase RegionServers failed to detect that 
RegionServers were not coming back online, and continued to take down the rest 
of the RegionServers in the cluster.

The following JVM tuning setting was added to HBASE_OPTS in the hbase-env.sh 
template, near the start of the options:
{code:java}
-XX:G1NewSizePercent=3{code}
That placed it before the following option, which was set a couple of options 
further along (G1NewSizePercent needs to come after this option):
{code:java}
-XX:+UnlockExperimentalVMOptions{code}
This resulted in both HMaster and RegionServer startup failures, but Ambari did 
not detect that the RegionServers were not coming back online, and proceeded to 
take down the rest of the RegionServers.

Ambari should have checked that the first RegionServer restarted successfully 
and stayed up for the default 120-second rolling window before moving on to the 
second RegionServer.


> Ambari HBase Rolling Restart failed to check RegionServers restarted 
> successfully, continued to take down rest of RegionServers!
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-24380
>                 URL: https://issues.apache.org/jira/browse/AMBARI-24380
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.5.2
>            Reporter: Hari Sekhon
>            Priority: Critical
>
> Ambari rolling-restart of HBase RegionServers failed to detect that 
> RegionServers were not coming back online, and continued to take down the 
> rest of the RegionServers in the cluster.
> The following JVM tuning setting was added to HBASE_OPTS in the hbase-env.sh 
> template, near the start of the options:
> {code:java}
> -XX:G1NewSizePercent=3{code}
> That placed it before the following option, which was set a couple of options 
> further along (G1NewSizePercent needs to come after this option):
> {code:java}
> -XX:+UnlockExperimentalVMOptions{code}
> This resulted in both HMaster and RegionServer startup failures, but Ambari 
> did not detect that the RegionServers were not coming back online, and 
> proceeded to take down the rest of the RegionServers.
> Ambari should have checked, via API checks on the RegionServer, that the 
> first RegionServer restarted successfully and stayed up for the default 
> 120-second rolling window, and that it had properly re-registered with the 
> active HMaster, before moving on to the second RegionServer.
> Also, Ambari should refuse to continue with any rolling restart if no 
> HMasters are online, see 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
