[ https://issues.apache.org/jira/browse/HBASE-29400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Duo Zhang resolved HBASE-29400.
-------------------------------
    Fix Version/s: 2.7.0
                   3.0.0-beta-2
                   2.5.12
                   2.6.4
     Hadoop Flags: Reviewed
     Release Note:
         Assignee: Duo Zhang
       Resolution: Fixed

Pushed to all active branches.

Thanks [~lupeng] for reviewing!

> RollingBatchRestartRsAction may fail to start region server
> -----------------------------------------------------------
>
>                 Key: HBASE-29400
>                 URL: https://issues.apache.org/jira/browse/HBASE-29400
>             Project: HBase
>          Issue Type: Improvement
>          Components: integration tests
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.7.0, 3.0.0-beta-2, 2.5.12, 2.6.4
>
>
> {noformat}
> 2025-06-17T02:56:52,093 INFO [ChaosMonkey-2 {}] actions.RollingBatchRestartRsAction: Killing regionserver data04,16020,1750098538006
> 2025-06-17T02:56:52,093 INFO [ChaosMonkey-2 {}] hbase.DistributedHBaseCluster: Aborting RS: data04,16020,1750098538006
> 2025-06-17T02:56:52,093 INFO [ChaosMonkey-2 {}] hbase.HBaseClusterManager: Executing remote command: ps ux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL, hostname:data04
> 2025-06-17T02:56:52,093 INFO [ChaosMonkey-2 {}] util.Shell: Executing full command [timeout 30 /usr/bin/ssh -i /home/zhangduo/.ssh/id_rsa_cluster -o ConnectTimeout=10 data04 "source /etc/profile; HBASE_CONF_DIR=/data/conf/hbase/conf setsid ps ux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL"]
> 2025-06-17T02:56:52,544 INFO [ChaosMonkey-2 {}] hbase.HBaseClusterManager: Executed remote command, exit code:0 , output:
> 2025-06-17T02:56:52,544 INFO [ChaosMonkey-2 {}] hbase.DistributedHBaseCluster: Waiting for service: regionserver to stop: data04,16020,1750098538006
> 2025-06-17T02:56:52,544 INFO [ChaosMonkey-2 {}] hbase.HBaseClusterManager: Executing remote command: ps ux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2, hostname:data04
> 2025-06-17T02:56:52,544 INFO [ChaosMonkey-2 {}] util.Shell: Executing full command [timeout 30 /usr/bin/ssh -i /home/zhangduo/.ssh/id_rsa_cluster -o ConnectTimeout=10 data04 "source /etc/profile; HBASE_CONF_DIR=/data/conf/hbase/conf setsid ps ux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2"]
> 2025-06-17T02:56:52,803 INFO [ChaosMonkey-2 {}] hbase.HBaseClusterManager: Executed remote command, exit code:0 , output:
> 2025-06-17T02:56:52,809 INFO [ChaosMonkey-2 {}] actions.RollingBatchRestartRsAction: Killed regionserver data04,16020,1750098538006. Reported num of rs:5
> 2025-06-17T02:56:52,809 INFO [ChaosMonkey-2 {}] actions.RollingBatchRestartRsAction: Sleeping for:354
> 2025-06-17T02:56:53,163 INFO [ChaosMonkey-2 {}] actions.RollingBatchRestartRsAction: Starting regionserver data04:16020
> 2025-06-17T02:56:53,163 INFO [ChaosMonkey-2 {}] hbase.DistributedHBaseCluster: Starting RS on: data04
> 2025-06-17T02:56:53,163 INFO [ChaosMonkey-2 {}] hbase.HBaseClusterManager: Executing remote command: /home/zhangduo/packages/hbase/hbase/bin/hbase-daemon.sh start regionserver, hostname:data04
> 2025-06-17T02:56:53,163 INFO [ChaosMonkey-2 {}] util.Shell: Executing full command [timeout 30 /usr/bin/ssh -i /home/zhangduo/.ssh/id_rsa_cluster -o ConnectTimeout=10 data04 "source /etc/profile; HBASE_CONF_DIR=/data/conf/hbase/conf setsid /home/zhangduo/packages/hbase/hbase/bin/hbase-daemon.sh start regionserver"]
> 2025-06-17T02:56:53,473 INFO [ChaosMonkey-2 {}] hbase.HBaseClusterManager: Executed remote command, exit code:0 , output:regionserver running as process 1948033. Stop it first.
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)