[ 
https://issues.apache.org/jira/browse/AMBARI-16914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shankar Venkataraman updated AMBARI-16914:
------------------------------------------
    Description: 
Ambari appears to issue a graceful shutdown to a region server but follows it 
up with SIGKILL after only 30 seconds. On a fully loaded HBase system with 
about 200 regions per region server and an active transaction flow, there is 
no way an RS can stop in 30 seconds. This has caused many issues in 
production, including memstore corruption. Why not use the shutdown script 
that comes with HBase?

2016-05-24 15:36:19,191 - 
Execute['/usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh --config 
/usr/hdp/current/hbase-regionserver/conf stop regionserver'] {'only_if': 
'ambari-sudo.sh  -H -E test -f /var/run/hbase/hbase-hbase-regionserver.pid && 
ps -p `ambari-sudo.sh  -H -E cat /var/run/hbase/hbase-hbase-regionserver.pid` 
>/dev/null 2>&1', 'on_timeout': '! ( ambari-sudo.sh  -H -E test -f 
/var/run/hbase/hbase-hbase-regionserver.pid && ps -p `ambari-sudo.sh  -H -E cat 
/var/run/hbase/hbase-hbase-regionserver.pid` >/dev/null 2>&1 ) || 
ambari-sudo.sh -H -E kill -9 `ambari-sudo.sh  -H -E cat 
/var/run/hbase/hbase-hbase-regionserver.pid`', 'timeout': 30, 'user': 'hbase'}
2016-05-24 15:36:50,982 - Executing '! ( ambari-sudo.sh  -H -E test -f 
/var/run/hbase/hbase-hbase-regionserver.pid && ps -p `ambari-sudo.sh  -H -E cat 
/var/run/hbase/hbase-hbase-regionserver.pid` >/dev/null 2>&1 ) || 
ambari-sudo.sh -H -E kill -9 `ambari-sudo.sh  -H -E cat 
/var/run/hbase/hbase-hbase-regionserver.pid`'. Reason: Execution of 
'ambari-sudo.sh su hbase -l -s /bin/bash -c 'export  
PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/sbin:/usr/sbin:/bin:/usr/bin:/var/lib/ambari-agent'"'"'
 ; /usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh --config 
/usr/hdp/current/hbase-regionserver/conf stop regionserver'' was killed due 
timeout after 30 seconds
2016-05-24 15:36:51,053 - File['/var/run/hbase/hbase-hbase-regionserver.pid'] 
{'action': ['delete']}
2016-05-24 15:36:51,054 - Deleting 
File['/var/run/hbase/hbase-hbase-regionserver.pid'
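
For illustration only, the behaviour requested here (ask the process to stop, wait for a configurable grace period, and escalate to SIGKILL only as a last resort) could be sketched as below. This is not Ambari's code and does not use its Execute resource. The function name `stop_gracefully` and the parameters `grace_seconds` and `poll_interval` are hypothetical, and `terminate()` merely stands in for running `hbase-daemon.sh ... stop regionserver`.

```python
import subprocess
import time


def stop_gracefully(proc, grace_seconds, poll_interval=0.1):
    """Ask a child process to stop, then force-kill only after a grace period.

    proc          -- a subprocess.Popen handle (stand-in for the region server)
    grace_seconds -- hypothetical, configurable window; the log above shows 30
    Returns "graceful" if the process exited by itself, "killed" otherwise.
    """
    # Polite request first (SIGTERM on POSIX). In the issue this role is
    # played by the 'hbase-daemon.sh ... stop regionserver' command.
    proc.terminate()

    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if proc.poll() is not None:  # process has exited
            return "graceful"
        time.sleep(poll_interval)

    # Grace period exhausted: escalate to SIGKILL, as the 'on_timeout'
    # branch in the log above does, then reap the child.
    proc.kill()
    proc.wait()
    return "killed"
```

The point of the sketch is that the grace period is a parameter rather than a fixed 30 seconds, so an operator could size it to the time a loaded region server actually needs to stop.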


> Ambari uses too small a window for region server shutdown
> ---------------------------------------------------------
>
>                 Key: AMBARI-16914
>                 URL: https://issues.apache.org/jira/browse/AMBARI-16914
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-web
>    Affects Versions: 2.2.1
>            Reporter: Shankar Venkataraman


