[ https://issues.apache.org/jira/browse/AMBARI-16914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Onischuk updated AMBARI-16914:
-------------------------------------
    Status: Patch Available  (was: Open)

> Ambari uses too small a window for region server shutdown
> ---------------------------------------------------------
>
>                 Key: AMBARI-16914
>                 URL: https://issues.apache.org/jira/browse/AMBARI-16914
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-web
>    Affects Versions: 2.2.1
>            Reporter: Shankar Venkataraman
>         Attachments: AMBARI-16914.patch
>
>
> Ambari seems to issue a formal shutdown to a Region server but quickly (30
> seconds) follows it up with SIGKILL. On a fully loaded HBase system with
> about 200 regions per region server and active transaction flow, there is no
> way a RS can stop in 30 seconds. This has caused many issues in production
> including a memstore corruption. Why not use the shutdown script that comes
> with HBase?
> 2016-05-24 15:36:19,191 -
> Execute['/usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh --config
> /usr/hdp/current/hbase-regionserver/conf stop regionserver'] {'only_if':
> 'ambari-sudo.sh -H -E test -f /var/run/hbase/hbase-hbase-regionserver.pid &&
> ps -p `ambari-sudo.sh -H -E cat /var/run/hbase/hbase-hbase-regionserver.pid`
> >/dev/null 2>&1', 'on_timeout': '! ( ambari-sudo.sh -H -E test -f
> /var/run/hbase/hbase-hbase-regionserver.pid && ps -p `ambari-sudo.sh -H -E
> cat /var/run/hbase/hbase-hbase-regionserver.pid` >/dev/null 2>&1 ) ||
> ambari-sudo.sh -H -E kill -9 `ambari-sudo.sh -H -E cat
> /var/run/hbase/hbase-hbase-regionserver.pid`', 'timeout': 30, 'user': 'hbase'}
> 2016-05-24 15:36:50,982 - Executing '! ( ambari-sudo.sh -H -E test -f
> /var/run/hbase/hbase-hbase-regionserver.pid && ps -p `ambari-sudo.sh -H -E
> cat /var/run/hbase/hbase-hbase-regionserver.pid` >/dev/null 2>&1 ) ||
> ambari-sudo.sh -H -E kill -9 `ambari-sudo.sh -H -E cat
> /var/run/hbase/hbase-hbase-regionserver.pid`'. Reason: Execution of
> 'ambari-sudo.sh su hbase -l -s /bin/bash -c 'export
> PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/sbin:/usr/sbin:/bin:/usr/bin:/var/lib/ambari-agent'"'"'
> ; /usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh --config
> /usr/hdp/current/hbase-regionserver/conf stop regionserver'' was killed due
> timeout after 30 seconds
> 2016-05-24 15:36:51,053 - File['/var/run/hbase/hbase-hbase-regionserver.pid']
> {'action': ['delete']}
> 2016-05-24 15:36:51,054 - Deleting
> File['/var/run/hbase/hbase-hbase-regionserver.pid'
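
For illustration only, the pattern the report describes (a stop request, a fixed wait, then SIGKILL) can be sketched in Python as below. This is not Ambari's code and not the attached patch. The names `stop_gracefully`, `SimulatedServer`, and `simulate`, and every parameter, are hypothetical. The 45-second drain time is an arbitrary stand-in for a busy region server, chosen only to show that the outcome depends on the size of the grace window.

```python
import time


def stop_gracefully(request_stop, is_running, force_kill,
                    grace_seconds=30.0, poll_interval=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Ask a process to stop; force-kill only if the grace period expires.

    The three callables are hypothetical hooks supplied by the caller:
      request_stop(): issue the normal stop (e.g. run hbase-daemon.sh stop)
      is_running():   return True while the process is still alive
      force_kill():   last resort (e.g. send SIGKILL)
    Returns "graceful" or "killed".
    """
    request_stop()
    deadline = clock() + grace_seconds
    while is_running():
        if clock() >= deadline:
            force_kill()
            return "killed"
        sleep(poll_interval)
    return "graceful"


class SimulatedServer:
    """Stand-in for a server that needs `drain_seconds` to shut down."""

    def __init__(self, drain_seconds):
        self.drain_seconds = drain_seconds
        self.now = 0.0        # simulated clock, advanced only by sleep()
        self.stop_at = None   # simulated time at which shutdown completes
        self.killed = False

    def clock(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds

    def request_stop(self):
        self.stop_at = self.now + self.drain_seconds

    def is_running(self):
        if self.killed:
            return False
        return self.stop_at is None or self.now < self.stop_at

    def force_kill(self):
        self.killed = True


def simulate(drain_seconds, grace_seconds):
    """Run stop_gracefully against a simulated server, without real waiting."""
    server = SimulatedServer(drain_seconds)
    return stop_gracefully(server.request_stop, server.is_running,
                           server.force_kill, grace_seconds=grace_seconds,
                           clock=server.clock, sleep=server.sleep)


if __name__ == "__main__":
    print(simulate(drain_seconds=45, grace_seconds=30))   # window too small
    print(simulate(drain_seconds=45, grace_seconds=120))  # window large enough
```

In this sketch the grace period is a parameter rather than a fixed 30 seconds, so a caller could size it to the expected shutdown time of a loaded server. Whether the actual fix takes that approach is not stated in this message.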