[ https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Jasani resolved HBASE-26468.
----------------------------------
Fix Version/s: (was: 2.3.8)
Hadoop Flags: Reviewed
Resolution: Fixed
Thanks for this nice contribution [~shahrs87] and thanks for the reviews
[~zhangduo] [~gjacoby].
> Region Server doesn't exit cleanly incase it crashes.
> -----------------------------------------------------
>
> Key: HBASE-26468
> URL: https://issues.apache.org/jira/browse/HBASE-26468
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 1.6.0
> Reporter: Rushabh Shah
> Assignee: Rushabh Shah
> Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 1.7.2, 2.4.9
>
>
> Observed this in our production cluster running version 1.6.
> The RS crashed for some reason, but the process was still running. On further
> debugging, we found one non-daemon thread still running, and that thread was
> preventing the RS from exiting cleanly. Our clusters are managed by Ambari,
> which has auto-restart capability, but since the process was still running and
> the pid file was present, Ambari could not do much. We can expect bugs where
> some non-daemon thread is not stopped. Shutdown hooks run only when the Java
> virtual machine shuts down, and per the Runtime.addShutdownHook javadoc it
> shuts down in response to two kinds of events:
> # The program exits normally, when the last non-daemon thread exits or when the
> exit (equivalently, System.exit) method is invoked, or
> # The virtual machine is terminated in response to a user interrupt, such as
> typing ^C, or a system-wide event, such as user logoff or system shutdown.
> Consider the first condition: the program exits normally only when the last
> non-daemon thread exits or when the exit method is invoked.
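To make the first condition concrete, here is a small, self-contained Java sketch. It is not HBase code; the class name and the `leaked-worker` thread are hypothetical. It lists the live non-daemon threads via `Thread.getAllStackTraces()` and shows that, while such a thread is blocked, returning from `main` would not end the JVM and the registered shutdown hook would not run.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.stream.Collectors;

public class NonDaemonThreadDemo {

    // Names of live non-daemon threads other than the caller. While any of
    // these exist, returning from main() does not end the JVM, so hooks
    // registered via Runtime.addShutdownHook do not run yet.
    static List<String> liveNonDaemonThreadNames() {
        Thread self = Thread.currentThread();
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t != self && t.isAlive() && !t.isDaemon())
                .map(Thread::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws InterruptedException {
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> System.out.println("shutdown hook ran")));

        CountDownLatch release = new CountDownLatch(1);
        // Hypothetical stand-in for a service thread nobody remembered to stop.
        Thread leaked = new Thread(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "leaked-worker");
        leaked.setDaemon(false); // explicit: a non-daemon thread keeps the JVM alive
        leaked.start();

        System.out.println("before release: " + liveNonDaemonThreadNames());

        // Comment out the next two lines and the process never exits:
        // main() returns, "leaked-worker" waits forever, and the hook never runs.
        release.countDown();
        leaked.join();

        System.out.println("after release: " + liveNonDaemonThreadNames());
    }
}
```

Marking a thread with `setDaemon(true)` before starting it removes it from the JVM's exit condition entirely, which is why only non-daemon threads can cause the hang described here.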
> Below is the code snippet from
> [HRegionServerCommandLine.java|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServerCommandLine.java#L51]
> {code:java}
> private int start() throws Exception {
>   try {
>     if (LocalHBaseCluster.isLocal(conf)) {
>       // Ignore this.
>     } else {
>       HRegionServer hrs =
>           HRegionServer.constructRegionServer(regionServerClass, conf);
>       hrs.start();
>       hrs.join();
>       if (hrs.isAborted()) {
>         throw new RuntimeException("HRegionServer Aborted");
>       }
>     }
>   } catch (Throwable t) {
>     LOG.error("Region server exiting", t);
>     return 1;
>   }
>   return 0;
> }
> {code}
> Within HRegionServer, there is a subtle difference between a server being
> aborted and a server being stopped. If it is stopped, isAborted() returns
> false, so start() returns 0.
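A hypothetical model of that difference follows. `ModelServer` is an assumption standing in for HRegionServer's stop/abort state and is not HBase's API; `start` only mirrors the control flow of the `start()` snippet quoted above. A server that reached the end through a stop path, even after a fatal error, yields 0, and only an aborted one yields 1.

```java
public class ExitCodeModel {

    // Hypothetical stand-in for HRegionServer's stop/abort flags; not HBase code.
    static final class ModelServer {
        private volatile boolean stopped;
        private volatile boolean aborted;

        void stop(String why) {
            stopped = true;
        }

        void abort(String why) {
            aborted = true;
            stopped = true; // an abort also stops the server
        }

        boolean isAborted() {
            return aborted;
        }
    }

    // Same shape as HRegionServerCommandLine.start(): only an *aborted*
    // server is turned into a non-zero return code.
    static int start(ModelServer server) {
        try {
            if (server.isAborted()) {
                throw new RuntimeException("HRegionServer Aborted");
            }
        } catch (Throwable t) {
            return 1;
        }
        return 0;
    }

    public static void main(String[] args) {
        ModelServer stoppedServer = new ModelServer();
        stoppedServer.stop("fatal error handled via stop()");
        ModelServer abortedServer = new ModelServer();
        abortedServer.abort("fatal error handled via abort()");

        System.out.println("stopped -> " + start(stoppedServer)); // prints "stopped -> 0"
        System.out.println("aborted -> " + start(abortedServer)); // prints "aborted -> 1"
    }
}
```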
> Below is the code from
> [ServerCommandLine.java|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/util/ServerCommandLine.java#L147]
> {code:java}
> public void doMain(String args[]) {
>   try {
>     int ret = ToolRunner.run(HBaseConfiguration.create(), this, args);
>     if (ret != 0) {
>       System.exit(ret);
>     }
>   } catch (Exception e) {
>     LOG.error("Failed to run", e);
>     System.exit(-1);
>   }
> }
> {code}
> If the return code is 0, doMain does not call System.exit. The JVM therefore
> runs its shutdown hooks only after all non-daemon threads have stopped, which
> means it waits forever if any non-daemon thread is not closed cleanly.
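One possible mitigation, sketched under assumptions and not claimed to be the change that resolved this issue: after `ToolRunner.run` returns 0, give leftover non-daemon threads a bounded grace period and then call `System.exit` anyway. `awaitQuiescence`, `exitAfterRun`, the 100 ms poll interval, and the grace period are hypothetical names and values chosen for illustration.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;
import java.util.stream.Collectors;

public class BoundedExitSketch {

    // Live non-daemon threads other than the caller: the threads that would
    // keep the JVM from exiting after main() returns.
    static List<Thread> liveNonDaemonThreads() {
        Thread self = Thread.currentThread();
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t != self && t.isAlive() && !t.isDaemon())
                .collect(Collectors.toList());
    }

    // Polls `probe` until it reports no threads or `timeoutMs` elapses.
    // Returns true if the threads went away in time; otherwise logs the
    // names of the ones still alive and returns false.
    static boolean awaitQuiescence(Supplier<List<Thread>> probe, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            List<Thread> remaining = probe.get();
            if (remaining.isEmpty()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                remaining.forEach(t -> System.err.println("still alive: " + t.getName()));
                return false;
            }
            Thread.sleep(pollMs);
        }
    }

    // Hypothetical replacement for the tail of doMain(): a zero return code no
    // longer leaves the exit up to leftover threads; they get a bounded grace period.
    static void exitAfterRun(int ret, long graceMs) throws InterruptedException {
        if (ret != 0) {
            System.exit(ret);
        }
        if (!awaitQuiescence(BoundedExitSketch::liveNonDaemonThreads, graceMs, 100)) {
            System.exit(-1); // same failure code doMain already uses
        }
        // Otherwise return; the JVM exits normally and shutdown hooks run.
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        Thread stuck = new Thread(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "stuck-worker");
        stuck.start();

        // Probe only our own thread so the demo does not depend on the environment.
        Supplier<List<Thread>> onlyStuck =
                () -> stuck.isAlive() ? List.of(stuck) : List.<Thread>of();
        System.out.println("gave up waiting: " + !awaitQuiescence(onlyStuck, 300, 50));

        release.countDown();
        System.out.println("drained: " + awaitQuiescence(onlyStuck, 2_000, 50));
    }
}
```

Polling `Thread.getAllStackTraces()` keeps the sketch dependency-free, and logging the names of the threads still alive before forcing the exit is what would point at the thread that was never stopped.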
--
This message was sent by Atlassian Jira
(v8.20.1#820001)