[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.
[ https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452182#comment-17452182 ]

Hudson commented on HBASE-26468:
Results for branch master
[build #457 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/457/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/457/General_20Nightly_20Build_20Report/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/457/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/457/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}

> Region Server doesn't exit cleanly incase it crashes.
> -
>
> Key: HBASE-26468
> URL: https://issues.apache.org/jira/browse/HBASE-26468
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 1.6.0
> Reporter: Rushabh Shah
> Assignee: Rushabh Shah
> Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 1.7.2, 2.4.9
>
>
> Observed this in our production cluster running version 1.6.
> The RS crashed for some reason, but the process was still running. On further
> debugging, we found one non-daemon thread still running, and it was preventing
> the RS from exiting cleanly. Our clusters are managed by Ambari, which has
> auto-restart capability. But since the process was still running and the pid
> file was present, Ambari could not do much.
> There may be bugs where we fail to stop some non-daemon thread. Shutdown hooks
> are only run when the JVM shuts down, which happens in response to two kinds of events:
> # The program exits normally, when the last non-daemon thread exits or when the
> exit (equivalently, System.exit) method is invoked, or
> # The virtual machine is terminated in response to a user interrupt, such as
> typing ^C, or a system-wide event, such as user logoff or system shutdown.
> The first condition is the relevant one here: the JVM exits only when the last
> non-daemon thread exits or when System.exit is invoked.
> Below is the code snippet from
> [HRegionServerCommandLine.java|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServerCommandLine.java#L51]
> {code:java}
> private int start() throws Exception {
>   try {
>     if (LocalHBaseCluster.isLocal(conf)) {
>       // Ignore this.
>     } else {
>       HRegionServer hrs =
>           HRegionServer.constructRegionServer(regionServerClass, conf);
>       hrs.start();
>       hrs.join();
>       if (hrs.isAborted()) {
>         throw new RuntimeException("HRegionServer Aborted");
>       }
>     }
>   } catch (Throwable t) {
>     LOG.error("Region server exiting", t);
>     return 1;
>   }
>   return 0;
> }
> {code}
> Within HRegionServer, there is a subtle difference between a server being
> aborted and a server being stopped. If it is stopped, isAborted() returns
> false and start() returns 0.
> Below is the code from
> [ServerCommandLine.java|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/util/ServerCommandLine.java#L147]
> {code:java}
> public void doMain(String args[]) {
>   try {
>     int ret = ToolRunner.run(HBaseConfiguration.create(), this, args);
>     if (ret != 0) {
>       System.exit(ret);
>     }
>   } catch (Exception e) {
>     LOG.error("Failed to run", e);
>     System.exit(-1);
>   }
> }
> {code}
> If the return code is 0, doMain() does not call System.exit. The JVM then
> waits for all non-daemon threads to finish before running the shutdown hooks,
> which means an infinite wait if any non-daemon thread is never stopped.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
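The report traces the hang to one leftover non-daemon thread. A quick way to spot such threads from inside the JVM is to walk Thread.getAllStackTraces() and keep the live threads whose daemon flag is false. This is a standalone illustration, not HBase code: the class name and the "hypothetical-stuck-worker" thread are made up for the demo, and skipping a thread named "DestroyJavaVM" is an assumption based on HotSpot's name for the thread that waits on non-daemon threads after main() returns.

```java
import java.util.ArrayList;
import java.util.List;

public class NonDaemonThreadReport {

  // Returns the names of live non-daemon threads other than the calling thread.
  // "DestroyJavaVM" is skipped (assumption: HotSpot creates it after main() returns).
  static List<String> liveNonDaemonThreads() {
    List<String> names = new ArrayList<>();
    Thread self = Thread.currentThread();
    for (Thread t : Thread.getAllStackTraces().keySet()) {
      if (t == self || t.isDaemon() || !t.isAlive()) {
        continue;
      }
      if ("DestroyJavaVM".equals(t.getName())) {
        continue;
      }
      names.add(t.getName());
    }
    return names;
  }

  public static void main(String[] args) throws InterruptedException {
    // Simulates a worker that nobody told to stop (hypothetical name).
    // Threads created from main inherit its daemon flag, which is false.
    Thread stuck = new Thread(() -> {
      try {
        Thread.sleep(Long.MAX_VALUE);
      } catch (InterruptedException e) {
        // Interrupted: fall through so the thread can finish.
      }
    }, "hypothetical-stuck-worker");
    stuck.start();

    System.out.println("Non-daemon threads keeping the JVM alive: " + liveNonDaemonThreads());

    // Without this interrupt, the JVM would never exit after main() returns.
    stuck.interrupt();
    stuck.join();
  }
}
```

Logging this list right before doMain() returns would name the culprit in the RS log instead of requiring a thread dump of the zombie process.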
[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.
[ https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452163#comment-17452163 ]

Hudson commented on HBASE-26468:
Results for branch branch-2.4
[build #248 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/248/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/248/General_20Nightly_20Build_20Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/248/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/248/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/248/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.
[ https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451960#comment-17451960 ]

Hudson commented on HBASE-26468:
Results for branch branch-2
[build #406 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/406/]: (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/406/General_20Nightly_20Build_20Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/406/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/406/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/406/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.
[ https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451895#comment-17451895 ]

Hudson commented on HBASE-26468:
Results for branch branch-1
[build #186 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/186/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/186//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/186//JDK7_Nightly_Build_Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/186//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.
[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.
[ https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451737#comment-17451737 ]

Rushabh Shah commented on HBASE-26468:
--
Thank you [~vjasani] for the review and the merge! Thank you [~zhangduo] [~gjacoby] for the review and feedback!
[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.
[ https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448017#comment-17448017 ]

Rushabh Shah commented on HBASE-26468:
--
> Maybe we could add a delay? For example, if the process does not exit for 30
> seconds, we call System.exit to force quit, and the return value should be
> something other than 0 to indicate that this is a force terminate.

Sounds like a good idea. Thank you [~zhangduo]!
[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.
[ https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447467#comment-17447467 ]

Duo Zhang commented on HBASE-26468:
---
Maybe we could add a delay? For example, if the process does not exit for 30 seconds, we call System.exit to force quit, and the return value should be something other than 0 to indicate that this is a force terminate.
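The delay-then-force-quit idea could look roughly like the following. This is a minimal sketch under stated assumptions, not the patch that was merged for HBASE-26468: the class and method names, the 1-second poll interval and exit code 1 are hypothetical, and it assumes the check runs in the thread that called ToolRunner.run, right after that call returned 0. The 30-second timeout is the value from the comment.

```java
import java.util.concurrent.TimeUnit;

public class ForcedExitWatchdog {

  // Non-zero code that marks a forced termination (hypothetical choice of value).
  static final int FORCED_EXIT_CODE = 1;

  // True if any live non-daemon thread other than the caller is still running.
  // "DestroyJavaVM" is skipped (assumption: HotSpot's post-main() waiter thread).
  static boolean hasLingeringNonDaemonThread() {
    Thread self = Thread.currentThread();
    for (Thread t : Thread.getAllStackTraces().keySet()) {
      if (t != self && !t.isDaemon() && t.isAlive()
          && !"DestroyJavaVM".equals(t.getName())) {
        return true;
      }
    }
    return false;
  }

  // Polls until every other non-daemon thread has finished or the timeout elapses.
  // Returns 0 if the JVM can exit on its own, FORCED_EXIT_CODE otherwise.
  static int awaitCleanShutdown(long timeoutMs, long pollMs) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (hasLingeringNonDaemonThread()) {
      if (System.nanoTime() - deadline >= 0) {
        return FORCED_EXIT_CODE;
      }
      Thread.sleep(pollMs);
    }
    return 0;
  }

  // Hypothetical tail of doMain(): ret is the value returned by ToolRunner.run.
  static void exitAfterRun(int ret) throws InterruptedException {
    if (ret != 0) {
      System.exit(ret);
    }
    int code = awaitCleanShutdown(30_000, 1_000);
    if (code != 0) {
      System.err.println("Non-daemon threads still running after 30s; forcing JVM exit");
      System.exit(code);
    }
    // Otherwise return normally; the JVM exits once this thread finishes.
  }
}
```

Returning a distinct non-zero code keeps a forced termination visible to whatever supervises the process, while a clean stop still exits with 0 and without the extra delay.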
[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.
[ https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447451#comment-17447451 ]

Viraj Jasani commented on HBASE-26468:
--
[~zhangduo] Although HBASE-26480 will have its own fix, I think we might still want to make this change. Gracefully exiting the JVM with status code 0 is a behaviour change, but I feel it's for the good. With this behaviour, we might not find out about other non-daemon threads that fail to shut down properly; on the other hand, we will not see zombie processes either (which prevent CD/monitoring systems from automatically starting services when a significant number of regionservers are affected).
[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.
[ https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447429#comment-17447429 ]

Rushabh Shah commented on HBASE-26468:
--
[~zhangduo] Created this jira with more details on which non-daemon thread: HBASE-26480
[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.
[ https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447292#comment-17447292 ] Duo Zhang commented on HBASE-26468: --- So which thread does not exit cleanly?
[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.
[ https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447284#comment-17447284 ] Viraj Jasani commented on HBASE-26468: -- FYI [~zhangduo]
[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.
[ https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446829#comment-17446829 ] Rushabh Shah commented on HBASE-26468: -- Created PR for master, branch-2 and branch-1. Please review.