[ 
https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447467#comment-17447467
 ] 

Duo Zhang commented on HBASE-26468:
-----------------------------------

Maybe we could add a delay? For example, if the process has not exited after 
30 seconds, we call System.exit to force it to quit. The exit code should be 
nonzero to indicate a forced termination.
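
A minimal sketch of the delay idea above, not the committed fix. The class 
name ForcedExitWatchdog, the exit code 2, and the injectable exit action are 
all assumptions made for illustration. The watchdog runs on a daemon thread, 
so it never keeps the JVM alive itself, and it only fires if the process is 
still running when the delay elapses.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

public class ForcedExitWatchdog {
  /** Hypothetical nonzero exit code marking a forced termination. */
  public static final int FORCED_EXIT_CODE = 2;

  /**
   * Starts a daemon thread that calls exitAction with FORCED_EXIT_CODE once
   * the delay has elapsed. Interrupting the returned thread disarms it.
   */
  public static Thread arm(long delay, TimeUnit unit, IntConsumer exitAction) {
    Thread watchdog = new Thread(() -> {
      try {
        unit.sleep(delay);
      } catch (InterruptedException e) {
        return; // disarmed before the delay elapsed
      }
      exitAction.accept(FORCED_EXIT_CODE);
    }, "forced-exit-watchdog");
    watchdog.setDaemon(true); // must not block JVM exit on its own
    watchdog.start();
    return watchdog;
  }

  public static void main(String[] args) {
    // Simulates a leaked non-daemon thread that would keep the JVM alive.
    new Thread(() -> {
      try {
        Thread.sleep(Long.MAX_VALUE);
      } catch (InterruptedException ignored) {
        // nothing to clean up in this sketch
      }
    }, "stray-non-daemon").start();

    // main returns right away; the watchdog forces the exit after 30 seconds.
    arm(30, TimeUnit.SECONDS, System::exit);
  }
}
```

If every non-daemon thread stops within the delay, the JVM exits normally 
with code 0 and the watchdog dies with it, so the nonzero code only appears 
when the exit had to be forced.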

> Region Server doesn't exit cleanly incase it crashes.
> -----------------------------------------------------
>
>                 Key: HBASE-26468
>                 URL: https://issues.apache.org/jira/browse/HBASE-26468
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 1.6.0
>            Reporter: Rushabh Shah
>            Assignee: Rushabh Shah
>            Priority: Major
>             Fix For: 2.5.0, 3.0.0-alpha-2, 1.7.2, 2.3.8, 2.4.9
>
>
> Observed this in our production cluster running version 1.6.
> The RS crashed for some reason, but the process was still running. On further 
> debugging, we found one non-daemon thread still running, which prevented 
> the RS from exiting cleanly. Our clusters are managed by Ambari, which has 
> auto-restart capability. But since the process was still running and the pid 
> file was present, Ambari couldn't do much. There will always be some bug 
> where we fail to stop a non-daemon thread. Shutdown hooks are called only 
> when the Java virtual machine shuts down, which happens in response to two 
> kinds of events:
> # The program exits normally, when the last non-daemon thread exits or when 
> the exit (equivalently, System.exit) method is invoked, or
> # The virtual machine is terminated in response to a user interrupt, such as 
> typing ^C, or a system-wide event, such as user logoff or system shutdown.
> Consider the first condition: the last non-daemon thread exits or the exit 
> method is invoked.
> Below is the code snippet from 
> [HRegionServerCommandLine.java|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServerCommandLine.java#L51]
> {code:java}
>   private int start() throws Exception {
>     try {
>       if (LocalHBaseCluster.isLocal(conf)) {
>          // Ignore this.
>       } else {
>         HRegionServer hrs = HRegionServer.constructRegionServer(regionServerClass, conf);
>         hrs.start();
>         hrs.join();
>         if (hrs.isAborted()) {
>           throw new RuntimeException("HRegionServer Aborted");
>         }
>       }
>     } catch (Throwable t) {
>       LOG.error("Region server exiting", t);
>       return 1;
>     }
>     return 0;
>   }
> {code}
> Within HRegionServer, there is a subtle difference between a server being 
> aborted and being stopped. If it is stopped, isAborted returns false and 
> start() returns code 0.
> Below is the code from 
> [ServerCommandLine.java|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/util/ServerCommandLine.java#L147]
> {code:java}
>   public void doMain(String args[]) {
>     try {
>       int ret = ToolRunner.run(HBaseConfiguration.create(), this, args);
>       if (ret != 0) {
>         System.exit(ret);
>       }
>     } catch (Exception e) {
>       LOG.error("Failed to run", e);
>       System.exit(-1);
>     }
>   }
> {code}
> If the return code is 0, doMain does not call System.exit. The JVM then 
> runs the shutdown hooks only after all non-daemon threads have stopped, 
> which means an infinite wait if any non-daemon thread is not closed cleanly.
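
To find which thread is blocking the exit in a situation like the one 
described above, one option is to list the live non-daemon threads. The 
following is a small diagnostic sketch; the class name NonDaemonThreads and 
its find() helper are hypothetical and not part of HBase. It relies only on 
Thread.getAllStackTraces() and Thread.isDaemon() from the JDK.

```java
import java.util.List;
import java.util.stream.Collectors;

public class NonDaemonThreads {
  /** Returns the live non-daemon threads other than the calling thread. */
  public static List<Thread> find() {
    Thread self = Thread.currentThread();
    return Thread.getAllStackTraces().keySet().stream()
        .filter(t -> t != self && t.isAlive() && !t.isDaemon())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Simulates a thread that was never stopped during shutdown.
    Thread stray = new Thread(() -> {
      try {
        Thread.sleep(60_000);
      } catch (InterruptedException ignored) {
        // interrupted below so that this demo can exit
      }
    }, "stray-non-daemon");
    stray.start();

    // Each thread printed here would delay JVM shutdown until it stops.
    for (Thread t : find()) {
      System.out.println("Non-daemon thread still alive: " + t.getName());
    }

    stray.interrupt();
  }
}
```

Logging this list just before returning from the main method would show 
which threads the JVM is still waiting on.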



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
