Amitanand Aiyer created HBASE-7242:
--------------------------------------

             Summary: Use Runtime.exit() instead of Runtime.halt() upon HLog 
flush failures
                 Key: HBASE-7242
                 URL: https://issues.apache.org/jira/browse/HBASE-7242
             Project: HBase
          Issue Type: Brainstorming
            Reporter: Amitanand Aiyer
            Priority: Minor


Hey Guys,
  Should we use Runtime.exit() instead of Runtime.halt() when an HLog sync 
fails?

 The key difference is that Runtime.exit() runs the registered shutdown 
hooks, while Runtime.halt() does not.
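
 To make the difference concrete, here is a minimal standalone sketch (not 
HBase code). The class name, the hook body, and the "halt" command-line 
argument are made up for illustration; the hook only stands in for whatever 
cleanup the RS registers.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class, not part of HBase.
class ExitVsHalt {

    // Registers a shutdown hook that flips a flag and logs a line.
    // In a real region server the hook would do cleanup such as closing
    // the ZK connection; here it is only a placeholder.
    static Thread registerCleanupHook(AtomicBoolean closed) {
        Thread hook = new Thread(() -> {
            closed.set(true);
            System.out.println("shutdown hook ran: cleanup done");
        }, "cleanup-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        registerCleanupHook(new AtomicBoolean(false));
        if (args.length > 0 && args[0].equals("halt")) {
            // Terminates the JVM immediately; shutdown hooks are skipped.
            Runtime.getRuntime().halt(1);
        } else {
            // Starts the shutdown sequence; registered hooks are run first.
            Runtime.getRuntime().exit(1);
        }
    }
}
```

 Running it with no argument should print the hook's message before the JVM 
terminates; running it with "halt" should print nothing.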


 Why we might need this: 
   We had an HDFS name node reboot today on one of our cells, and this caused 
multiple region servers to abort because they could not sync the HLog.

   However, since multiple RS died simultaneously, this looked like a 
correlated failure to the master. The master waits for the
Znode to expire, but this can take up to a few minutes after RS death (this 
setting is in place so that we can withstand rack switch reboots, lasting a 
couple of minutes, without region movement).

  If the shutdown hooks are called, the RS will close the ZK connection, causing an 
immediate Znode expiry. This might help cut down the unavailability, as 
Regions can begin to get assigned faster.


 While we do want to abort on HLog failure, I do not think it would hurt 
to give the JVM a few seconds to shut down gracefully. Please let me know
if I am missing something.
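
 One way to bound the "few seconds" is sketched below. This is only an 
illustration of the idea, not a proposed patch: the class name, method name 
and the 5-second grace period are assumptions. A daemon watchdog thread 
falls back to halt() if the shutdown hooks have not finished in time, so a 
stuck hook cannot keep the process alive. The exit and halt actions are 
passed in as parameters only so the logic can be exercised without killing 
the JVM.

```java
import java.util.function.IntConsumer;

// Hypothetical helper, not part of HBase.
class BoundedAbort {

    // Calls exit (which normally runs shutdown hooks and does not return),
    // but first arms a watchdog that calls halt after graceMillis.
    static void abort(int status, long graceMillis,
                      IntConsumer exit, IntConsumer halt) {
        Thread watchdog = new Thread(() -> {
            try {
                Thread.sleep(graceMillis);
            } catch (InterruptedException e) {
                // Interrupted early: still fall through and halt.
            }
            halt.accept(status);
        }, "abort-watchdog");
        // Daemon threads keep running during the shutdown sequence,
        // so the watchdog can still fire while hooks are executing.
        watchdog.setDaemon(true);
        watchdog.start();
        exit.accept(status);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Placeholder for the real cleanup (e.g. closing the ZK connection).
        rt.addShutdownHook(new Thread(
            () -> System.out.println("shutdown hook: cleanup (placeholder)")));
        // Assumed grace period of 5 seconds before forcing a halt.
        abort(1, 5000, rt::exit, rt::halt);
    }
}
```

 With this shape, a clean shutdown finishes as soon as the hooks are done, 
and a hung hook costs at most the grace period before the process is halted.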

Thanks,
-Amit

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
