[ https://issues.apache.org/jira/browse/HBASE-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Gray updated HBASE-3419:
---------------------------------

    Attachment: HBASE-3419-v3.patch

Rebased on branch.

> If re-transition to OPENING during log replay fails, server aborts.  Instead, should just cancel region open.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3419
>                 URL: https://issues.apache.org/jira/browse/HBASE-3419
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.90.0, 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.90.1, 0.92.0
>
>         Attachments: HBASE-3419-v1.patch, HBASE-3419-v2.patch, HBASE-3419-v3.patch
>
>
> The {{Progressable}} used on region open to tickle the ZK OPENING node to
> prevent the master from timing out a region open operation will currently
> abort the RegionServer if this fails for some reason.  However it could be
> "normal" for an RS to have a region open operation aborted by the master, so
> should just handle as it does other places by reverting the open.
> We had a cluster trip over some other issue (for some reason, the tickle was
> not happening in < 30 seconds, so master was timing out every time).  Because
> of the abort on BadVersion, this eventually led to every single RS aborting
> itself eventually taking down the cluster.
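
For readers following the issue, the behavior described above can be sketched roughly as follows. This is not the code from the attached patches and it uses no HBase or ZooKeeper classes: VersionedNode, OpenResult and RegionOpenSketch are hypothetical stand-ins, and the node is assumed to start at version 0 when the open begins. It only illustrates the idea that a failed versioned tickle of the OPENING node should cancel that one region open rather than abort the whole server.

```java
/** Hypothetical stand-in for the versioned OPENING node; not a ZooKeeper API. */
class VersionedNode {
    private int version = 0;

    /**
     * Rewrites the node only if the caller's expected version still matches.
     * Returns the new version, or -1 on a mismatch (analogous to BadVersion).
     */
    synchronized int tickle(int expectedVersion) {
        if (expectedVersion != version) {
            return -1;
        }
        return ++version;
    }

    /** Simulates the master timing out the open and rewriting the node. */
    synchronized void masterReassign() {
        version++;
    }
}

/** Outcome of one region open attempt in this sketch. */
enum OpenResult { OPENED, CANCELLED }

class RegionOpenSketch {
    /**
     * Replays a number of log-replay steps, tickling the node after each one.
     * masterInterferesAtStep is a hypothetical knob: the step at which the
     * master rewrites the node, or -1 for never.
     */
    static OpenResult openRegion(VersionedNode node, int replaySteps,
                                 int masterInterferesAtStep) {
        int version = 0; // assumed version of the node when the open starts
        for (int step = 0; step < replaySteps; step++) {
            if (step == masterInterferesAtStep) {
                node.masterReassign();
            }
            int next = node.tickle(version);
            if (next < 0) {
                // The master has moved on. Revert this open and keep the
                // server running instead of aborting it.
                return OpenResult.CANCELLED;
            }
            version = next;
        }
        return OpenResult.OPENED;
    }

    public static void main(String[] args) {
        System.out.println(openRegion(new VersionedNode(), 3, -1));
        System.out.println(openRegion(new VersionedNode(), 3, 1));
    }
}
```

In the second call the simulated master rewrites the node before the second tickle, so the version check fails and only that open is cancelled. In a real RegionServer the cancel path would also need to close the partially opened region, which this sketch leaves out.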