[ https://issues.apache.org/jira/browse/HBASE-21611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HBASE-21611:
-------------------------------------
    Summary: REGION_STATE_TRANSITION_CONFIRM_CLOSED should interact better with crash procedure  (was: REGION_STATE_TRANSITION_CONFIRM_CLOSED should interact better with crash procedure.)

> REGION_STATE_TRANSITION_CONFIRM_CLOSED should interact better with crash procedure
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-21611
>                 URL: https://issues.apache.org/jira/browse/HBASE-21611
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Major
>
> 1) Not a bug per se, since HDFS is not supposed to lose files, just a bit
> fragile.
> When a dead server's WAL directory is deleted (due to manual intervention,
> or some issue with HDFS) while some regions are in CLOSING state on that
> server, they get stuck forever in a loop of
> REGION_STATE_TRANSITION_CONFIRM_CLOSED -> REGION_STATE_TRANSITION_CLOSE ->
> "give up and mark the procedure as complete, the parent procedure will take
> care of this". There's no crash procedure for the server, so nobody ever
> takes care of those regions.
> 2) Under normal circumstances, when a large WAL is being split, this same
> loop keeps spamming the logs and wasting resources for no reason until the
> crash procedure completes. There's no reason for it to retry; it should just
> wait for the crash procedure.
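The loop described in the issue can be illustrated with a toy model. This is only a sketch under stated assumptions, not HBase code: the two state names come from the issue, while `run_close`, `crash_done_at`, `wait_for_crash` and the tick-based timing are hypothetical and exist only for illustration.

```python
from enum import Enum


class State(Enum):
    CLOSE = "REGION_STATE_TRANSITION_CLOSE"
    CONFIRM_CLOSED = "REGION_STATE_TRANSITION_CONFIRM_CLOSED"
    DONE = "DONE"  # hypothetical terminal state for this toy model


def run_close(crash_done_at, wait_for_crash, max_steps=20):
    """Toy model of closing a region that lives on a dead server.

    crash_done_at:  tick at which a (hypothetical) crash procedure finishes,
                    or None if no crash procedure exists for the server.
    wait_for_crash: if True and a crash procedure exists, stay parked in
                    CONFIRM_CLOSED instead of retrying the close.
    Returns (final_state, close_attempts).
    """
    state = State.CLOSE
    close_attempts = 0
    for tick in range(max_steps):
        crash_done = crash_done_at is not None and tick >= crash_done_at
        if state is State.CLOSE:
            close_attempts += 1  # close is issued, but the server is dead
            state = State.CONFIRM_CLOSED
        elif state is State.CONFIRM_CLOSED:
            if crash_done:
                return State.DONE, close_attempts
            if wait_for_crash and crash_done_at is not None:
                continue  # parked: no retry, no extra log output
            state = State.CLOSE  # not confirmed closed, so retry the close
    return state, close_attempts


# Case 1: no crash procedure ever runs, so the region never leaves the loop.
print(run_close(crash_done_at=None, wait_for_crash=False))
# Case 2 today: retries pile up until the crash procedure completes.
print(run_close(crash_done_at=10, wait_for_crash=False))
# Case 2 as proposed: one close attempt, then wait.
print(run_close(crash_done_at=10, wait_for_crash=True))
```

In this model, waiting only helps when a crash procedure actually exists; with `crash_done_at=None` the region stays stuck either way, which matches the first scenario in the issue.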