Sergey Shelukhin created HBASE-21611:
----------------------------------------

             Summary: REGION_STATE_TRANSITION_CONFIRM_CLOSED should interact better with crash procedure.
                 Key: HBASE-21611
                 URL: https://issues.apache.org/jira/browse/HBASE-21611
             Project: HBase
          Issue Type: Bug
            Reporter: Sergey Shelukhin


1) Not a bug per se, since HDFS is not supposed to lose files, just a bit fragile.
When a dead server's WAL directory is deleted (due to manual intervention, or some issue with HDFS) while some regions are in CLOSING state on that server, they get stuck forever in a REGION_STATE_TRANSITION_CONFIRM_CLOSED -> REGION_STATE_TRANSITION_CLOSE -> "give up and mark the procedure as complete, the parent procedure will take care of this" loop. There's no crash procedure for the server, so nobody ever takes care of that.

2) Under normal circumstances, when a large WAL is being split, this same loop keeps spamming the logs and wasting resources for no reason until the crash procedure completes. There's no reason for it to retry; it should just wait for the crash procedure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)