Sergey Shelukhin created HBASE-21611:
----------------------------------------

             Summary: REGION_STATE_TRANSITION_CONFIRM_CLOSED should interact 
better with crash procedure.
                 Key: HBASE-21611
                 URL: https://issues.apache.org/jira/browse/HBASE-21611
             Project: HBase
          Issue Type: Bug
            Reporter: Sergey Shelukhin


1) This is not a bug per se, since HDFS is not supposed to lose files, but the 
behavior is a bit fragile.
When a dead server's WAL directory is deleted (due to manual intervention or 
some issue with HDFS) while some regions are in CLOSING state on that server, 
those regions get stuck forever in a loop: REGION_STATE_TRANSITION_CONFIRM_CLOSED, 
then REGION_STATE_TRANSITION_CLOSE, then "give up and mark the procedure as 
complete, the parent procedure will take care of this", and back again. There 
is no crash procedure for the server, so nothing ever takes care of these regions.

2) Under normal circumstances, when a large WAL is being split, this same loop 
keeps spamming the logs and wasting resources until the crash procedure 
completes. There is no reason for it to retry; it should just wait for the 
crash procedure.
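
The two cases above can be told apart with a simple decision. Below is a minimal 
sketch of that decision, not HBase code: the class, enum, and method names are 
hypothetical, and the server sets stand in for whatever state the master 
actually tracks. It assumes that retrying the close only makes sense while the 
server is alive, that a dead server with a crash procedure should be waited on 
(case 2), and that a dead server without one (case 1) has to be surfaced rather 
than retried.

```java
import java.util.Set;

// Hypothetical illustration only; none of these names exist in HBase.
final class ConfirmClosedSketch {

    // Possible outcomes when a close cannot be confirmed.
    enum NextStep { RETRY_CLOSE, WAIT_FOR_CRASH_PROCEDURE, NEEDS_ATTENTION }

    // Decide what to do for a region whose close on `server` is unconfirmed.
    static NextStep decide(String server,
                           Set<String> liveServers,
                           Set<String> serversWithCrashProcedure) {
        if (liveServers.contains(server)) {
            // Assumption: the server is still up, so retrying the close is reasonable.
            return NextStep.RETRY_CLOSE;
        }
        if (serversWithCrashProcedure.contains(server)) {
            // Case 2: the crash procedure will handle the region; do not spin.
            return NextStep.WAIT_FOR_CRASH_PROCEDURE;
        }
        // Case 1: dead server and no crash procedure; retrying can never succeed.
        return NextStep.NEEDS_ATTENTION;
    }

    public static void main(String[] args) {
        Set<String> live = Set.of("rs1");
        Set<String> crashing = Set.of("rs2");
        for (String server : new String[] {"rs1", "rs2", "rs3"}) {
            System.out.println(server + " -> " + decide(server, live, crashing));
        }
    }
}
```

How the NEEDS_ATTENTION case should be resolved (for example, by scheduling a 
crash procedure or failing loudly) is not specified in this report.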



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
