junegunn commented on code in PR #6797:
URL: https://github.com/apache/hbase/pull/6797#discussion_r2103706270


##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java:
##########
@@ -392,10 +399,26 @@ private void closeRegion(MasterProcedureEnv env, 
RegionStateNode regionNode)
     ) {
       return;
     }
-    if (regionNode.isInState(State.OPEN, State.CLOSING, State.MERGING, 
State.SPLITTING)) {
-      // this is the normal case
-      ProcedureFutureUtil.suspendIfNecessary(this, this::setFuture,
-        env.getAssignmentManager().regionClosing(regionNode), env,
+
+    CompletableFuture<Void> future = null;
+    if (regionNode.isInState(STATES_EXPECTED_ON_CLOSING)) {
+      // This is the normal case
+      future = env.getAssignmentManager().regionClosing(regionNode);
+    } else if (regionNode.setState(State.CLOSED, State.FAILED_OPEN)) {
+      // FAILED_OPEN doesn't need further transition, immediately mark the 
region as closed
+      AssignmentManager am = env.getAssignmentManager();
+      am.getRegionStates().removeFromFailedOpen(regionNode.getRegionInfo());
+      future = am.getRegionStateStore().updateRegionLocation(regionNode);

Review Comment:
   Thanks for the feedback. Before I answer the questions, please take a look 
at 930b8d9826017b019989a0afe08405e71ec9fc2e. I added a few more steps so that 
the intention of this change is clearer.
   
   > We do not need to set the state to CLOSED when the state is FAILED_OPEN?
   
   So the question is: "Is `regionNode.setState(State.CLOSED, 
State.FAILED_OPEN)` really necessary? Can we just check 
`regionNode.isInState(State.FAILED_OPEN)` here?" Am I right?
   
   If we don't change the state,
   
   ```diff
   diff --git 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
   index 6b5aaf6975..215e1245ed 100644
   --- 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
   +++ 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
   @@ -404,11 +404,13 @@ public class TransitRegionStateProcedure
        if (regionNode.isInState(STATES_EXPECTED_ON_CLOSING)) {
          // This is the normal case
          future = env.getAssignmentManager().regionClosing(regionNode);
   -    } else if (regionNode.setState(State.CLOSED, State.FAILED_OPEN)) {
   -      // FAILED_OPEN doesn't need further transition, immediately mark the 
region as closed
   +    } else if (regionNode.isInState(State.FAILED_OPEN)) {
   +      // Remove the region from RIT list to suppress periodic "RITs over 
threshold" messages
          AssignmentManager am = env.getAssignmentManager();
          am.getRegionStates().removeFromFailedOpen(regionNode.getRegionInfo());
   -      future = am.getRegionStateStore().updateRegionLocation(regionNode);
   +
   +      // FAILED_OPEN doesn't need further transition
   +      future = CompletableFuture.allOf();
        }
        if (future != null) {
          ProcedureFutureUtil.suspendIfNecessary(this, this::setFuture, future, 
env,
   ```
   
   we will run into an assertion error in `confirmClosed`:
   
   
https://github.com/apache/hbase/blob/cab6eb41543d6ca4ef57cab2d87a3a6e6796c88c/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java#L423-L449
   
   Because `FAILED_OPEN` is not covered in the conditions. If we add 
`FAILED_OPEN` here like so:
   
   ```diff
   diff --git 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
   index 6b5aaf6975..7cf0941f20 100644
   --- 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
   +++ 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
   @@ -422,7 +424,7 @@ public class TransitRegionStateProcedure
    
      private Flow confirmClosed(MasterProcedureEnv env, RegionStateNode 
regionNode)
        throws IOException {
   -    if (regionNode.isInState(State.CLOSED)) {
   +    if (regionNode.isInState(State.CLOSED, State.FAILED_OPEN)) {
          retryCounter = null;
          if (lastState == 
RegionStateTransitionState.REGION_STATE_TRANSITION_CONFIRM_CLOSED) {
            // we are the last state, finish
   ```
   
   We can avoid the assertion error, but `CloseTableRegionsProcedure` will not 
finish and endlessly retry:
   
   ```
   There are still 1 unclosed region(s) for closing regions of table 
testDisableFailedOpenRegions when executing CloseTableRegionsProcedure, 
continue...
   ```
   
   So we also need to change the implementation of `numberOfUnclosedRegions` to 
account for `FAILED_OPEN` regions.
   
   
https://github.com/apache/hbase/blob/42715b681d05428c63eca926b5967d5fa64eb1a0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1147-L1163
   
   But I felt this was getting too complicated, and it's simpler to just change 
the state to `CLOSED`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to