junegunn commented on code in PR #6797: URL: https://github.com/apache/hbase/pull/6797#discussion_r2103706270
########## hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java: ########## @@ -392,10 +399,26 @@ private void closeRegion(MasterProcedureEnv env, RegionStateNode regionNode) ) { return; } - if (regionNode.isInState(State.OPEN, State.CLOSING, State.MERGING, State.SPLITTING)) { - // this is the normal case - ProcedureFutureUtil.suspendIfNecessary(this, this::setFuture, - env.getAssignmentManager().regionClosing(regionNode), env, + + CompletableFuture<Void> future = null; + if (regionNode.isInState(STATES_EXPECTED_ON_CLOSING)) { + // This is the normal case + future = env.getAssignmentManager().regionClosing(regionNode); + } else if (regionNode.setState(State.CLOSED, State.FAILED_OPEN)) { + // FAILED_OPEN doesn't need further transition, immediately mark the region as closed + AssignmentManager am = env.getAssignmentManager(); + am.getRegionStates().removeFromFailedOpen(regionNode.getRegionInfo()); + future = am.getRegionStateStore().updateRegionLocation(regionNode); Review Comment: Thanks for the feedback. Before I answer the questions, please take a look at 930b8d9826017b019989a0afe08405e71ec9fc2e. I added a few more steps so that the intention of this change is clearer. > We do not need to set the state to CLOSED when the state is FAILED_OPEN? So the question is: "Is `regionNode.setState(State.CLOSED, State.FAILED_OPEN)` really necessary? Can we just check `regionNode.isInState(State.FAILED_OPEN)` here?" Am I right? If we don't change the state, ```diff diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java index 6b5aaf6975..215e1245ed 100644 --- a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java +++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java @@ -404,11 +404,13 @@ public class TransitRegionStateProcedure if (regionNode.isInState(STATES_EXPECTED_ON_CLOSING)) { // This is the normal case future = env.getAssignmentManager().regionClosing(regionNode); - } else if (regionNode.setState(State.CLOSED, State.FAILED_OPEN)) { - // FAILED_OPEN doesn't need further transition, immediately mark the region as closed + } else if (regionNode.isInState(State.FAILED_OPEN)) { + // Remove the region from RIT list to suppress periodic "RITs over threshold" messages AssignmentManager am = env.getAssignmentManager(); am.getRegionStates().removeFromFailedOpen(regionNode.getRegionInfo()); - future = am.getRegionStateStore().updateRegionLocation(regionNode); + + // FAILED_OPEN doesn't need further transition + future = CompletableFuture.allOf(); } if (future != null) { ProcedureFutureUtil.suspendIfNecessary(this, this::setFuture, future, env, ``` we will run into an assertion error in `confirmClosed`: https://github.com/apache/hbase/blob/cab6eb41543d6ca4ef57cab2d87a3a6e6796c88c/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java#L423-L449 Because `FAILED_OPEN` is not covered in the conditions. If we add `FAILED_OPEN` here like so: ```diff diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java index 6b5aaf6975..7cf0941f20 100644 --- a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java +++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java @@ -422,7 +424,7 @@ public class TransitRegionStateProcedure private Flow confirmClosed(MasterProcedureEnv env, RegionStateNode regionNode) throws IOException { - if (regionNode.isInState(State.CLOSED)) { + if (regionNode.isInState(State.CLOSED, State.FAILED_OPEN)) { retryCounter = null; if (lastState == RegionStateTransitionState.REGION_STATE_TRANSITION_CONFIRM_CLOSED) { // we are the last state, finish ``` We can avoid the assertion error, but `CloseTableRegionsProcedure` will not finish and endlessly retry: ``` There are still 1 unclosed region(s) for closing regions of table testDisableFailedOpenRegions when executing CloseTableRegionsProcedure, continue... ``` So we also need to change the implementation of `numberOfUnclosedRegions` to account for `FAILED_OPEN` regions. https://github.com/apache/hbase/blob/42715b681d05428c63eca926b5967d5fa64eb1a0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java#L1147-L1163 But I felt this was getting too complicated, and it's simpler to just change the state to `CLOSED`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org