[ https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842246#comment-17842246 ]
Hudson commented on HBASE-28405: -------------------------------- Results for branch branch-2.4 [build #728 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/728/]: (/) *{color:green}+1 overall{color}* ---- details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/728/General_20Nightly_20Build_20Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/728/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/728/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/] (/) {color:green}+1 jdk11 hadoop3 checks{color} -- For more information [see jdk11 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.4/728/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Region open procedure silently returns without notifying the parent proc > ------------------------------------------------------------------------ > > Key: HBASE-28405 > URL: https://issues.apache.org/jira/browse/HBASE-28405 > Project: HBase > Issue Type: Bug > Components: proc-v2, Region Assignment > Affects Versions: 2.4.17, 2.5.8 > Reporter: Aman Poonia > Assignee: Aman Poonia > Priority: Major > Labels: pull-request-available > Fix For: 2.6.0, 2.4.18, 3.0.0-beta-2, 2.5.9 > > > *We had a scenario in production where a merge operation had failed as below* > _2024-02-11 10:53:57,715 ERROR [PEWorker-31] > assignment.MergeTableRegionsProcedure - Error trying to merge > [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in > table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_ > _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, > location=rs-229,60020,1707587658182, table=table1, > region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_ > _at > org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_ > _at > org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_ > _at > org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_ > _at > org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_ > _at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_ > _at > org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_ > _at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_ > _at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_ > _at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_ > _at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_ > _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_ > _at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_ > *Now when we do rollback of failed merge operation we see a issue where > region is in state opened until the RS holding it stopped.* > Rollback create a TRSP as below > _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - > Stored [pid=26674602, > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; > TransitRegionStateProcedure table=table1, > region=a92008b76ccae47d55c590930b837036, ASSIGN]_ > *and rollback finished successfully* > _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - > Rolled back pid=26673594, state=ROLLEDBACK, > exception=org.apache.hadoop.hbase.HBaseIOException via > master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent > region state=MERGING, location=rs-229,60020,1707587658182, table=table1, > region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; > MergeTableRegionsProcedure table=table1, > regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], > force=false exec-time=1.4820 sec_ > *We create a procedure to open the region a92008b76ccae47d55c590930b837036. > Intrestingly we didnt close the region as creation of procedure to close > regions had thrown exception and not execution of procedure. When we run TRSP > it sends a OpenRegionProcedure which is handled by AssignRegionHandler. This > handlers on execution suggests that region is already online* > Sequence of events are as follow > _2024-02-11 10:53:58,919 INFO [PEWorker-58] assignment.RegionStateStore - > pid=26674602 updating hbase:meta row=a92008b76ccae47d55c590930b837036, > regionState=OPENING, regionLocation=rs-210,60020,1707596461539_ > _2024-02-11 10:53:58,920 INFO [PEWorker-58] procedure2.ProcedureExecutor - > Initialized subprocedures=[\\{pid=26675798, ppid=26674602, state=RUNNABLE; > OpenRegionProcedure a92008b76ccae47d55c590930b837036, > server=rs-210,60020,1707596461539}]_ > _2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] > handler.AssignRegionHandler - Received OPEN for > table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. which is already > online_ -- This message was sent by Atlassian Jira (v8.20.10#820010)