[ https://issues.apache.org/jira/browse/HBASE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Poonia updated HBASE-28405: -------------------------------- Summary: Region open procedure silently returns without notifying the parent proc (was: Region open procedure silently does nothing without notifying the parent proc) > Region open procedure silently returns without notifying the parent proc > ------------------------------------------------------------------------ > > Key: HBASE-28405 > URL: https://issues.apache.org/jira/browse/HBASE-28405 > Project: HBase > Issue Type: Bug > Components: proc-v2 > Affects Versions: 2.5.7 > Reporter: Aman Poonia > Assignee: Aman Poonia > Priority: Major > > We had a scenario in production where a merge operation had failed as below > _2024-02-11 10:53:57,715 ERROR [PEWorker-31] > assignment.MergeTableRegionsProcedure - Error trying to merge > [a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b] in > table1 (in state=MERGE_TABLE_REGIONS_CLOSE_REGIONS)_ > _org.apache.hadoop.hbase.HBaseIOException: The parent region state=MERGING, > location=rs-229,60020,1707587658182, table=table1, > region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up_ > _at > org.apache.hadoop.hbase.master.assignment.AssignmentManagerUtil.createUnassignProceduresForSplitOrMerge(AssignmentManagerUtil.java:120)_ > _at > org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.createUnassignProcedures(MergeTableRegionsProcedure.java:648)_ > _at > org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:205)_ > _at > org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.executeFromState(MergeTableRegionsProcedure.java:79)_ > _at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)_ > _at > org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:922)_ > _at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)_ > _at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)_ > _at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)_ > _at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1964)_ > _at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)_ > _at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1991)_ > Now when we do rollback of failed merge operation we see a issue where region > is in state opened until the RS holding it stopped. > Rollback create a TRSP as below > _2024-02-11 10:53:57,719 DEBUG [PEWorker-31] procedure2.ProcedureExecutor - > Stored [pid=26674602, > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; > TransitRegionStateProcedure table=table1, > region=a92008b76ccae47d55c590930b837036, ASSIGN]_ > and rollback finished successfully > _2024-02-11 10:53:57,721 INFO [PEWorker-31] procedure2.ProcedureExecutor - > Rolled back pid=26673594, state=ROLLEDBACK, > exception=org.apache.hadoop.hbase.HBaseIOException via > master-merge-regions:org.apache.hadoop.hbase.HBaseIOException: The parent > region state=MERGING, location=rs-229,60020,1707587658182, table=table1, > region=f56752ae9f30fad9de5a80a8ba578e4b is currently in transition, give up; > MergeTableRegionsProcedure table=table1, > regions=[a92008b76ccae47d55c590930b837036, f56752ae9f30fad9de5a80a8ba578e4b], > force=false exec-time=1.4820 sec_ > We create a procedure to open the region a92008b76ccae47d55c590930b837036 > Intrestingly we didnt close the region as creation of procedure to close > regions had thrown exception and not execution of procedure. > Now when we run TRSP it sends a OpenRegionProcedure which is handled by > AssignRegionHandler > This handlers on execution suggests that region is already online > Sequence of events are as follow > _2024-02-11 10:53:58,919 INFO [PEWorker-58] assignment.RegionStateStore - > pid=26674602 updating hbase:meta row=a92008b76ccae47d55c590930b837036, > regionState=OPENING, regionLocation=rs-210,60020,1707596461539_ > _2024-02-11 10:53:58,920 INFO [PEWorker-58] procedure2.ProcedureExecutor - > Initialized subprocedures=[\{pid=26675798, ppid=26674602, state=RUNNABLE; > OpenRegionProcedure a92008b76ccae47d55c590930b837036, > server=rs-210,60020,1707596461539}]_ > _2024-02-11 10:53:59,074 WARN [REGION-regionserver/rs-210:60020-10] > handler.AssignRegionHandler - Received OPEN for > table1,r1,1685436252488.a92008b76ccae47d55c590930b837036. which is already > online_ -- This message was sent by Atlassian Jira (v8.20.10#820010)