ModifyTableProcedure does not override Procedure#holdLock(), so its
lock on the table is released every time it yields. Only a few
procedures hold their locks for their whole execution.

On Wed, May 14, 2025 at 12:20 AM 张铎(Duo Zhang) <palomino...@gmail.com> wrote:
>
> Haven't looked at the details yet, but we have table/region lock for
> the procedure system, which should prevent ModifyTableProcedure and
> SplitTableRegionProcedure executed at the same time.
>
> Not sure what breaks the assumption.
>
> Charles Connell <cconn...@hubspot.com.invalid> 于2025年5月14日周三 06:39写道:
> >
> > Hi folks,
> >
> > I recently uncovered a problem in the ModifyTableProcedure that I want
> > to discuss with the community. I altered a table to increase the
> > region replica count from 1 to 2. This led to some of the secondary
> > replica regions being stuck opening, which was difficult to remediate,
> > although luckily did not impact database users. I believe that the
> > reason is because of an unfortunate interaction between the
> > ModifyTableProcedure and the SplitTableRegionProcedure.
> >
> > When adding new replicas, ModifyTableProcedure roughly does the following:
> > 1. Edits the table descriptor
> > 2. Closes and opens all existing regions
> > 3. Creates replication peer to support the new replica regions
> > 4. Opens new replica regions
> >
> > When closing a region in step 2, the act of closing it can trigger a
> > split. When the HMaster runs the SplitTableRegionProcedure, it roughly
> > does:
> > 1. Close the region to be split
> > 2. Split the store files
> > 3. Open the new daughter regions (including replicas)
> >
> > When a split is done as part of a ModifyTableProcedure that adds
> > replicas, we get the following combination of behavior:
> > 1. Edit the table descriptor
> > 2. Close and open all existing regions, adding a
> > SplitTableRegionProcedure to the HMaster's queue
> > 3. To begin the SplitTableRegionProcedure, close region to be split
> > 4. Split store files
> > 5. Open the new daughter regions, including replicas, completing the
> > SplitTableRegionProcedure
> > 6. Create replication peer to support the new replica regions
> > 7. Open new replica regions, even though some were already opened in
> > step 5. Double-opening doesn't work, and the open procedures are
> > stuck.
> >
> > For some proof, I've prepared selected log lines from the incident at
> > my company that prompted this email, which are at the end of this
> > email.
> >
> > In thinking about this problem, I looked at the code that triggers
> > splits. This is in the MemStoreFlusher class, which is invoked on
> > every region close. It sometimes does a split if HRegion#checkSplit()
> > returns non-empty. That method will return empty if the region's
> > "closing" flag is set, however the order of operations of a region
> > close means that the closing flag is not yet set when checkSplit()
> > checks it. This seems like an oversight to me, and perhaps the
> > solution is as simple as moving the `closing.set(true)` call to
> > somewhere earlier in HRegion#doClose(). However, I don't understand
> > what the full implications of this would be.
> >
> > I also thought about this from another angle: there is a span of time
> > when the ModifyTableProcedure has modified the table descriptor, but
> > has not yet opened the new replicas, and during that time, it assumes
> > that no other procedures will act upon the new information in the
> > table descriptor. In my case, the SplitTableRegionProcedure violated
> > this assumption. Perhaps this assumption should be made enforceable,
> > like somehow no region operations should be permitted by non-children
> > of the ModifyTableProcedure until it has completed.
> >
> > What do you all think?
> >
> > Thank you for your time,
> > Charles
> >
> > Logs from the incident:
> > // Start of ModifyTableProcedure
> > (hmaster) 2025-05-06T17:37:07,884
> > [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=60000] INFO
> > org.apache.hadoop.hbase.master.HMaster:
> > Client=HBase2TasksJobs-setReadReplication//172.16.81.147 modify table
> > email-objects-1 from 'email-objects-1', {...} to 'email-objects-1',
> > {... REGION_REPLICATION => '2',...}
> >
> > // Re-open b85d893815b488f1282159d89e1cc86b to pick up new table descriptor
> > (hmaster) 2025-05-06T17:37:35,717 [PEWorker-2] INFO
> > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
> > subprocedures=[...{pid=22035700, ppid=22034599,
> > state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE;
> > TransitRegionStateProcedure table=email-objects-1,
> > region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE}...]
> >
> > // Close b85d893815b488f1282159d89e1cc86b
> > (hmaster) 2025-05-06T17:37:35,816 [PEWorker-5] INFO
> > org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler:
> > Took xlock for pid=22035700, ppid=22034599,
> > state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE;
> > TransitRegionStateProcedure table=email-objects-1,
> > region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE
> > (hmaster) 2025-05-06T17:37:35,817 [PEWorker-5] INFO
> > org.apache.hadoop.hbase.master.assignment.RegionStateStore:
> > pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b,
> > regionState=CLOSING,
> > regionLocation=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
> > (hmaster) 2025-05-06T17:37:35,827 [PEWorker-5] INFO
> > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
> > subprocedures=[{pid=22035776, ppid=22035700, state=RUNNABLE;
> > CloseRegionProcedure b85d893815b488f1282159d89e1cc86b,
> > server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104}]
> > (regionserver) 2025-05-06 17:37:35.986 UTC Close
> > b85d893815b488f1282159d89e1cc86b
> > (regionserver) 2025-05-06 17:37:35.986 UTC Running close preflush of
> > b85d893815b488f1282159d89e1cc86b
> > // At this point I suspect the RegionServer sent a split request to the 
> > HMaster
> > (regionserver) 2025-05-06 17:37:35.987 UTC Flushing
> > b85d893815b488f1282159d89e1cc86b 2/2 column families, dataSize=8.25 MB
> > heapSize=10.12 MB
> > (regionserver) 2025-05-06 17:37:36.057 UTC Flushed memstore data
> > size=7.80 MB at sequenceid=26941273 (bloomFilter=true),
> > to=hdfs://pluto-hb2-a-prod:8020/hbase/data/default/email-objects-1/b85d893815b488f1282159d89e1cc86b/.tmp/e/7ffadb785b8e4ed88567437cc7633e09
> > (regionserver) 2025-05-06 17:37:36.117 UTC Flushed memstore data
> > size=455.16 KB at sequenceid=26941273 (bloomFilter=true),
> > to=hdfs://pluto-hb2-a-prod:8020/hbase/data/default/email-objects-1/b85d893815b488f1282159d89e1cc86b/.tmp/i/ec9f55f487a443d99636f4aa6dd297d7
> > (hmaster) 2025-05-06T17:37:36,174 [PEWorker-9] INFO
> > org.apache.hadoop.hbase.master.assignment.RegionStateStore:
> > pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b,
> > regionState=CLOSED
> > (hmaster) 2025-05-06T17:37:36,182 [PEWorker-8] INFO
> > org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure:
> > Starting pid=22035700, ppid=22034599,
> > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE,
> > locked=true; TransitRegionStateProcedure table=email-objects-1,
> > region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE; state=CLOSED,
> > location=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104;
> > forceNewPlan=false, retain=true
> > (hmaster) 2025-05-06T17:37:36,182 [PEWorker-9] INFO
> > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > pid=22035776, ppid=22035700, state=SUCCESS; CloseRegionProcedure
> > b85d893815b488f1282159d89e1cc86b,
> > server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
> > in 357 msec
> >
> > // Open b85d893815b488f1282159d89e1cc86b with new table descriptor
> > (hmaster) 2025-05-06T17:37:36,239 [PEWorker-6] INFO
> > org.apache.hadoop.hbase.master.assignment.RegionStateStore:
> > pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b,
> > regionState=OPENING,
> > regionLocation=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
> > (hmaster) 2025-05-06T17:37:36,245 [PEWorker-6] INFO
> > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
> > subprocedures=[{pid=22035800, ppid=22035700, state=RUNNABLE;
> > OpenRegionProcedure b85d893815b488f1282159d89e1cc86b,
> > server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104}]
> > (hmaster) 2025-05-06T17:37:36,529 [PEWorker-10] INFO
> > org.apache.hadoop.hbase.master.assignment.RegionStateStore:
> > pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b,
> > regionState=OPEN, openSeqNum=26941277,
> > regionLocation=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
> > (hmaster) 2025-05-06T17:37:36,539 [PEWorker-10] INFO
> > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > pid=22035800, ppid=22035700, state=SUCCESS; OpenRegionProcedure
> > b85d893815b488f1282159d89e1cc86b,
> > server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
> > in 290 msec
> > (hmaster) 2025-05-06T17:37:36,557 [PEWorker-7] INFO
> > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > pid=22035700, ppid=22034599, state=SUCCESS;
> > TransitRegionStateProcedure table=email-objects-1,
> > region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE in 822 msec
> >
> > // The requested split of b85d893815b488f1282159d89e1cc86b begins on
> > the HMaster, beginning with closing the region. It works fine because
> > b85d893815b488f1282159d89e1cc86b is open.
> > (hmaster) 2025-05-06T17:37:43,225 [PEWorker-1] INFO
> > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
> > subprocedures=[{pid=22036140, ppid=22036139,
> > state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE;
> > TransitRegionStateProcedure table=email-objects-1,
> > region=b85d893815b488f1282159d89e1cc86b, UNASSIGN}, {pid=22036141,
> > ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE;
> > TransitRegionStateProcedure table=email-objects-1,
> > region=b47e159aa157e066841e67ca1490a55d, UNASSIGN}]
> > (hmaster) 2025-05-06T17:37:43,231 [PEWorker-6] INFO
> > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
> > subprocedures=[{pid=22036142, ppid=22036140, state=RUNNABLE;
> > CloseRegionProcedure b85d893815b488f1282159d89e1cc86b,
> > server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104}]
> > (hmaster) 2025-05-06T17:37:45,352 [PEWorker-1] INFO
> > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > pid=22036142, ppid=22036140, state=SUCCESS; CloseRegionProcedure
> > b85d893815b488f1282159d89e1cc86b,
> > server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
> > in 2.1170 sec
> > (hmaster) 2025-05-06T17:37:45,355 [PEWorker-4] INFO
> > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > pid=22036140, ppid=22036139, state=SUCCESS;
> > TransitRegionStateProcedure table=email-objects-1,
> > region=b85d893815b488f1282159d89e1cc86b, UNASSIGN in 2.1270 sec
> > (hmaster) 2025-05-06T17:37:45,389 [PEWorker-3] INFO
> > org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure:
> > pid=22036139 splitting 7 storefiles,
> > region=b85d893815b488f1282159d89e1cc86b, threads=7
> >
> > // The region is now split.
> > // Open split daughters 942ae945978493ac3304a6eeb286a110 and
> > 2e97f125b567c33626891076fdb4f500, which is okay
> > // and also open daughter replicas f2edc050a8ea03ac9bef9554425b33e7
> > and f48f8695abfe36eba7211a8080808c65, which is not okay
> > (hmaster) 2025-05-06T17:37:46,515 [PEWorker-3] INFO
> > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
> > subprocedures=[{pid=22036145, ppid=22036139,
> > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
> > TransitRegionStateProcedure table=email-objects-1,
> > region=942ae945978493ac3304a6eeb286a110, ASSIGN}, {pid=22036146,
> > ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
> > TransitRegionStateProcedure table=email-objects-1,
> > region=2e97f125b567c33626891076fdb4f500, ASSIGN}, {pid=22036147,
> > ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
> > TransitRegionStateProcedure table=email-objects-1,
> > region=f2edc050a8ea03ac9bef9554425b33e7, ASSIGN}, {pid=22036148,
> > ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
> > TransitRegionStateProcedure table=email-objects-1,
> > region=f48f8695abfe36eba7211a8080808c65, ASSIGN}]
> > ...
> > (hmaster) 2025-05-06T17:37:46,960 [PEWorker-5] INFO
> > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > pid=22036145, ppid=22036139, state=SUCCESS;
> > TransitRegionStateProcedure table=email-objects-1,
> > region=942ae945978493ac3304a6eeb286a110, ASSIGN in 443 msec
> > (hmaster) 2025-05-06T17:37:46,961 [PEWorker-8] INFO
> > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > pid=22036146, ppid=22036139, state=SUCCESS;
> > TransitRegionStateProcedure table=email-objects-1,
> > region=2e97f125b567c33626891076fdb4f500, ASSIGN in 444 msec
> > (hmaster) 2025-05-06T17:37:47,027 [PEWorker-12] INFO
> > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > pid=22036147, ppid=22036139, state=SUCCESS;
> > TransitRegionStateProcedure table=email-objects-1,
> > region=f2edc050a8ea03ac9bef9554425b33e7, ASSIGN in 510 msec
> > (hmaster) 2025-05-06T17:37:47,101 [PEWorker-10] INFO
> > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > pid=22036148, ppid=22036139, state=SUCCESS;
> > TransitRegionStateProcedure table=email-objects-1,
> > region=f48f8695abfe36eba7211a8080808c65, ASSIGN in 582 msec
> > (hmaster) 2025-05-06T17:37:47,109 [PEWorker-11] INFO
> > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > pid=22036139, state=SUCCESS; SplitTableRegionProcedure
> > table=email-objects-1, parent=b85d893815b488f1282159d89e1cc86b,
> > daughterA=942ae945978493ac3304a6eeb286a110,
> > daughterB=2e97f125b567c33626891076fdb4f500 in 3.8930 sec
> >
> > // Finish reopening all regions, moving on with ModifyTableProcedure
> > (hmaster) 2025-05-06T17:46:22,158 [PEWorker-4] INFO
> > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > pid=22034599, ppid=22034598, state=SUCCESS;
> > ReopenTableRegionsProcedure table=email-objects-1 in 9 mins, 14.178
> > sec
> > (hmaster) 2025-05-06T17:46:22,242 [PEWorker-2] INFO
> > org.apache.hadoop.hbase.util.ServerRegionReplicaUtil: Region replica
> > replication peer id=region_replica_replication not exist. Creating...
> > (hmaster) 2025-05-06T17:46:22,246 [PEWorker-2] INFO
> > org.apache.hadoop.hbase.master.HMaster: Client=null/null creating
> > replication peer, id=region_replica_replication,
> > config=clusterKey=zk-hbase-pluto-hb2-a-prod-lb.zk.svc.iad02.k8s.run:2181:2181:/hbase,replicationEndpointImpl=org.apache.hadoop.hbase.replication.regionserver.RegionReplicaReplicationEndpoint,replicateAllUserTables=true,bandwidth=0,serial=false,
> > state=ENABLED
> > ...
> > (hmaster) 2025-05-06T17:46:23,562 [PEWorker-4] INFO
> > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > pid=22060604, ppid=22060356, state=SUCCESS;
> > org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure in
> > 1.2600 sec
> > (hmaster) 2025-05-06T17:46:23,562 [PEWorker-3] INFO
> > org.apache.hadoop.hbase.master.replication.AddPeerProcedure:
> > Successfully added ENABLED peer region_replica_replication, config
> > clusterKey=zk-hbase-pluto-hb2-a-prod-lb.zk.svc.iad02.k8s.run:2181:2181:/hbase,replicationEndpointImpl=org.apache.hadoop.hbase.replication.regionserver.RegionReplicaReplicationEndpoint,replicateAllUserTables=true,bandwidth=0,serial=false
> > (hmaster) 2025-05-06T17:46:23,565 [PEWorker-3] INFO
> > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > pid=22060356, state=SUCCESS;
> > org.apache.hadoop.hbase.master.replication.AddPeerProcedure in 1.3140
> > sec
> >
> > // Open all region replicas, including
> > f2edc050a8ea03ac9bef9554425b33e7 and f48f8695abfe36eba7211a8080808c65
> > (hmaster) 2025-05-06T17:46:23,571 [PEWorker-2] INFO
> > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
> > subprocedures=[...{pid=22060998, ppid=22034598,
> > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
> > TransitRegionStateProcedure table=email-objects-1,
> > region=f2edc050a8ea03ac9bef9554425b33e7, ASSIGN}, {pid=22060999,
> > ppid=22034598, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
> > TransitRegionStateProcedure table=email-objects-1,
> > region=f48f8695abfe36eba7211a8080808c65, ASSIGN}...]
> >
> > // Some opens fail because they are already open
> > (hmaster) 2025-05-06T17:47:32,989 [ProcExecTimeout] WARN
> > org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK
> > Region-In-Transition state=OPENING,
> > location=na1-rough-still-rose.iad02.hubinternal.net,60020,1745779142214,
> > table=email-objects-1, region=f2edc050a8ea03ac9bef9554425b33e7
> > (hmaster) 2025-05-06T17:48:32,993 [ProcExecTimeout] WARN
> > org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK
> > Region-In-Transition state=OPENING,
> > location=na1-just-fast-magpie.iad02.hubinternal.net,60020,1745753381085,
> > table=email-objects-1, region=f48f8695abfe36eba7211a8080808c65
> >

Reply via email to