Ah, yes, this is mainly for availability, since region assignment
needs to hold the shared lock on table.

And I can recall that in the past we also have questions about what
happens when two ModifyTableProcedures are executed at the same time.

Maybe we should reconsider the locking way for ModifyTableProcedure.
Or for split and merge procedures, we should add a check in the
prepare step to make sure that there is no ModifyTableProcedure for
the same table.

Thanks.

Charles Connell <cconn...@hubspot.com.invalid> 于2025年5月15日周四 09:18写道:
>
> ModifyTableProcedure does not override Procedure#holdLock(), so its
> lock on the table is released every time it yields. Only a few
> procedures hold their locks for their whole execution.
>
> On Wed, May 14, 2025 at 12:20 AM 张铎(Duo Zhang) <palomino...@gmail.com> wrote:
> >
> > Haven't looked at the details yet, but we have table/region lock for
> > the procedure system, which should prevent ModifyTableProcedure and
> > SplitTableRegionProcedure executed at the same time.
> >
> > Not sure what breaks the assumption.
> >
> > Charles Connell <cconn...@hubspot.com.invalid> 于2025年5月14日周三 06:39写道:
> > >
> > > Hi folks,
> > >
> > > I recently uncovered a problem in the ModifyTableProcedure that I want
> > > to discuss with the community. I altered a table to increase the
> > > region replica count from 1 to 2. This led to some of the secondary
> > > replica regions being stuck opening, which was difficult to remediate,
> > > although luckily did not impact database users. I believe that the
> > > reason is because of an unfortunate interaction between the
> > > ModifyTableProcedure and the SplitTableRegionProcedure.
> > >
> > > When adding new replicas, ModifyTableProcedure roughly does the following:
> > > 1. Edits the table descriptor
> > > 2. Closes and opens all existing regions
> > > 3. Creates replication peer to support the new replica regions
> > > 4. Opens new replica regions
> > >
> > > When closing a region in step 2, the act of closing it can trigger a
> > > split. When the HMaster runs the SplitTableRegionProcedure, it roughly
> > > does:
> > > 1. Close the region to be split
> > > 2. Split the store files
> > > 3. Open the new daughter regions (including replicas)
> > >
> > > When a split is done as part of a ModifyTableProcedure that adds
> > > replicas, we get the following combination of behavior:
> > > 1. Edit the table descriptor
> > > 2. Close and open all existing regions, adding a
> > > SplitTableRegionProcedure to the HMaster's queue
> > > 3. To begin the SplitTableRegionProcedure, close region to be split
> > > 4. Split store files
> > > 5. Open the new daughter regions, including replicas, completing the
> > > SplitTableRegionProcedure
> > > 6. Create replication peer to support the new replica regions
> > > 7. Open new replica regions, even though some were already opened in
> > > step 5. Double-opening doesn't work, and the open procedures are
> > > stuck.
> > >
> > > For some proof, I've prepared selected log lines from the incident at
> > > my company that prompted this email, which are at the end of this
> > > email.
> > >
> > > In thinking about this problem, I looked at the code that triggers
> > > splits. This is in the MemStoreFlusher class, which is invoked on
> > > every region close. It sometimes does a split if HRegion#checkSplit()
> > > returns non-empty. That method will return empty if the region's
> > > "closing" flag is set, however the order of operations of a region
> > > close means that the closing flag is not yet set when checkSplit()
> > > checks it. This seems like an oversight to me, and perhaps the
> > > solution is as simple as moving the `closing.set(true)` call to
> > > somewhere earlier in HRegion#doClose(). However, I don't understand
> > > what the full implications of this would be.
> > >
> > > I also thought about this from another angle: there is a span of time
> > > when the ModifyTableProcedure has modified the table descriptor, but
> > > has not yet opened the new replicas, and during that time, it assumes
> > > that no other procedures will act upon the new information in the
> > > table descriptor. In my case, the SplitTableRegionProcedure violated
> > > this assumption. Perhaps this assumption should be made enforceable,
> > > like somehow no region operations should be permitted by non-children
> > > of the ModifyTableProcedure until it has completed.
> > >
> > > What do you all think?
> > >
> > > Thank you for your time,
> > > Charles
> > >
> > > Logs from the incident:
> > > // Start of ModifyTableProcedure
> > > (hmaster) 2025-05-06T17:37:07,884
> > > [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=60000] INFO
> > > org.apache.hadoop.hbase.master.HMaster:
> > > Client=HBase2TasksJobs-setReadReplication//172.16.81.147 modify table
> > > email-objects-1 from 'email-objects-1', {...} to 'email-objects-1',
> > > {... REGION_REPLICATION => '2',...}
> > >
> > > // Re-open b85d893815b488f1282159d89e1cc86b to pick up new table 
> > > descriptor
> > > (hmaster) 2025-05-06T17:37:35,717 [PEWorker-2] INFO
> > > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
> > > subprocedures=[...{pid=22035700, ppid=22034599,
> > > state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE;
> > > TransitRegionStateProcedure table=email-objects-1,
> > > region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE}...]
> > >
> > > // Close b85d893815b488f1282159d89e1cc86b
> > > (hmaster) 2025-05-06T17:37:35,816 [PEWorker-5] INFO
> > > org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler:
> > > Took xlock for pid=22035700, ppid=22034599,
> > > state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE;
> > > TransitRegionStateProcedure table=email-objects-1,
> > > region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE
> > > (hmaster) 2025-05-06T17:37:35,817 [PEWorker-5] INFO
> > > org.apache.hadoop.hbase.master.assignment.RegionStateStore:
> > > pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b,
> > > regionState=CLOSING,
> > > regionLocation=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
> > > (hmaster) 2025-05-06T17:37:35,827 [PEWorker-5] INFO
> > > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
> > > subprocedures=[{pid=22035776, ppid=22035700, state=RUNNABLE;
> > > CloseRegionProcedure b85d893815b488f1282159d89e1cc86b,
> > > server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104}]
> > > (regionserver) 2025-05-06 17:37:35.986 UTC Close
> > > b85d893815b488f1282159d89e1cc86b
> > > (regionserver) 2025-05-06 17:37:35.986 UTC Running close preflush of
> > > b85d893815b488f1282159d89e1cc86b
> > > // At this point I suspect the RegionServer sent a split request to the 
> > > HMaster
> > > (regionserver) 2025-05-06 17:37:35.987 UTC Flushing
> > > b85d893815b488f1282159d89e1cc86b 2/2 column families, dataSize=8.25 MB
> > > heapSize=10.12 MB
> > > (regionserver) 2025-05-06 17:37:36.057 UTC Flushed memstore data
> > > size=7.80 MB at sequenceid=26941273 (bloomFilter=true),
> > > to=hdfs://pluto-hb2-a-prod:8020/hbase/data/default/email-objects-1/b85d893815b488f1282159d89e1cc86b/.tmp/e/7ffadb785b8e4ed88567437cc7633e09
> > > (regionserver) 2025-05-06 17:37:36.117 UTC Flushed memstore data
> > > size=455.16 KB at sequenceid=26941273 (bloomFilter=true),
> > > to=hdfs://pluto-hb2-a-prod:8020/hbase/data/default/email-objects-1/b85d893815b488f1282159d89e1cc86b/.tmp/i/ec9f55f487a443d99636f4aa6dd297d7
> > > (hmaster) 2025-05-06T17:37:36,174 [PEWorker-9] INFO
> > > org.apache.hadoop.hbase.master.assignment.RegionStateStore:
> > > pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b,
> > > regionState=CLOSED
> > > (hmaster) 2025-05-06T17:37:36,182 [PEWorker-8] INFO
> > > org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure:
> > > Starting pid=22035700, ppid=22034599,
> > > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE,
> > > locked=true; TransitRegionStateProcedure table=email-objects-1,
> > > region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE; state=CLOSED,
> > > location=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104;
> > > forceNewPlan=false, retain=true
> > > (hmaster) 2025-05-06T17:37:36,182 [PEWorker-9] INFO
> > > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > > pid=22035776, ppid=22035700, state=SUCCESS; CloseRegionProcedure
> > > b85d893815b488f1282159d89e1cc86b,
> > > server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
> > > in 357 msec
> > >
> > > // Open b85d893815b488f1282159d89e1cc86b with new table descriptor
> > > (hmaster) 2025-05-06T17:37:36,239 [PEWorker-6] INFO
> > > org.apache.hadoop.hbase.master.assignment.RegionStateStore:
> > > pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b,
> > > regionState=OPENING,
> > > regionLocation=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
> > > (hmaster) 2025-05-06T17:37:36,245 [PEWorker-6] INFO
> > > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
> > > subprocedures=[{pid=22035800, ppid=22035700, state=RUNNABLE;
> > > OpenRegionProcedure b85d893815b488f1282159d89e1cc86b,
> > > server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104}]
> > > (hmaster) 2025-05-06T17:37:36,529 [PEWorker-10] INFO
> > > org.apache.hadoop.hbase.master.assignment.RegionStateStore:
> > > pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b,
> > > regionState=OPEN, openSeqNum=26941277,
> > > regionLocation=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
> > > (hmaster) 2025-05-06T17:37:36,539 [PEWorker-10] INFO
> > > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > > pid=22035800, ppid=22035700, state=SUCCESS; OpenRegionProcedure
> > > b85d893815b488f1282159d89e1cc86b,
> > > server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
> > > in 290 msec
> > > (hmaster) 2025-05-06T17:37:36,557 [PEWorker-7] INFO
> > > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > > pid=22035700, ppid=22034599, state=SUCCESS;
> > > TransitRegionStateProcedure table=email-objects-1,
> > > region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE in 822 msec
> > >
> > > // The requested split of b85d893815b488f1282159d89e1cc86b begins on
> > > the HMaster, beginning with closing the region. It works fine because
> > > b85d893815b488f1282159d89e1cc86b is open.
> > > (hmaster) 2025-05-06T17:37:43,225 [PEWorker-1] INFO
> > > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
> > > subprocedures=[{pid=22036140, ppid=22036139,
> > > state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE;
> > > TransitRegionStateProcedure table=email-objects-1,
> > > region=b85d893815b488f1282159d89e1cc86b, UNASSIGN}, {pid=22036141,
> > > ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE;
> > > TransitRegionStateProcedure table=email-objects-1,
> > > region=b47e159aa157e066841e67ca1490a55d, UNASSIGN}]
> > > (hmaster) 2025-05-06T17:37:43,231 [PEWorker-6] INFO
> > > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
> > > subprocedures=[{pid=22036142, ppid=22036140, state=RUNNABLE;
> > > CloseRegionProcedure b85d893815b488f1282159d89e1cc86b,
> > > server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104}]
> > > (hmaster) 2025-05-06T17:37:45,352 [PEWorker-1] INFO
> > > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > > pid=22036142, ppid=22036140, state=SUCCESS; CloseRegionProcedure
> > > b85d893815b488f1282159d89e1cc86b,
> > > server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
> > > in 2.1170 sec
> > > (hmaster) 2025-05-06T17:37:45,355 [PEWorker-4] INFO
> > > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > > pid=22036140, ppid=22036139, state=SUCCESS;
> > > TransitRegionStateProcedure table=email-objects-1,
> > > region=b85d893815b488f1282159d89e1cc86b, UNASSIGN in 2.1270 sec
> > > (hmaster) 2025-05-06T17:37:45,389 [PEWorker-3] INFO
> > > org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure:
> > > pid=22036139 splitting 7 storefiles,
> > > region=b85d893815b488f1282159d89e1cc86b, threads=7
> > >
> > > // The region is now split.
> > > // Open split daughters 942ae945978493ac3304a6eeb286a110 and
> > > 2e97f125b567c33626891076fdb4f500, which is okay
> > > // and also open daughter replicas f2edc050a8ea03ac9bef9554425b33e7
> > > and f48f8695abfe36eba7211a8080808c65, which is not okay
> > > (hmaster) 2025-05-06T17:37:46,515 [PEWorker-3] INFO
> > > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
> > > subprocedures=[{pid=22036145, ppid=22036139,
> > > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
> > > TransitRegionStateProcedure table=email-objects-1,
> > > region=942ae945978493ac3304a6eeb286a110, ASSIGN}, {pid=22036146,
> > > ppid=22036139, 
> > > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
> > > TransitRegionStateProcedure table=email-objects-1,
> > > region=2e97f125b567c33626891076fdb4f500, ASSIGN}, {pid=22036147,
> > > ppid=22036139, 
> > > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
> > > TransitRegionStateProcedure table=email-objects-1,
> > > region=f2edc050a8ea03ac9bef9554425b33e7, ASSIGN}, {pid=22036148,
> > > ppid=22036139, 
> > > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
> > > TransitRegionStateProcedure table=email-objects-1,
> > > region=f48f8695abfe36eba7211a8080808c65, ASSIGN}]
> > > ...
> > > (hmaster) 2025-05-06T17:37:46,960 [PEWorker-5] INFO
> > > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > > pid=22036145, ppid=22036139, state=SUCCESS;
> > > TransitRegionStateProcedure table=email-objects-1,
> > > region=942ae945978493ac3304a6eeb286a110, ASSIGN in 443 msec
> > > (hmaster) 2025-05-06T17:37:46,961 [PEWorker-8] INFO
> > > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > > pid=22036146, ppid=22036139, state=SUCCESS;
> > > TransitRegionStateProcedure table=email-objects-1,
> > > region=2e97f125b567c33626891076fdb4f500, ASSIGN in 444 msec
> > > (hmaster) 2025-05-06T17:37:47,027 [PEWorker-12] INFO
> > > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > > pid=22036147, ppid=22036139, state=SUCCESS;
> > > TransitRegionStateProcedure table=email-objects-1,
> > > region=f2edc050a8ea03ac9bef9554425b33e7, ASSIGN in 510 msec
> > > (hmaster) 2025-05-06T17:37:47,101 [PEWorker-10] INFO
> > > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > > pid=22036148, ppid=22036139, state=SUCCESS;
> > > TransitRegionStateProcedure table=email-objects-1,
> > > region=f48f8695abfe36eba7211a8080808c65, ASSIGN in 582 msec
> > > (hmaster) 2025-05-06T17:37:47,109 [PEWorker-11] INFO
> > > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > > pid=22036139, state=SUCCESS; SplitTableRegionProcedure
> > > table=email-objects-1, parent=b85d893815b488f1282159d89e1cc86b,
> > > daughterA=942ae945978493ac3304a6eeb286a110,
> > > daughterB=2e97f125b567c33626891076fdb4f500 in 3.8930 sec
> > >
> > > // Finish reopening all regions, moving on with ModifyTableProcedure
> > > (hmaster) 2025-05-06T17:46:22,158 [PEWorker-4] INFO
> > > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > > pid=22034599, ppid=22034598, state=SUCCESS;
> > > ReopenTableRegionsProcedure table=email-objects-1 in 9 mins, 14.178
> > > sec
> > > (hmaster) 2025-05-06T17:46:22,242 [PEWorker-2] INFO
> > > org.apache.hadoop.hbase.util.ServerRegionReplicaUtil: Region replica
> > > replication peer id=region_replica_replication not exist. Creating...
> > > (hmaster) 2025-05-06T17:46:22,246 [PEWorker-2] INFO
> > > org.apache.hadoop.hbase.master.HMaster: Client=null/null creating
> > > replication peer, id=region_replica_replication,
> > > config=clusterKey=zk-hbase-pluto-hb2-a-prod-lb.zk.svc.iad02.k8s.run:2181:2181:/hbase,replicationEndpointImpl=org.apache.hadoop.hbase.replication.regionserver.RegionReplicaReplicationEndpoint,replicateAllUserTables=true,bandwidth=0,serial=false,
> > > state=ENABLED
> > > ...
> > > (hmaster) 2025-05-06T17:46:23,562 [PEWorker-4] INFO
> > > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > > pid=22060604, ppid=22060356, state=SUCCESS;
> > > org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure in
> > > 1.2600 sec
> > > (hmaster) 2025-05-06T17:46:23,562 [PEWorker-3] INFO
> > > org.apache.hadoop.hbase.master.replication.AddPeerProcedure:
> > > Successfully added ENABLED peer region_replica_replication, config
> > > clusterKey=zk-hbase-pluto-hb2-a-prod-lb.zk.svc.iad02.k8s.run:2181:2181:/hbase,replicationEndpointImpl=org.apache.hadoop.hbase.replication.regionserver.RegionReplicaReplicationEndpoint,replicateAllUserTables=true,bandwidth=0,serial=false
> > > (hmaster) 2025-05-06T17:46:23,565 [PEWorker-3] INFO
> > > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> > > pid=22060356, state=SUCCESS;
> > > org.apache.hadoop.hbase.master.replication.AddPeerProcedure in 1.3140
> > > sec
> > >
> > > // Open all region replicas, including
> > > f2edc050a8ea03ac9bef9554425b33e7 and f48f8695abfe36eba7211a8080808c65
> > > (hmaster) 2025-05-06T17:46:23,571 [PEWorker-2] INFO
> > > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
> > > subprocedures=[...{pid=22060998, ppid=22034598,
> > > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
> > > TransitRegionStateProcedure table=email-objects-1,
> > > region=f2edc050a8ea03ac9bef9554425b33e7, ASSIGN}, {pid=22060999,
> > > ppid=22034598, 
> > > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
> > > TransitRegionStateProcedure table=email-objects-1,
> > > region=f48f8695abfe36eba7211a8080808c65, ASSIGN}...]
> > >
> > > // Some opens fail because they are already open
> > > (hmaster) 2025-05-06T17:47:32,989 [ProcExecTimeout] WARN
> > > org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK
> > > Region-In-Transition state=OPENING,
> > > location=na1-rough-still-rose.iad02.hubinternal.net,60020,1745779142214,
> > > table=email-objects-1, region=f2edc050a8ea03ac9bef9554425b33e7
> > > (hmaster) 2025-05-06T17:48:32,993 [ProcExecTimeout] WARN
> > > org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK
> > > Region-In-Transition state=OPENING,
> > > location=na1-just-fast-magpie.iad02.hubinternal.net,60020,1745753381085,
> > > table=email-objects-1, region=f48f8695abfe36eba7211a8080808c65
> > >
>

Reply via email to