Haven't looked at the details yet, but we have table/region lock for
the procedure system, which should prevent ModifyTableProcedure and
SplitTableRegionProcedure executed at the same time.

Not sure what breaks the assumption.

Charles Connell <cconn...@hubspot.com.invalid> 于2025年5月14日周三 06:39写道:
>
> Hi folks,
>
> I recently uncovered a problem in the ModifyTableProcedure that I want
> to discuss with the community. I altered a table to increase the
> region replica count from 1 to 2. This led to some of the secondary
> replica regions being stuck opening, which was difficult to remediate,
> although luckily did not impact database users. I believe that the
> reason is because of an unfortunate interaction between the
> ModifyTableProcedure and the SplitTableRegionProcedure.
>
> When adding new replicas, ModifyTableProcedure roughly does the following:
> 1. Edits the table descriptor
> 2. Closes and opens all existing regions
> 3. Creates replication peer to support the new replica regions
> 4. Opens new replica regions
>
> When closing a region in step 2, the act of closing it can trigger a
> split. When the HMaster runs the SplitTableRegionProcedure, it roughly
> does:
> 1. Close the region to be split
> 2. Split the store files
> 3. Open the new daughter regions (including replicas)
>
> When a split is done as part of a ModifyTableProcedure that adds
> replicas, we get the following combination of behavior:
> 1. Edit the table descriptor
> 2. Close and open all existing regions, adding a
> SplitTableRegionProcedure to the HMaster's queue
> 3. To begin the SplitTableRegionProcedure, close region to be split
> 4. Split store files
> 5. Open the new daughter regions, including replicas, completing the
> SplitTableRegionProcedure
> 6. Create replication peer to support the new replica regions
> 7. Open new replica regions, even though some were already opened in
> step 5. Double-opening doesn't work, and the open procedures are
> stuck.
>
> For some proof, I've prepared selected log lines from the incident at
> my company that prompted this email, which are at the end of this
> email.
>
> In thinking about this problem, I looked at the code that triggers
> splits. This is in the MemStoreFlusher class, which is invoked on
> every region close. It sometimes does a split if HRegion#checkSplit()
> returns non-empty. That method will return empty if the region's
> "closing" flag is set, however the order of operations of a region
> close means that the closing flag is not yet set when checkSplit()
> checks it. This seems like an oversight to me, and perhaps the
> solution is as simple as moving the `closing.set(true)` call to
> somewhere earlier in HRegion#doClose(). However, I don't understand
> what the full implications of this would be.
>
> I also thought about this from another angle: there is a span of time
> when the ModifyTableProcedure has modified the table descriptor, but
> has not yet opened the new replicas, and during that time, it assumes
> that no other procedures will act upon the new information in the
> table descriptor. In my case, the SplitTableRegionProcedure violated
> this assumption. Perhaps this assumption should be made enforceable,
> like somehow no region operations should be permitted by non-children
> of the ModifyTableProcedure until it has completed.
>
> What do you all think?
>
> Thank you for your time,
> Charles
>
> Logs from the incident:
> // Start of ModifyTableProcedure
> (hmaster) 2025-05-06T17:37:07,884
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=60000] INFO
> org.apache.hadoop.hbase.master.HMaster:
> Client=HBase2TasksJobs-setReadReplication//172.16.81.147 modify table
> email-objects-1 from 'email-objects-1', {...} to 'email-objects-1',
> {... REGION_REPLICATION => '2',...}
>
> // Re-open b85d893815b488f1282159d89e1cc86b to pick up new table descriptor
> (hmaster) 2025-05-06T17:37:35,717 [PEWorker-2] INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
> subprocedures=[...{pid=22035700, ppid=22034599,
> state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE;
> TransitRegionStateProcedure table=email-objects-1,
> region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE}...]
>
> // Close b85d893815b488f1282159d89e1cc86b
> (hmaster) 2025-05-06T17:37:35,816 [PEWorker-5] INFO
> org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler:
> Took xlock for pid=22035700, ppid=22034599,
> state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE;
> TransitRegionStateProcedure table=email-objects-1,
> region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE
> (hmaster) 2025-05-06T17:37:35,817 [PEWorker-5] INFO
> org.apache.hadoop.hbase.master.assignment.RegionStateStore:
> pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b,
> regionState=CLOSING,
> regionLocation=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
> (hmaster) 2025-05-06T17:37:35,827 [PEWorker-5] INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
> subprocedures=[{pid=22035776, ppid=22035700, state=RUNNABLE;
> CloseRegionProcedure b85d893815b488f1282159d89e1cc86b,
> server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104}]
> (regionserver) 2025-05-06 17:37:35.986 UTC Close
> b85d893815b488f1282159d89e1cc86b
> (regionserver) 2025-05-06 17:37:35.986 UTC Running close preflush of
> b85d893815b488f1282159d89e1cc86b
> // At this point I suspect the RegionServer sent a split request to the 
> HMaster
> (regionserver) 2025-05-06 17:37:35.987 UTC Flushing
> b85d893815b488f1282159d89e1cc86b 2/2 column families, dataSize=8.25 MB
> heapSize=10.12 MB
> (regionserver) 2025-05-06 17:37:36.057 UTC Flushed memstore data
> size=7.80 MB at sequenceid=26941273 (bloomFilter=true),
> to=hdfs://pluto-hb2-a-prod:8020/hbase/data/default/email-objects-1/b85d893815b488f1282159d89e1cc86b/.tmp/e/7ffadb785b8e4ed88567437cc7633e09
> (regionserver) 2025-05-06 17:37:36.117 UTC Flushed memstore data
> size=455.16 KB at sequenceid=26941273 (bloomFilter=true),
> to=hdfs://pluto-hb2-a-prod:8020/hbase/data/default/email-objects-1/b85d893815b488f1282159d89e1cc86b/.tmp/i/ec9f55f487a443d99636f4aa6dd297d7
> (hmaster) 2025-05-06T17:37:36,174 [PEWorker-9] INFO
> org.apache.hadoop.hbase.master.assignment.RegionStateStore:
> pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b,
> regionState=CLOSED
> (hmaster) 2025-05-06T17:37:36,182 [PEWorker-8] INFO
> org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure:
> Starting pid=22035700, ppid=22034599,
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE,
> locked=true; TransitRegionStateProcedure table=email-objects-1,
> region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE; state=CLOSED,
> location=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104;
> forceNewPlan=false, retain=true
> (hmaster) 2025-05-06T17:37:36,182 [PEWorker-9] INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> pid=22035776, ppid=22035700, state=SUCCESS; CloseRegionProcedure
> b85d893815b488f1282159d89e1cc86b,
> server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
> in 357 msec
>
> // Open b85d893815b488f1282159d89e1cc86b with new table descriptor
> (hmaster) 2025-05-06T17:37:36,239 [PEWorker-6] INFO
> org.apache.hadoop.hbase.master.assignment.RegionStateStore:
> pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b,
> regionState=OPENING,
> regionLocation=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
> (hmaster) 2025-05-06T17:37:36,245 [PEWorker-6] INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
> subprocedures=[{pid=22035800, ppid=22035700, state=RUNNABLE;
> OpenRegionProcedure b85d893815b488f1282159d89e1cc86b,
> server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104}]
> (hmaster) 2025-05-06T17:37:36,529 [PEWorker-10] INFO
> org.apache.hadoop.hbase.master.assignment.RegionStateStore:
> pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b,
> regionState=OPEN, openSeqNum=26941277,
> regionLocation=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
> (hmaster) 2025-05-06T17:37:36,539 [PEWorker-10] INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> pid=22035800, ppid=22035700, state=SUCCESS; OpenRegionProcedure
> b85d893815b488f1282159d89e1cc86b,
> server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
> in 290 msec
> (hmaster) 2025-05-06T17:37:36,557 [PEWorker-7] INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> pid=22035700, ppid=22034599, state=SUCCESS;
> TransitRegionStateProcedure table=email-objects-1,
> region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE in 822 msec
>
> // The requested split of b85d893815b488f1282159d89e1cc86b begins on
> the HMaster, beginning with closing the region. It works fine because
> b85d893815b488f1282159d89e1cc86b is open.
> (hmaster) 2025-05-06T17:37:43,225 [PEWorker-1] INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
> subprocedures=[{pid=22036140, ppid=22036139,
> state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE;
> TransitRegionStateProcedure table=email-objects-1,
> region=b85d893815b488f1282159d89e1cc86b, UNASSIGN}, {pid=22036141,
> ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE;
> TransitRegionStateProcedure table=email-objects-1,
> region=b47e159aa157e066841e67ca1490a55d, UNASSIGN}]
> (hmaster) 2025-05-06T17:37:43,231 [PEWorker-6] INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
> subprocedures=[{pid=22036142, ppid=22036140, state=RUNNABLE;
> CloseRegionProcedure b85d893815b488f1282159d89e1cc86b,
> server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104}]
> (hmaster) 2025-05-06T17:37:45,352 [PEWorker-1] INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> pid=22036142, ppid=22036140, state=SUCCESS; CloseRegionProcedure
> b85d893815b488f1282159d89e1cc86b,
> server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104
> in 2.1170 sec
> (hmaster) 2025-05-06T17:37:45,355 [PEWorker-4] INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> pid=22036140, ppid=22036139, state=SUCCESS;
> TransitRegionStateProcedure table=email-objects-1,
> region=b85d893815b488f1282159d89e1cc86b, UNASSIGN in 2.1270 sec
> (hmaster) 2025-05-06T17:37:45,389 [PEWorker-3] INFO
> org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure:
> pid=22036139 splitting 7 storefiles,
> region=b85d893815b488f1282159d89e1cc86b, threads=7
>
> // The region is now split.
> // Open split daughters 942ae945978493ac3304a6eeb286a110 and
> 2e97f125b567c33626891076fdb4f500, which is okay
> // and also open daughter replicas f2edc050a8ea03ac9bef9554425b33e7
> and f48f8695abfe36eba7211a8080808c65, which is not okay
> (hmaster) 2025-05-06T17:37:46,515 [PEWorker-3] INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
> subprocedures=[{pid=22036145, ppid=22036139,
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
> TransitRegionStateProcedure table=email-objects-1,
> region=942ae945978493ac3304a6eeb286a110, ASSIGN}, {pid=22036146,
> ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
> TransitRegionStateProcedure table=email-objects-1,
> region=2e97f125b567c33626891076fdb4f500, ASSIGN}, {pid=22036147,
> ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
> TransitRegionStateProcedure table=email-objects-1,
> region=f2edc050a8ea03ac9bef9554425b33e7, ASSIGN}, {pid=22036148,
> ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
> TransitRegionStateProcedure table=email-objects-1,
> region=f48f8695abfe36eba7211a8080808c65, ASSIGN}]
> ...
> (hmaster) 2025-05-06T17:37:46,960 [PEWorker-5] INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> pid=22036145, ppid=22036139, state=SUCCESS;
> TransitRegionStateProcedure table=email-objects-1,
> region=942ae945978493ac3304a6eeb286a110, ASSIGN in 443 msec
> (hmaster) 2025-05-06T17:37:46,961 [PEWorker-8] INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> pid=22036146, ppid=22036139, state=SUCCESS;
> TransitRegionStateProcedure table=email-objects-1,
> region=2e97f125b567c33626891076fdb4f500, ASSIGN in 444 msec
> (hmaster) 2025-05-06T17:37:47,027 [PEWorker-12] INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> pid=22036147, ppid=22036139, state=SUCCESS;
> TransitRegionStateProcedure table=email-objects-1,
> region=f2edc050a8ea03ac9bef9554425b33e7, ASSIGN in 510 msec
> (hmaster) 2025-05-06T17:37:47,101 [PEWorker-10] INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> pid=22036148, ppid=22036139, state=SUCCESS;
> TransitRegionStateProcedure table=email-objects-1,
> region=f48f8695abfe36eba7211a8080808c65, ASSIGN in 582 msec
> (hmaster) 2025-05-06T17:37:47,109 [PEWorker-11] INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> pid=22036139, state=SUCCESS; SplitTableRegionProcedure
> table=email-objects-1, parent=b85d893815b488f1282159d89e1cc86b,
> daughterA=942ae945978493ac3304a6eeb286a110,
> daughterB=2e97f125b567c33626891076fdb4f500 in 3.8930 sec
>
> // Finish reopening all regions, moving on with ModifyTableProcedure
> (hmaster) 2025-05-06T17:46:22,158 [PEWorker-4] INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> pid=22034599, ppid=22034598, state=SUCCESS;
> ReopenTableRegionsProcedure table=email-objects-1 in 9 mins, 14.178
> sec
> (hmaster) 2025-05-06T17:46:22,242 [PEWorker-2] INFO
> org.apache.hadoop.hbase.util.ServerRegionReplicaUtil: Region replica
> replication peer id=region_replica_replication not exist. Creating...
> (hmaster) 2025-05-06T17:46:22,246 [PEWorker-2] INFO
> org.apache.hadoop.hbase.master.HMaster: Client=null/null creating
> replication peer, id=region_replica_replication,
> config=clusterKey=zk-hbase-pluto-hb2-a-prod-lb.zk.svc.iad02.k8s.run:2181:2181:/hbase,replicationEndpointImpl=org.apache.hadoop.hbase.replication.regionserver.RegionReplicaReplicationEndpoint,replicateAllUserTables=true,bandwidth=0,serial=false,
> state=ENABLED
> ...
> (hmaster) 2025-05-06T17:46:23,562 [PEWorker-4] INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> pid=22060604, ppid=22060356, state=SUCCESS;
> org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure in
> 1.2600 sec
> (hmaster) 2025-05-06T17:46:23,562 [PEWorker-3] INFO
> org.apache.hadoop.hbase.master.replication.AddPeerProcedure:
> Successfully added ENABLED peer region_replica_replication, config
> clusterKey=zk-hbase-pluto-hb2-a-prod-lb.zk.svc.iad02.k8s.run:2181:2181:/hbase,replicationEndpointImpl=org.apache.hadoop.hbase.replication.regionserver.RegionReplicaReplicationEndpoint,replicateAllUserTables=true,bandwidth=0,serial=false
> (hmaster) 2025-05-06T17:46:23,565 [PEWorker-3] INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished
> pid=22060356, state=SUCCESS;
> org.apache.hadoop.hbase.master.replication.AddPeerProcedure in 1.3140
> sec
>
> // Open all region replicas, including
> f2edc050a8ea03ac9bef9554425b33e7 and f48f8695abfe36eba7211a8080808c65
> (hmaster) 2025-05-06T17:46:23,571 [PEWorker-2] INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized
> subprocedures=[...{pid=22060998, ppid=22034598,
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
> TransitRegionStateProcedure table=email-objects-1,
> region=f2edc050a8ea03ac9bef9554425b33e7, ASSIGN}, {pid=22060999,
> ppid=22034598, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE;
> TransitRegionStateProcedure table=email-objects-1,
> region=f48f8695abfe36eba7211a8080808c65, ASSIGN}...]
>
> // Some opens fail because they are already open
> (hmaster) 2025-05-06T17:47:32,989 [ProcExecTimeout] WARN
> org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK
> Region-In-Transition state=OPENING,
> location=na1-rough-still-rose.iad02.hubinternal.net,60020,1745779142214,
> table=email-objects-1, region=f2edc050a8ea03ac9bef9554425b33e7
> (hmaster) 2025-05-06T17:48:32,993 [ProcExecTimeout] WARN
> org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK
> Region-In-Transition state=OPENING,
> location=na1-just-fast-magpie.iad02.hubinternal.net,60020,1745753381085,
> table=email-objects-1, region=f48f8695abfe36eba7211a8080808c65
>

Reply via email to