Haven't looked at the details yet, but we have table/region lock for the procedure system, which should prevent ModifyTableProcedure and SplitTableRegionProcedure executed at the same time.
Not sure what breaks the assumption. Charles Connell <cconn...@hubspot.com.invalid> 于2025年5月14日周三 06:39写道: > > Hi folks, > > I recently uncovered a problem in the ModifyTableProcedure that I want > to discuss with the community. I altered a table to increase the > region replica count from 1 to 2. This led to some of the secondary > replica regions being stuck opening, which was difficult to remediate, > although luckily did not impact database users. I believe that the > reason is because of an unfortunate interaction between the > ModifyTableProcedure and the SplitTableRegionProcedure. > > When adding new replicas, ModifyTableProcedure roughly does the following: > 1. Edits the table descriptor > 2. Closes and opens all existing regions > 3. Creates replication peer to support the new replica regions > 4. Opens new replica regions > > When closing a region in step 2, the act of closing it can trigger a > split. When the HMaster runs the SplitTableRegionProcedure, it roughly > does: > 1. Close the region to be split > 2. Split the store files > 3. Open the new daughter regions (including replicas) > > When a split is done as part of a ModifyTableProcedure that adds > replicas, we get the following combination of behavior: > 1. Edit the table descriptor > 2. Close and open all existing regions, adding a > SplitTableRegionProcedure to the HMaster's queue > 3. To begin the SplitTableRegionProcedure, close region to be split > 4. Split store files > 5. Open the new daughter regions, including replicas, completing the > SplitTableRegionProcedure > 6. Create replication peer to support the new replica regions > 7. Open new replica regions, even though some were already opened in > step 5. Double-opening doesn't work, and the open procedures are > stuck. > > For some proof, I've prepared selected log lines from the incident at > my company that prompted this email, which are at the end of this > email. > > In thinking about this problem, I looked at the code that triggers > splits. This is in the MemStoreFlusher class, which is invoked on > every region close. It sometimes does a split if HRegion#checkSplit() > returns non-empty. That method will return empty if the region's > "closing" flag is set, however the order of operations of a region > close means that the closing flag is not yet set when checkSplit() > checks it. This seems like an oversight to me, and perhaps the > solution is as simple as moving the `closing.set(true)` call to > somewhere earlier in HRegion#doClose(). However, I don't understand > what the full implications of this would be. > > I also thought about this from another angle: there is a span of time > when the ModifyTableProcedure has modified the table descriptor, but > has not yet opened the new replicas, and during that time, it assumes > that no other procedures will act upon the new information in the > table descriptor. In my case, the SplitTableRegionProcedure violated > this assumption. Perhaps this assumption should be made enforceable, > like somehow no region operations should be permitted by non-children > of the ModifyTableProcedure until it has completed. > > What do you all think? > > Thank you for your time, > Charles > > Logs from the incident: > // Start of ModifyTableProcedure > (hmaster) 2025-05-06T17:37:07,884 > [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=60000] INFO > org.apache.hadoop.hbase.master.HMaster: > Client=HBase2TasksJobs-setReadReplication//172.16.81.147 modify table > email-objects-1 from 'email-objects-1', {...} to 'email-objects-1', > {... REGION_REPLICATION => '2',...} > > // Re-open b85d893815b488f1282159d89e1cc86b to pick up new table descriptor > (hmaster) 2025-05-06T17:37:35,717 [PEWorker-2] INFO > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized > subprocedures=[...{pid=22035700, ppid=22034599, > state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; > TransitRegionStateProcedure table=email-objects-1, > region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE}...] > > // Close b85d893815b488f1282159d89e1cc86b > (hmaster) 2025-05-06T17:37:35,816 [PEWorker-5] INFO > org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler: > Took xlock for pid=22035700, ppid=22034599, > state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; > TransitRegionStateProcedure table=email-objects-1, > region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE > (hmaster) 2025-05-06T17:37:35,817 [PEWorker-5] INFO > org.apache.hadoop.hbase.master.assignment.RegionStateStore: > pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b, > regionState=CLOSING, > regionLocation=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104 > (hmaster) 2025-05-06T17:37:35,827 [PEWorker-5] INFO > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized > subprocedures=[{pid=22035776, ppid=22035700, state=RUNNABLE; > CloseRegionProcedure b85d893815b488f1282159d89e1cc86b, > server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104}] > (regionserver) 2025-05-06 17:37:35.986 UTC Close > b85d893815b488f1282159d89e1cc86b > (regionserver) 2025-05-06 17:37:35.986 UTC Running close preflush of > b85d893815b488f1282159d89e1cc86b > // At this point I suspect the RegionServer sent a split request to the > HMaster > (regionserver) 2025-05-06 17:37:35.987 UTC Flushing > b85d893815b488f1282159d89e1cc86b 2/2 column families, dataSize=8.25 MB > heapSize=10.12 MB > (regionserver) 2025-05-06 17:37:36.057 UTC Flushed memstore data > size=7.80 MB at sequenceid=26941273 (bloomFilter=true), > to=hdfs://pluto-hb2-a-prod:8020/hbase/data/default/email-objects-1/b85d893815b488f1282159d89e1cc86b/.tmp/e/7ffadb785b8e4ed88567437cc7633e09 > (regionserver) 2025-05-06 17:37:36.117 UTC Flushed memstore data > size=455.16 KB at sequenceid=26941273 (bloomFilter=true), > to=hdfs://pluto-hb2-a-prod:8020/hbase/data/default/email-objects-1/b85d893815b488f1282159d89e1cc86b/.tmp/i/ec9f55f487a443d99636f4aa6dd297d7 > (hmaster) 2025-05-06T17:37:36,174 [PEWorker-9] INFO > org.apache.hadoop.hbase.master.assignment.RegionStateStore: > pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b, > regionState=CLOSED > (hmaster) 2025-05-06T17:37:36,182 [PEWorker-8] INFO > org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure: > Starting pid=22035700, ppid=22034599, > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, > locked=true; TransitRegionStateProcedure table=email-objects-1, > region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE; state=CLOSED, > location=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104; > forceNewPlan=false, retain=true > (hmaster) 2025-05-06T17:37:36,182 [PEWorker-9] INFO > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished > pid=22035776, ppid=22035700, state=SUCCESS; CloseRegionProcedure > b85d893815b488f1282159d89e1cc86b, > server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104 > in 357 msec > > // Open b85d893815b488f1282159d89e1cc86b with new table descriptor > (hmaster) 2025-05-06T17:37:36,239 [PEWorker-6] INFO > org.apache.hadoop.hbase.master.assignment.RegionStateStore: > pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b, > regionState=OPENING, > regionLocation=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104 > (hmaster) 2025-05-06T17:37:36,245 [PEWorker-6] INFO > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized > subprocedures=[{pid=22035800, ppid=22035700, state=RUNNABLE; > OpenRegionProcedure b85d893815b488f1282159d89e1cc86b, > server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104}] > (hmaster) 2025-05-06T17:37:36,529 [PEWorker-10] INFO > org.apache.hadoop.hbase.master.assignment.RegionStateStore: > pid=22035700 updating hbase:meta row=b85d893815b488f1282159d89e1cc86b, > regionState=OPEN, openSeqNum=26941277, > regionLocation=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104 > (hmaster) 2025-05-06T17:37:36,539 [PEWorker-10] INFO > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished > pid=22035800, ppid=22035700, state=SUCCESS; OpenRegionProcedure > b85d893815b488f1282159d89e1cc86b, > server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104 > in 290 msec > (hmaster) 2025-05-06T17:37:36,557 [PEWorker-7] INFO > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished > pid=22035700, ppid=22034599, state=SUCCESS; > TransitRegionStateProcedure table=email-objects-1, > region=b85d893815b488f1282159d89e1cc86b, REOPEN/MOVE in 822 msec > > // The requested split of b85d893815b488f1282159d89e1cc86b begins on > the HMaster, beginning with closing the region. It works fine because > b85d893815b488f1282159d89e1cc86b is open. > (hmaster) 2025-05-06T17:37:43,225 [PEWorker-1] INFO > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized > subprocedures=[{pid=22036140, ppid=22036139, > state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; > TransitRegionStateProcedure table=email-objects-1, > region=b85d893815b488f1282159d89e1cc86b, UNASSIGN}, {pid=22036141, > ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_CLOSE; > TransitRegionStateProcedure table=email-objects-1, > region=b47e159aa157e066841e67ca1490a55d, UNASSIGN}] > (hmaster) 2025-05-06T17:37:43,231 [PEWorker-6] INFO > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized > subprocedures=[{pid=22036142, ppid=22036140, state=RUNNABLE; > CloseRegionProcedure b85d893815b488f1282159d89e1cc86b, > server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104}] > (hmaster) 2025-05-06T17:37:45,352 [PEWorker-1] INFO > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished > pid=22036142, ppid=22036140, state=SUCCESS; CloseRegionProcedure > b85d893815b488f1282159d89e1cc86b, > server=na1-fruity-sensible-bus.iad02.hubinternal.net,60020,1746489557104 > in 2.1170 sec > (hmaster) 2025-05-06T17:37:45,355 [PEWorker-4] INFO > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished > pid=22036140, ppid=22036139, state=SUCCESS; > TransitRegionStateProcedure table=email-objects-1, > region=b85d893815b488f1282159d89e1cc86b, UNASSIGN in 2.1270 sec > (hmaster) 2025-05-06T17:37:45,389 [PEWorker-3] INFO > org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure: > pid=22036139 splitting 7 storefiles, > region=b85d893815b488f1282159d89e1cc86b, threads=7 > > // The region is now split. > // Open split daughters 942ae945978493ac3304a6eeb286a110 and > 2e97f125b567c33626891076fdb4f500, which is okay > // and also open daughter replicas f2edc050a8ea03ac9bef9554425b33e7 > and f48f8695abfe36eba7211a8080808c65, which is not okay > (hmaster) 2025-05-06T17:37:46,515 [PEWorker-3] INFO > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized > subprocedures=[{pid=22036145, ppid=22036139, > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; > TransitRegionStateProcedure table=email-objects-1, > region=942ae945978493ac3304a6eeb286a110, ASSIGN}, {pid=22036146, > ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; > TransitRegionStateProcedure table=email-objects-1, > region=2e97f125b567c33626891076fdb4f500, ASSIGN}, {pid=22036147, > ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; > TransitRegionStateProcedure table=email-objects-1, > region=f2edc050a8ea03ac9bef9554425b33e7, ASSIGN}, {pid=22036148, > ppid=22036139, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; > TransitRegionStateProcedure table=email-objects-1, > region=f48f8695abfe36eba7211a8080808c65, ASSIGN}] > ... > (hmaster) 2025-05-06T17:37:46,960 [PEWorker-5] INFO > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished > pid=22036145, ppid=22036139, state=SUCCESS; > TransitRegionStateProcedure table=email-objects-1, > region=942ae945978493ac3304a6eeb286a110, ASSIGN in 443 msec > (hmaster) 2025-05-06T17:37:46,961 [PEWorker-8] INFO > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished > pid=22036146, ppid=22036139, state=SUCCESS; > TransitRegionStateProcedure table=email-objects-1, > region=2e97f125b567c33626891076fdb4f500, ASSIGN in 444 msec > (hmaster) 2025-05-06T17:37:47,027 [PEWorker-12] INFO > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished > pid=22036147, ppid=22036139, state=SUCCESS; > TransitRegionStateProcedure table=email-objects-1, > region=f2edc050a8ea03ac9bef9554425b33e7, ASSIGN in 510 msec > (hmaster) 2025-05-06T17:37:47,101 [PEWorker-10] INFO > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished > pid=22036148, ppid=22036139, state=SUCCESS; > TransitRegionStateProcedure table=email-objects-1, > region=f48f8695abfe36eba7211a8080808c65, ASSIGN in 582 msec > (hmaster) 2025-05-06T17:37:47,109 [PEWorker-11] INFO > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished > pid=22036139, state=SUCCESS; SplitTableRegionProcedure > table=email-objects-1, parent=b85d893815b488f1282159d89e1cc86b, > daughterA=942ae945978493ac3304a6eeb286a110, > daughterB=2e97f125b567c33626891076fdb4f500 in 3.8930 sec > > // Finish reopening all regions, moving on with ModifyTableProcedure > (hmaster) 2025-05-06T17:46:22,158 [PEWorker-4] INFO > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished > pid=22034599, ppid=22034598, state=SUCCESS; > ReopenTableRegionsProcedure table=email-objects-1 in 9 mins, 14.178 > sec > (hmaster) 2025-05-06T17:46:22,242 [PEWorker-2] INFO > org.apache.hadoop.hbase.util.ServerRegionReplicaUtil: Region replica > replication peer id=region_replica_replication not exist. Creating... > (hmaster) 2025-05-06T17:46:22,246 [PEWorker-2] INFO > org.apache.hadoop.hbase.master.HMaster: Client=null/null creating > replication peer, id=region_replica_replication, > config=clusterKey=zk-hbase-pluto-hb2-a-prod-lb.zk.svc.iad02.k8s.run:2181:2181:/hbase,replicationEndpointImpl=org.apache.hadoop.hbase.replication.regionserver.RegionReplicaReplicationEndpoint,replicateAllUserTables=true,bandwidth=0,serial=false, > state=ENABLED > ... > (hmaster) 2025-05-06T17:46:23,562 [PEWorker-4] INFO > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished > pid=22060604, ppid=22060356, state=SUCCESS; > org.apache.hadoop.hbase.master.replication.RefreshPeerProcedure in > 1.2600 sec > (hmaster) 2025-05-06T17:46:23,562 [PEWorker-3] INFO > org.apache.hadoop.hbase.master.replication.AddPeerProcedure: > Successfully added ENABLED peer region_replica_replication, config > clusterKey=zk-hbase-pluto-hb2-a-prod-lb.zk.svc.iad02.k8s.run:2181:2181:/hbase,replicationEndpointImpl=org.apache.hadoop.hbase.replication.regionserver.RegionReplicaReplicationEndpoint,replicateAllUserTables=true,bandwidth=0,serial=false > (hmaster) 2025-05-06T17:46:23,565 [PEWorker-3] INFO > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished > pid=22060356, state=SUCCESS; > org.apache.hadoop.hbase.master.replication.AddPeerProcedure in 1.3140 > sec > > // Open all region replicas, including > f2edc050a8ea03ac9bef9554425b33e7 and f48f8695abfe36eba7211a8080808c65 > (hmaster) 2025-05-06T17:46:23,571 [PEWorker-2] INFO > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized > subprocedures=[...{pid=22060998, ppid=22034598, > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; > TransitRegionStateProcedure table=email-objects-1, > region=f2edc050a8ea03ac9bef9554425b33e7, ASSIGN}, {pid=22060999, > ppid=22034598, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; > TransitRegionStateProcedure table=email-objects-1, > region=f48f8695abfe36eba7211a8080808c65, ASSIGN}...] > > // Some opens fail because they are already open > (hmaster) 2025-05-06T17:47:32,989 [ProcExecTimeout] WARN > org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK > Region-In-Transition state=OPENING, > location=na1-rough-still-rose.iad02.hubinternal.net,60020,1745779142214, > table=email-objects-1, region=f2edc050a8ea03ac9bef9554425b33e7 > (hmaster) 2025-05-06T17:48:32,993 [ProcExecTimeout] WARN > org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK > Region-In-Transition state=OPENING, > location=na1-just-fast-magpie.iad02.hubinternal.net,60020,1745753381085, > table=email-objects-1, region=f48f8695abfe36eba7211a8080808c65 >