[ https://issues.apache.org/jira/browse/HBASE-23594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000888#comment-17000888 ]
Hudson commented on HBASE-23594: -------------------------------- Results for branch master [build #1573 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/1573/]: (x) *{color:red}-1 overall{color}* ---- details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/1573//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/1573//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/1573//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Procedure stuck due to region happen to recorded on two servers. > ---------------------------------------------------------------- > > Key: HBASE-23594 > URL: https://issues.apache.org/jira/browse/HBASE-23594 > Project: HBase > Issue Type: Bug > Components: amv2, Region Assignment > Affects Versions: 2.2.2 > Reporter: Lijin Bin > Assignee: Duo Zhang > Priority: Critical > Fix For: 3.0.0, 2.3.0, 2.2.3 > > Attachments: HBASE-23594-UT.patch > > > Master log: > {code} > $ grep "cf9a4ec6cd890aa6806fb313d71e2ebd" > hbase-hbaseadmin-master-100.107.176.225.log.1 > 2019-12-17 11:24:03,534 DEBUG [KeepAlivePEWorker-20] > procedure.MasterProcedureScheduler: Add TableQueue(table1w_7, xlock=false > sharedLock=34 size=1662) to run queue because: the exclusive lock is not held > by anyone when adding pid=193706, ppid=187614, > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN > 2019-12-17 11:24:22,851 INFO [KeepAlivePEWorker-17] > procedure.MasterProcedureScheduler: Took xlock for pid=193706, ppid=187614, > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN > 2019-12-17 11:24:22,852 INFO [KeepAlivePEWorker-17] > assignment.TransitRegionStateProcedure: Starting pid=193706, ppid=187614, > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN; rit=OPEN, location=null; > forceNewPlan=true, retain=false > 2019-12-17 11:24:22,852 DEBUG [KeepAlivePEWorker-17] > procedure2.RootProcedureState: Add procedure pid=193706, ppid=187614, > state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN as the 51669th rollback step > 2019-12-17 11:24:22,858 DEBUG [master/100.107.176.225:60000] > procedure.MasterProcedureScheduler: Add TableQueue(table1w_7, xlock=false > sharedLock=349 size=1666) to run queue because: pid=193706, ppid=187614, > state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN has lock > 2019-12-17 11:24:22,912 INFO [PEWorker-9] > assignment.TransitRegionStateProcedure: Starting pid=193706, ppid=187614, > state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN; openRegion rit=OPEN, > location=100.107.176.215,60020,1576552834619; > loc=100.107.176.215,60020,1576552834619 > 2019-12-17 11:24:22,912 INFO [PEWorker-9] assignment.RegionStateStore: > pid=193706 updating hbase:meta row=cf9a4ec6cd890aa6806fb313d71e2ebd, > regionState=OPENING, regionLocation=100.107.176.215,60020,1576552834619 > 2019-12-17 11:24:22,912 DEBUG [PEWorker-9] procedure2.RootProcedureState: Add > procedure pid=193706, ppid=187614, > state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, locked=true; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN as the 52115th rollback step > 2019-12-17 11:24:22,918 WARN [PEWorker-8] > assignment.RegionRemoteProcedureBase: Can not add remote operation > pid=243482, ppid=193706, state=RUNNABLE, locked=true; > org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure for region > {ENCODED => cf9a4ec6cd890aa6806fb313d71e2ebd, NAME => > 'table1w_7,user68694,1576484498244.cf9a4ec6cd890aa6806fb313d71e2ebd.', > STARTKEY => 'user68694', ENDKEY => 'user68703'} to server > 100.107.176.215,60020,1576552834619, this usually because the server is > alread dead, give up and mark the procedure as complete, the parent procedure > will take care of this. > 2019-12-17 11:24:22,921 DEBUG [PEWorker-8] > procedure.MasterProcedureScheduler: Add TableQueue(table1w_7, xlock=false > sharedLock=331 size=1664) to run queue because: pid=193706, ppid=187614, > state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, locked=true; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN has lock > 2019-12-17 11:24:22,921 INFO [PEWorker-8] procedure2.ProcedureExecutor: > Finished subprocedure pid=243482, resume processing parent pid=193706, > ppid=187614, state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, > locked=true; TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN > 2019-12-17 11:24:22,921 INFO [PEWorker-9] > assignment.TransitRegionStateProcedure: Retry=1 of max=2147483647; > pid=193706, ppid=187614, > state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, locked=true; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN; rit=OPENING, > location=100.107.176.215,60020,1576552834619 > 2019-12-17 11:24:22,921 DEBUG [PEWorker-9] procedure2.RootProcedureState: Add > procedure pid=193706, ppid=187614, > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN as the 52157th rollback step > 2019-12-17 11:24:22,922 INFO [PEWorker-9] > assignment.TransitRegionStateProcedure: Starting pid=193706, ppid=187614, > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN; rit=OPENING, location=null; > forceNewPlan=true, retain=false > 2019-12-17 11:24:22,922 DEBUG [PEWorker-9] procedure2.RootProcedureState: Add > procedure pid=193706, ppid=187614, > state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN as the 52164th rollback step > 2019-12-17 11:24:22,930 DEBUG [master/100.107.176.225:60000] > procedure.MasterProcedureScheduler: Add TableQueue(table1w_7, xlock=false > sharedLock=331 size=1661) to run queue because: pid=193706, ppid=187614, > state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN has lock > 2019-12-17 11:24:22,943 INFO [PEWorker-14] > assignment.TransitRegionStateProcedure: Starting pid=193706, ppid=187614, > state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN; openRegion rit=OPENING, > location=100.107.164.90,60020,1576553001648; > loc=100.107.164.90,60020,1576553001648 > 2019-12-17 11:24:22,943 INFO [PEWorker-14] assignment.RegionStateStore: > pid=193706 updating hbase:meta row=cf9a4ec6cd890aa6806fb313d71e2ebd, > regionState=OPENING, regionLocation=100.107.164.90,60020,1576553001648 > 2019-12-17 11:24:22,944 DEBUG [PEWorker-14] procedure2.RootProcedureState: > Add procedure pid=193706, ppid=187614, > state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, locked=true; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN as the 52284th rollback step > 2019-12-17 11:24:23,000 INFO [PEWorker-11] assignment.RegionStateStore: > pid=193706 updating hbase:meta row=cf9a4ec6cd890aa6806fb313d71e2ebd, > regionState=OPEN, openSeqNum=17, > regionLocation=100.107.164.90,60020,1576553001648 > 2019-12-17 11:24:23,003 DEBUG [PEWorker-11] > procedure.MasterProcedureScheduler: Add TableQueue(table1w_7, xlock=false > sharedLock=336 size=1635) to run queue because: pid=193706, ppid=187614, > state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, locked=true; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN has lock > 2019-12-17 11:24:23,003 INFO [PEWorker-11] procedure2.ProcedureExecutor: > Finished subprocedure pid=243523, resume processing parent pid=193706, > ppid=187614, state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, > locked=true; TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN > 2019-12-17 11:24:23,003 DEBUG [KeepAlivePEWorker-17] > procedure2.RootProcedureState: Add procedure pid=193706, ppid=187614, > state=SUCCESS, locked=true; TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN as the 52624th rollback step > 2019-12-17 11:24:23,005 INFO [KeepAlivePEWorker-17] > procedure2.ProcedureExecutor: Finished pid=193706, ppid=187614, > state=SUCCESS; TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN in 19.7230sec > 2019-12-17 11:24:28,902 DEBUG [PEWorker-7] > procedure.MasterProcedureScheduler: Add TableQueue(table1w_7, xlock=false > sharedLock=27 size=1856) to run queue because: the exclusive lock is not held > by anyone when adding pid=258071, ppid=243483, > state=RUNNABLE:REGION_STATE_TRANSITION_GE > 2019-12-17 11:24:22,914 DEBUG [RegionServerTracker-0] > procedure2.ProcedureExecutor: Stored pid=243483, > state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure > server=100.107.176.215,60020,1576552834619, splitWal=true, meta=false > 2019-12-17 11:24:39,485 INFO [PEWorker-10] > procedure.MasterProcedureScheduler: Took xlock for pid=258071, ppid=243483, > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN > 2019-12-17 11:24:39,486 INFO [PEWorker-10] > assignment.TransitRegionStateProcedure: Starting pid=258071, ppid=243483, > state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN; rit=OPEN, location=null; > forceNewPlan=true, retain=false > 2019-12-17 11:24:39,486 DEBUG [PEWorker-10] procedure2.RootProcedureState: > Add procedure pid=258071, ppid=243483, > state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN as the 68029th rollback step > 2019-12-17 11:24:39,571 DEBUG [master/100.107.176.225:60000] > procedure.MasterProcedureScheduler: Add TableQueue(table1w_7, xlock=false > sharedLock=47 size=1847) to run queue because: pid=258071, ppid=243483, > state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN has lock > 2019-12-17 11:24:39,625 INFO [PEWorker-16] > assignment.TransitRegionStateProcedure: Starting pid=258071, ppid=243483, > state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN; openRegion rit=OPEN, > location=100.107.164.90,60020,1576553001648; > loc=100.107.164.90,60020,1576553001648 > 2019-12-17 11:24:39,625 INFO [PEWorker-16] assignment.RegionStateStore: > pid=258071 updating hbase:meta row=cf9a4ec6cd890aa6806fb313d71e2ebd, > regionState=OPENING, regionLocation=100.107.164.90,60020,1576553001648 > 2019-12-17 11:24:39,626 DEBUG [PEWorker-16] procedure2.RootProcedureState: > Add procedure pid=258071, ppid=243483, > state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, locked=true; > TransitRegionStateProcedure table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN as the 68773th rollback step > 2019-12-17 11:26:23,217 WARN [ProcExecTimeout] assignment.AssignmentManager: > STUCK Region-In-Transition rit=OPENING, > location=100.107.164.90,60020,1576553001648, table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd > 2019-12-17 11:27:23,217 WARN [ProcExecTimeout] assignment.AssignmentManager: > STUCK Region-In-Transition rit=OPENING, > location=100.107.164.90,60020,1576553001648, table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd > 2019-12-17 11:28:06,532 INFO [master/100.107.176.225:60000.Chore.1] > master.HMaster: Not running balancer because 1 region(s) in transition: > [rit=OPENING, location=100.107.164.90,60020,1576553001648, table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd] > 2019-12-17 11:28:23,218 WARN [ProcExecTimeout] assignment.AssignmentManager: > STUCK Region-In-Transition rit=OPENING, > location=100.107.164.90,60020,1576553001648, table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd > 2019-12-17 11:29:23,218 WARN [ProcExecTimeout] assignment.AssignmentManager: > STUCK Region-In-Transition rit=OPENING, > location=100.107.164.90,60020,1576553001648, table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd > 2019-12-17 11:30:23,219 WARN [ProcExecTimeout] assignment.AssignmentManager: > STUCK Region-In-Transition rit=OPENING, > location=100.107.164.90,60020,1576553001648, table=table1w_7, > region=cf9a4ec6cd890aa6806fb313d71e2ebd > {code} > RegionServer log: > {code} > $ grep "cf9a4ec6cd890aa6806fb313d71e2ebd" > hbase-hbaseadmin-regionserver-100.107.164.90.log > 2019-12-17 11:24:22,972 INFO > [RS_OPEN_REGION-regionserver/100.107.164.90:60020-1] > handler.AssignRegionHandler: Open > table1w_7,user68694,1576484498244.cf9a4ec6cd890aa6806fb313d71e2ebd. > 2019-12-17 11:24:22,972 INFO > [RS_OPEN_REGION-regionserver/100.107.164.90:60020-1] regionserver.HRegion: > filter rows disabled with empty filtering pattern for > region=table1w_7,user68694,1576484498244.cf9a4ec6cd890aa6806fb313d71e2ebd.. > 2019-12-17 11:24:22,983 INFO > [StoreOpener-cf9a4ec6cd890aa6806fb313d71e2ebd-1] hfile.CacheConfig: Created > cacheConfig: cacheDataOnRead=true, cacheDataOnWrite=false, > cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=true, > cacheDataCompressed=false, prefetchOnOpen=false for family {NAME => 'cf', > VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => > 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', > DATA_BLOCK_ENCODING => 'DIFF', TTL => 'FOREVER', MIN_VERSIONS => '0', > REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => > 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', > PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'SNAPPY', BLOCKCACHE => > 'true', BLOCKSIZE => '65536'} with > blockCache=org.apache.hadoop.hbase.io.hfile.CombinedBlockCache@1376a714 > 2019-12-17 11:24:22,983 INFO > [StoreOpener-cf9a4ec6cd890aa6806fb313d71e2ebd-1] > compactions.CompactionConfiguration: size [128 MB, 2 GB, 2 GB); files [7, > 25); ratio 1.200000; off-peak ratio 5.000000; throttle point 4294967296; > major period 0, major jitter 0.500000, min locality to compact 0.000000; > tiered compaction: max_age 9223372036854775807, incoming window min 6, > compaction policy for tiered window > org.apache.hadoop.hbase.regionserver.compactions.ExploringCompactionPolicy, > single output for minor true, compaction window factory > org.apache.hadoop.hbase.regionserver.compactions.ExponentialCompactionWindowFactory > 2019-12-17 11:24:22,984 INFO > [StoreOpener-cf9a4ec6cd890aa6806fb313d71e2ebd-1] regionserver.HStore: > Store=cf, memstore type=DefaultMemStore, storagePolicy=HOT, > verifyBulkLoads=false, parallelPutCountPrintThreshold=50, encoding=DIFF, > compression=SNAPPY > 2019-12-17 11:24:22,987 INFO > [RS_OPEN_REGION-regionserver/100.107.164.90:60020-1] regionserver.HRegion: > Opened cf9a4ec6cd890aa6806fb313d71e2ebd; next sequenceid=17 > 2019-12-17 11:24:22,987 INFO > [RS_OPEN_REGION-regionserver/100.107.164.90:60020-1] > regionserver.HRegionServer: Post open deploy tasks for > table1w_7,user68694,1576484498244.cf9a4ec6cd890aa6806fb313d71e2ebd., > openProcId=243523, masterSystemTime=1576553062965 > 2019-12-17 11:24:22,990 INFO > [RS_OPEN_REGION-regionserver/100.107.164.90:60020-1] > handler.AssignRegionHandler: Opened > table1w_7,user68694,1576484498244.cf9a4ec6cd890aa6806fb313d71e2ebd. > 2019-12-17 11:24:39,657 WARN > [RS_OPEN_REGION-regionserver/100.107.164.90:60020-1] > handler.AssignRegionHandler: Received OPEN for the > region:table1w_7,user68694,1576484498244.cf9a4ec6cd890aa6806fb313d71e2ebd., > which is already online > {code} > 2019-12-17 11:23:49,598 DEBUG [RegionServerTracker-0] > procedure2.ProcedureExecutor: Stored pid=187614, > state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure > server=100.107.165.41,60020,1576552792328, splitWal=true, meta=true > Process ServerCrashProcedure(100.107.165.41,60020,1576552792328) assign > region cf9a4ec6cd890aa6806fb313d71e2ebd, > region assign to 100.107.176.215,60020,1576552834619, but failed, so retry > and assign to 100.107.164.90,60020,1576553001648 and open on > 100.107.164.90,60020,1576553001648 success. > Later process ServerCrashProcedure(100.107.176.215,60020,1576552834619) and > find region cf9a4ec6cd890aa6806fb313d71e2ebd on this server also, than try > assign this region, and assign the region to > 100.107.164.90,60020,1576553001648 also, but > 100.107.164.90,60020,1576553001648 find it has open this region already, than > the assign stuck. -- This message was sent by Atlassian Jira (v8.3.4#803005)