[ 
https://issues.apache.org/jira/browse/HBASE-23594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000888#comment-17000888
 ] 

Hudson commented on HBASE-23594:
--------------------------------

Results for branch master
        [build #1573 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/1573/]: (x) 
*{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/1573//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/1573//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/1573//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Procedure stuck due to region happen to recorded on two servers.
> ----------------------------------------------------------------
>
>                 Key: HBASE-23594
>                 URL: https://issues.apache.org/jira/browse/HBASE-23594
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2, Region Assignment
>    Affects Versions: 2.2.2
>            Reporter: Lijin Bin
>            Assignee: Duo Zhang
>            Priority: Critical
>             Fix For: 3.0.0, 2.3.0, 2.2.3
>
>         Attachments: HBASE-23594-UT.patch
>
>
> Master log:
> {code}
> $ grep "cf9a4ec6cd890aa6806fb313d71e2ebd" 
> hbase-hbaseadmin-master-100.107.176.225.log.1
> 2019-12-17 11:24:03,534 DEBUG [KeepAlivePEWorker-20] 
> procedure.MasterProcedureScheduler: Add TableQueue(table1w_7, xlock=false 
> sharedLock=34 size=1662) to run queue because: the exclusive lock is not held 
> by anyone when adding pid=193706, ppid=187614, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN
> 2019-12-17 11:24:22,851 INFO  [KeepAlivePEWorker-17] 
> procedure.MasterProcedureScheduler: Took xlock for pid=193706, ppid=187614, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN
> 2019-12-17 11:24:22,852 INFO  [KeepAlivePEWorker-17] 
> assignment.TransitRegionStateProcedure: Starting pid=193706, ppid=187614, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN; rit=OPEN, location=null; 
> forceNewPlan=true, retain=false
> 2019-12-17 11:24:22,852 DEBUG [KeepAlivePEWorker-17] 
> procedure2.RootProcedureState: Add procedure pid=193706, ppid=187614, 
> state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN as the 51669th rollback step
> 2019-12-17 11:24:22,858 DEBUG [master/100.107.176.225:60000] 
> procedure.MasterProcedureScheduler: Add TableQueue(table1w_7, xlock=false 
> sharedLock=349 size=1666) to run queue because: pid=193706, ppid=187614, 
> state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN has lock
> 2019-12-17 11:24:22,912 INFO  [PEWorker-9] 
> assignment.TransitRegionStateProcedure: Starting pid=193706, ppid=187614, 
> state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN; openRegion rit=OPEN, 
> location=100.107.176.215,60020,1576552834619; 
> loc=100.107.176.215,60020,1576552834619
> 2019-12-17 11:24:22,912 INFO  [PEWorker-9] assignment.RegionStateStore: 
> pid=193706 updating hbase:meta row=cf9a4ec6cd890aa6806fb313d71e2ebd, 
> regionState=OPENING, regionLocation=100.107.176.215,60020,1576552834619
> 2019-12-17 11:24:22,912 DEBUG [PEWorker-9] procedure2.RootProcedureState: Add 
> procedure pid=193706, ppid=187614, 
> state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, locked=true; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN as the 52115th rollback step
> 2019-12-17 11:24:22,918 WARN  [PEWorker-8] 
> assignment.RegionRemoteProcedureBase: Can not add remote operation 
> pid=243482, ppid=193706, state=RUNNABLE, locked=true; 
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure for region 
> {ENCODED => cf9a4ec6cd890aa6806fb313d71e2ebd, NAME => 
> 'table1w_7,user68694,1576484498244.cf9a4ec6cd890aa6806fb313d71e2ebd.', 
> STARTKEY => 'user68694', ENDKEY => 'user68703'} to server 
> 100.107.176.215,60020,1576552834619, this usually because the server is 
> alread dead, give up and mark the procedure as complete, the parent procedure 
> will take care of this.
> 2019-12-17 11:24:22,921 DEBUG [PEWorker-8] 
> procedure.MasterProcedureScheduler: Add TableQueue(table1w_7, xlock=false 
> sharedLock=331 size=1664) to run queue because: pid=193706, ppid=187614, 
> state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, locked=true; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN has lock
> 2019-12-17 11:24:22,921 INFO  [PEWorker-8] procedure2.ProcedureExecutor: 
> Finished subprocedure pid=243482, resume processing parent pid=193706, 
> ppid=187614, state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, 
> locked=true; TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN
> 2019-12-17 11:24:22,921 INFO  [PEWorker-9] 
> assignment.TransitRegionStateProcedure: Retry=1 of max=2147483647; 
> pid=193706, ppid=187614, 
> state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, locked=true; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN; rit=OPENING, 
> location=100.107.176.215,60020,1576552834619
> 2019-12-17 11:24:22,921 DEBUG [PEWorker-9] procedure2.RootProcedureState: Add 
> procedure pid=193706, ppid=187614, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN as the 52157th rollback step
> 2019-12-17 11:24:22,922 INFO  [PEWorker-9] 
> assignment.TransitRegionStateProcedure: Starting pid=193706, ppid=187614, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN; rit=OPENING, location=null; 
> forceNewPlan=true, retain=false
> 2019-12-17 11:24:22,922 DEBUG [PEWorker-9] procedure2.RootProcedureState: Add 
> procedure pid=193706, ppid=187614, 
> state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN as the 52164th rollback step
> 2019-12-17 11:24:22,930 DEBUG [master/100.107.176.225:60000] 
> procedure.MasterProcedureScheduler: Add TableQueue(table1w_7, xlock=false 
> sharedLock=331 size=1661) to run queue because: pid=193706, ppid=187614, 
> state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN has lock
> 2019-12-17 11:24:22,943 INFO  [PEWorker-14] 
> assignment.TransitRegionStateProcedure: Starting pid=193706, ppid=187614, 
> state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN; openRegion rit=OPENING, 
> location=100.107.164.90,60020,1576553001648; 
> loc=100.107.164.90,60020,1576553001648
> 2019-12-17 11:24:22,943 INFO  [PEWorker-14] assignment.RegionStateStore: 
> pid=193706 updating hbase:meta row=cf9a4ec6cd890aa6806fb313d71e2ebd, 
> regionState=OPENING, regionLocation=100.107.164.90,60020,1576553001648
> 2019-12-17 11:24:22,944 DEBUG [PEWorker-14] procedure2.RootProcedureState: 
> Add procedure pid=193706, ppid=187614, 
> state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, locked=true; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN as the 52284th rollback step
> 2019-12-17 11:24:23,000 INFO  [PEWorker-11] assignment.RegionStateStore: 
> pid=193706 updating hbase:meta row=cf9a4ec6cd890aa6806fb313d71e2ebd, 
> regionState=OPEN, openSeqNum=17, 
> regionLocation=100.107.164.90,60020,1576553001648
> 2019-12-17 11:24:23,003 DEBUG [PEWorker-11] 
> procedure.MasterProcedureScheduler: Add TableQueue(table1w_7, xlock=false 
> sharedLock=336 size=1635) to run queue because: pid=193706, ppid=187614, 
> state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, locked=true; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN has lock
> 2019-12-17 11:24:23,003 INFO  [PEWorker-11] procedure2.ProcedureExecutor: 
> Finished subprocedure pid=243523, resume processing parent pid=193706, 
> ppid=187614, state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, 
> locked=true; TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN
> 2019-12-17 11:24:23,003 DEBUG [KeepAlivePEWorker-17] 
> procedure2.RootProcedureState: Add procedure pid=193706, ppid=187614, 
> state=SUCCESS, locked=true; TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN as the 52624th rollback step
> 2019-12-17 11:24:23,005 INFO  [KeepAlivePEWorker-17] 
> procedure2.ProcedureExecutor: Finished pid=193706, ppid=187614, 
> state=SUCCESS; TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN in 19.7230sec
> 2019-12-17 11:24:28,902 DEBUG [PEWorker-7] 
> procedure.MasterProcedureScheduler: Add TableQueue(table1w_7, xlock=false 
> sharedLock=27 size=1856) to run queue because: the exclusive lock is not held 
> by anyone when adding pid=258071, ppid=243483, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GE
> 2019-12-17 11:24:22,914 DEBUG [RegionServerTracker-0] 
> procedure2.ProcedureExecutor: Stored pid=243483, 
> state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure 
> server=100.107.176.215,60020,1576552834619, splitWal=true, meta=false
> 2019-12-17 11:24:39,485 INFO  [PEWorker-10] 
> procedure.MasterProcedureScheduler: Took xlock for pid=258071, ppid=243483, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN
> 2019-12-17 11:24:39,486 INFO  [PEWorker-10] 
> assignment.TransitRegionStateProcedure: Starting pid=258071, ppid=243483, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN; rit=OPEN, location=null; 
> forceNewPlan=true, retain=false
> 2019-12-17 11:24:39,486 DEBUG [PEWorker-10] procedure2.RootProcedureState: 
> Add procedure pid=258071, ppid=243483, 
> state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN as the 68029th rollback step
> 2019-12-17 11:24:39,571 DEBUG [master/100.107.176.225:60000] 
> procedure.MasterProcedureScheduler: Add TableQueue(table1w_7, xlock=false 
> sharedLock=47 size=1847) to run queue because: pid=258071, ppid=243483, 
> state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN has lock
> 2019-12-17 11:24:39,625 INFO  [PEWorker-16] 
> assignment.TransitRegionStateProcedure: Starting pid=258071, ppid=243483, 
> state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, locked=true; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN; openRegion rit=OPEN, 
> location=100.107.164.90,60020,1576553001648; 
> loc=100.107.164.90,60020,1576553001648
> 2019-12-17 11:24:39,625 INFO  [PEWorker-16] assignment.RegionStateStore: 
> pid=258071 updating hbase:meta row=cf9a4ec6cd890aa6806fb313d71e2ebd, 
> regionState=OPENING, regionLocation=100.107.164.90,60020,1576553001648
> 2019-12-17 11:24:39,626 DEBUG [PEWorker-16] procedure2.RootProcedureState: 
> Add procedure pid=258071, ppid=243483, 
> state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, locked=true; 
> TransitRegionStateProcedure table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd, ASSIGN as the 68773th rollback step
> 2019-12-17 11:26:23,217 WARN  [ProcExecTimeout] assignment.AssignmentManager: 
> STUCK Region-In-Transition rit=OPENING, 
> location=100.107.164.90,60020,1576553001648, table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd
> 2019-12-17 11:27:23,217 WARN  [ProcExecTimeout] assignment.AssignmentManager: 
> STUCK Region-In-Transition rit=OPENING, 
> location=100.107.164.90,60020,1576553001648, table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd
> 2019-12-17 11:28:06,532 INFO  [master/100.107.176.225:60000.Chore.1] 
> master.HMaster: Not running balancer because 1 region(s) in transition: 
> [rit=OPENING, location=100.107.164.90,60020,1576553001648, table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd]
> 2019-12-17 11:28:23,218 WARN  [ProcExecTimeout] assignment.AssignmentManager: 
> STUCK Region-In-Transition rit=OPENING, 
> location=100.107.164.90,60020,1576553001648, table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd
> 2019-12-17 11:29:23,218 WARN  [ProcExecTimeout] assignment.AssignmentManager: 
> STUCK Region-In-Transition rit=OPENING, 
> location=100.107.164.90,60020,1576553001648, table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd
> 2019-12-17 11:30:23,219 WARN  [ProcExecTimeout] assignment.AssignmentManager: 
> STUCK Region-In-Transition rit=OPENING, 
> location=100.107.164.90,60020,1576553001648, table=table1w_7, 
> region=cf9a4ec6cd890aa6806fb313d71e2ebd
> {code}
> RegionServer log:
> {code}
> $ grep "cf9a4ec6cd890aa6806fb313d71e2ebd" 
> hbase-hbaseadmin-regionserver-100.107.164.90.log
> 2019-12-17 11:24:22,972 INFO  
> [RS_OPEN_REGION-regionserver/100.107.164.90:60020-1] 
> handler.AssignRegionHandler: Open 
> table1w_7,user68694,1576484498244.cf9a4ec6cd890aa6806fb313d71e2ebd.
> 2019-12-17 11:24:22,972 INFO  
> [RS_OPEN_REGION-regionserver/100.107.164.90:60020-1] regionserver.HRegion: 
> filter rows disabled with empty filtering pattern for 
> region=table1w_7,user68694,1576484498244.cf9a4ec6cd890aa6806fb313d71e2ebd..
> 2019-12-17 11:24:22,983 INFO  
> [StoreOpener-cf9a4ec6cd890aa6806fb313d71e2ebd-1] hfile.CacheConfig: Created 
> cacheConfig: cacheDataOnRead=true, cacheDataOnWrite=false, 
> cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=true, 
> cacheDataCompressed=false, prefetchOnOpen=false for family {NAME => 'cf', 
> VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 
> 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', 
> DATA_BLOCK_ENCODING => 'DIFF', TTL => 'FOREVER', MIN_VERSIONS => '0', 
> REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 
> 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', 
> PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'SNAPPY', BLOCKCACHE => 
> 'true', BLOCKSIZE => '65536'} with 
> blockCache=org.apache.hadoop.hbase.io.hfile.CombinedBlockCache@1376a714
> 2019-12-17 11:24:22,983 INFO  
> [StoreOpener-cf9a4ec6cd890aa6806fb313d71e2ebd-1] 
> compactions.CompactionConfiguration: size [128 MB, 2 GB, 2 GB); files [7, 
> 25); ratio 1.200000; off-peak ratio 5.000000; throttle point 4294967296; 
> major period 0, major jitter 0.500000, min locality to compact 0.000000; 
> tiered compaction: max_age 9223372036854775807, incoming window min 6, 
> compaction policy for tiered window 
> org.apache.hadoop.hbase.regionserver.compactions.ExploringCompactionPolicy, 
> single output for minor true, compaction window factory 
> org.apache.hadoop.hbase.regionserver.compactions.ExponentialCompactionWindowFactory
> 2019-12-17 11:24:22,984 INFO  
> [StoreOpener-cf9a4ec6cd890aa6806fb313d71e2ebd-1] regionserver.HStore: 
> Store=cf,  memstore type=DefaultMemStore, storagePolicy=HOT, 
> verifyBulkLoads=false, parallelPutCountPrintThreshold=50, encoding=DIFF, 
> compression=SNAPPY
> 2019-12-17 11:24:22,987 INFO  
> [RS_OPEN_REGION-regionserver/100.107.164.90:60020-1] regionserver.HRegion: 
> Opened cf9a4ec6cd890aa6806fb313d71e2ebd; next sequenceid=17
> 2019-12-17 11:24:22,987 INFO  
> [RS_OPEN_REGION-regionserver/100.107.164.90:60020-1] 
> regionserver.HRegionServer: Post open deploy tasks for 
> table1w_7,user68694,1576484498244.cf9a4ec6cd890aa6806fb313d71e2ebd., 
> openProcId=243523, masterSystemTime=1576553062965
> 2019-12-17 11:24:22,990 INFO  
> [RS_OPEN_REGION-regionserver/100.107.164.90:60020-1] 
> handler.AssignRegionHandler: Opened 
> table1w_7,user68694,1576484498244.cf9a4ec6cd890aa6806fb313d71e2ebd.
> 2019-12-17 11:24:39,657 WARN  
> [RS_OPEN_REGION-regionserver/100.107.164.90:60020-1] 
> handler.AssignRegionHandler: Received OPEN for the 
> region:table1w_7,user68694,1576484498244.cf9a4ec6cd890aa6806fb313d71e2ebd., 
> which is already online
> {code}
> 2019-12-17 11:23:49,598 DEBUG [RegionServerTracker-0] 
> procedure2.ProcedureExecutor: Stored pid=187614, 
> state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure 
> server=100.107.165.41,60020,1576552792328, splitWal=true, meta=true
> Process ServerCrashProcedure(100.107.165.41,60020,1576552792328) assign 
> region cf9a4ec6cd890aa6806fb313d71e2ebd,
> region assign to 100.107.176.215,60020,1576552834619, but failed, so retry 
> and assign to 100.107.164.90,60020,1576553001648 and open on 
> 100.107.164.90,60020,1576553001648 success.
> Later process ServerCrashProcedure(100.107.176.215,60020,1576552834619) and 
> find region cf9a4ec6cd890aa6806fb313d71e2ebd on this server also, than try 
> assign this region, and assign the region to 
> 100.107.164.90,60020,1576553001648 also, but 
> 100.107.164.90,60020,1576553001648 find it has open this region already, than 
> the assign stuck.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to