[ https://issues.apache.org/jira/browse/HBASE-21625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin resolved HBASE-21625.
--------------------------------------
    Resolution: Cannot Reproduce

Probably a duplicate of the OpenRegionProcedure issues.

> a runnable procedure v2 does not run
> ------------------------------------
>
>                 Key: HBASE-21625
>                 URL: https://issues.apache.org/jira/browse/HBASE-21625
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2, proc-v2
>    Affects Versions: 3.0.0
>            Reporter: Sergey Shelukhin
>            Priority: Critical
>
> This is on a master-branch snapshot from a few weeks ago.
> I haven't looked at the code much yet, but it seems rather fundamental. The
> procedure comes from meta replica assignment (HBASE-21624), in case that
> matters with respect to engine initialization; however, the master is
> functional and other procedures run fine. I can also see many other open
> region procedures with similar patterns that were initialized before this
> one and have run fine.
> Currently, there are no other runnable procedures on the master. There are
> many procedures that have succeeded since then, the parent that is blocked
> on this procedure, and one unrelated RIT procedure that is waiting with a
> timeout and being updated periodically.
> The procedure itself is:
> {noformat}
> 157156 157155 RUNNABLE hadoop org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure Wed Dec 19 17:20:27 PST 2018 Wed Dec 19 17:20:28 PST 2018 [ { region => { regionId => '1', tableName => { ... }, startKey => '', endKey => '', offline => 'false', split => 'false', replicaId => '1' }, targetServer => { hostName => 'server1', port => '17020', startCode => '1545266805778' } }, {} ]
> {noformat}
> The timestamps are in PST, so it has been in this state for about 19 hours.
> The only line involving this PID in the log is:
> {noformat}
> 2018-12-19 17:20:27,974 INFO [PEWorker-4] procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=157156, ppid=157155, state=RUNNABLE, hasLock=false; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure}]
> {noformat}
> There have been no other useful log lines for this PID, the parent PID, or the
> region in question since then. This PEWorker (4) is also alive and has done
> work since then, so the thread did not exit with an error somewhere.
> All the PEWorkers are waiting for work:
> {noformat}
> Thread 158 (PEWorker-16):
>   State: TIMED_WAITING
>   Blocked count: 1340
>   Waited count: 5064
>   Stack:
>     sun.misc.Unsafe.park(Native Method)
>     java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
>     org.apache.hadoop.hbase.procedure2.AbstractProcedureScheduler.poll(AbstractProcedureScheduler.java:171)
>     org.apache.hadoop.hbase.procedure2.AbstractProcedureScheduler.poll(AbstractProcedureScheduler.java:153)
>     org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1949)
> {noformat}
> The main assignment procedure for this region is blocked on it:
> {noformat}
> 157155 WAITING hadoop TransitRegionStateProcedure table=hbase:meta, region=534574363, ASSIGN Wed Dec 19 17:20:27 PST 2018 Wed Dec 19 17:20:27 PST 2018 [ { state => [ '1', '2', '3' ] }, { regionId => '1', tableName => { ... }, startKey => '', endKey => '', offline => 'false', split => 'false', replicaId => '1' }, { initialState => 'REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE', lastState => 'REGION_STATE_TRANSITION_CONFIRM_OPENED', assignCandidate => { hostName => 'server1', port => '17020', startCode => '1545266805778' }, forceNewPlan => 'false' } ]
> 2018-12-19 17:20:27,673 INFO [PEWorker-9] procedure.MasterProcedureScheduler: Took xlock for pid=157155, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false; TransitRegionStateProcedure table=hbase:meta, region=..., ASSIGN
> 2018-12-19 17:20:27,809 INFO [PEWorker-9] assignment.TransitRegionStateProcedure: Starting pid=157155, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=true; TransitRegionStateProcedure table=hbase:meta, region=..., ASSIGN; rit=OFFLINE, location=server1,17020,1545266805778; forceNewPlan=false, retain=false
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)