[ https://issues.apache.org/jira/browse/HBASE-21120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-21120: -------------------------- Attachment: HBASE-21120.apply-in-reverse.txt > MoveRegionProcedure makes no progress; goes to STUCK > ---------------------------------------------------- > > Key: HBASE-21120 > URL: https://issues.apache.org/jira/browse/HBASE-21120 > Project: HBase > Issue Type: Bug > Components: amv2 > Reporter: stack > Priority: Major > Attachments: HBASE-21120.apply-in-reverse.txt > > > This is tip of branch-2.0. I'm running cluster tests. I've seen this twice > now where a slowDeterministic schedules a move procedure just after startup > and we go dead-in-the-water never going beyond schedule of the unassign.We > even schedule subsequent moves but they go no where either. > {code} > ... > 2018-08-26 10:19:30,352 INFO [PEWorker-14] procedure2.ProcedureExecutor: > Finished pid=286, ppid=260, state=SUCCESS; MoveRegionProcedure > hri=fb79d389cf324aaf3ad4b686d3ad21d5, > source=ve0530.halxg.cloudera.com,16020,1535303391139, > destination=ve0530.halxg.cloudera.com,16020,1535303391139 in 10.8100sec > ... > // MASTER KILLED > .... > 2018-08-26 10:21:10,585 INFO [Thread-19] assignment.RegionStateStore: Load > hbase:meta entry region=fb79d389cf324aaf3ad4b686d3ad21d5, regionState=OPEN, > lastHost=ve0530.halxg.cloudera.com,16020,1535303391139, > regionLocation=ve0530.halxg.cloudera.com,16020,1535303391139, openSeqNum=42095 > ... > 2018-08-26 10:26:15,531 INFO > [RpcServer.default.FPBQ.Fifo.handler=46,queue=1,port=16000] master.HMaster: > Client=stack//10.17.240.20 move hri=fb79d389cf324aaf3ad4b686d3ad21d5, > source=ve0530.halxg.cloudera.com,16020,1535303391139, > destination=ve0534.halxg.cloudera.com,16020,1535303812180, running balancer > 2018-08-26 10:26:16,635 INFO [PEWorker-13] > procedure.MasterProcedureScheduler: pid=413, > state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure > hri=fb79d389cf324aaf3ad4b686d3ad21d5, > source=ve0530.halxg.cloudera.com,16020,1535303391139, > destination=ve0534.halxg.cloudera.com,16020,1535303812180 checking lock on > fb79d389cf324aaf3ad4b686d3ad21d5 > 2018-08-26 10:26:16,896 INFO [PEWorker-14] procedure2.ProcedureExecutor: > Finished pid=405, state=SUCCESS; ServerCrashProcedure > server=ve0540.halxg.cloudera.com,16020,1535303391120, splitWal=true, > meta=true in 25.8450sec > 2018-08-26 10:26:16,896 INFO [PEWorker-13] > procedure.MasterProcedureScheduler: pid=414, ppid=413, > state=RUNNABLE:REGION_TRANSITION_QUEUE; UnassignProcedure > table=IntegrationTestBigLinkedList, region=fb79d389cf324aaf3ad4b686d3ad21d5, > server=ve0530.halxg.cloudera.com,16020,1535303391139 checking lock on > fb79d389cf324aaf3ad4b686d3ad21d5 > 2018-08-26 10:26:29,308 INFO > [RpcServer.default.FPBQ.Fifo.handler=0,queue=0,port=16000] > master.ServerManager: Registering > regionserver=ve0540.halxg.cloudera.com,16020,1535304386779 > 2018-08-26 > 10:26:29,342 INFO [RegionServerTracker-0] master.RegionServerTracker: > RegionServer ephemeral node created, adding > [ve0540.halxg.cloudera.com,16020,1535304386779] > 2018-08-26 > 10:27:03,820 INFO > [ReadOnlyZKClient-ve0524.halxg.cloudera.com:2222@0x1da09819] > zookeeper.ZooKeeper: Session: 0x16577368c840054 closed > 2018-08-26 10:27:03,820 INFO > [ReadOnlyZKClient-ve0524.halxg.cloudera.com:2222@0x1da09819-EventThread] > zookeeper.ClientCnxn: EventThread shut down for session: 0x16577368c840054 > > 2018-08-26 10:27:16,411 INFO > [RpcServer.default.FPBQ.Fifo.handler=0,queue=0,port=16000] master.HMaster: > Client=stack//10.17.240.20 move hri=fb79d389cf324aaf3ad4b686d3ad21d5, > source=ve0530.halxg.cloudera.com,16020,1535303391139, > destination=ve0534.halxg.cloudera.com,16020,1535303812180, running balancer > > > 2018-08-26 > 10:27:17,001 INFO [PEWorker-13] procedure.MasterProcedureScheduler: pid=415, > state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure > hri=fb79d389cf324aaf3ad4b686d3ad21d5, source=ve0530.halxg > .cloudera.com,16020,1535303391139, > destination=ve0534.halxg.cloudera.com,16020,1535303812180 checking lock on > fb79d389cf324aaf3ad4b686d3ad21d5 > ... > {code} > Looking in UI, it says the below which has NO mention of pid=414 ... like it > disappeared!!! > {code} > 413 WAITING stack MoveRegionProcedure > hri=fb79d389cf324aaf3ad4b686d3ad21d5, > source=ve0530.halxg.cloudera.com,16020,1535303391139, > destination=ve0534.halxg.cloudera.com,16020,1535303812180 Sun Aug 26 > 10:26:15 PDT 2018 Sun Aug 26 10:26:16 PDT 2018 [ { state => [ > '1', '2' ] }, { regionId => '1535303851257', tableName => { namespace => > 'ZGVmYXVsdA==', qualifier => 'SW50ZWdyYXRpb25UZXN0QmlnTGlua2VkTGlzdA==' }, > startKey => 'mGGOysI+5tUhxwXKql8zRg==', endKey => 'nnnnnnnnnnA=', offline => > 'false', split => 'false', replicaId => '0' }, { sourceServer => { hostName > => 've0530.halxg.cloudera.com', port => '16020', startCode => '1535303391139' > }, destinationServer => { hostName => 've0534.halxg.cloudera.com', port => > '16020', startCode => '1535303812180' } } ] > 415 RUNNABLE stack MoveRegionProcedure > hri=fb79d389cf324aaf3ad4b686d3ad21d5, > source=ve0530.halxg.cloudera.com,16020,1535303391139, > destination=ve0534.halxg.cloudera.com,16020,1535303812180 Sun Aug 26 > 10:27:16 PDT 2018 Sun Aug 26 10:27:16 PDT 2018 [ {}, { regionId > => '1535303851257', tableName => { namespace => 'ZGVmYXVsdA==', qualifier => > 'SW50ZWdyYXRpb25UZXN0QmlnTGlua2VkTGlzdA==' }, startKey => > 'mGGOysI+5tUhxwXKql8zRg==', endKey => 'nnnnnnnnnnA=', offline => 'false', > split => 'false', replicaId => '0' }, { sourceServer => { hostName => > 've0530.halxg.cloudera.com', port => '16020', startCode => '1535303391139' }, > destinationServer => { hostName => 've0534.halxg.cloudera.com', port => > '16020', startCode => '1535303812180' } } ] > {code} > Looking at locks in UI, it says that pid=413 has the exclusive lock on this > region... > {code} > REGION: fb79d389cf324aaf3ad4b686d3ad21d5 > Lock type: EXCLUSIVE > Owner procedure: { ID => '413', PARENT_ID => '-1', STATE => 'WAITING', OWNER > => 'stack', TYPE => 'MoveRegionProcedure > hri=fb79d389cf324aaf3ad4b686d3ad21d5, > source=ve0530.halxg.cloudera.com,16020,1535303391139, > destination=ve0534.halxg.cloudera.com,16020,1535303812180', START_TIME => > 'Sun Aug 26 10:26:15 PDT 2018', LAST_UPDATE => 'Sun Aug 26 10:26:16 PDT > 2018', PARAMETERS => [ { state => [ '1', '2' ] }, { regionId => > '1535303851257', tableName => { namespace => 'ZGVmYXVsdA==', qualifier => > 'SW50ZWdyYXRpb25UZXN0QmlnTGlua2VkTGlzdA==' }, startKey => > 'mGGOysI+5tUhxwXKql8zRg==', endKey => 'nnnnnnnnnnA=', offline => 'false', > split => 'false', replicaId => '0' }, { sourceServer => { hostName => > 've0530.halxg.cloudera.com', port => '16020', startCode => '1535303391139' }, > destinationServer => { hostName => 've0534.halxg.cloudera.com', port => > '16020', startCode => '1535303812180' } } ] } > {code} > Is it that the child is not picking up the lock? Let me look at the WALs to > see what story with pid=414 is.... -- This message was sent by Atlassian JIRA (v7.6.3#76005)