[ 
https://issues.apache.org/jira/browse/HBASE-21120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21120:
--------------------------
    Attachment: HBASE-21120.apply-in-reverse.txt

> MoveRegionProcedure makes no progress; goes to STUCK
> ----------------------------------------------------
>
>                 Key: HBASE-21120
>                 URL: https://issues.apache.org/jira/browse/HBASE-21120
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>            Reporter: stack
>            Priority: Major
>         Attachments: HBASE-21120.apply-in-reverse.txt
>
>
> This is tip of branch-2.0. I'm running cluster tests. I've seen this twice 
> now where a slowDeterministic schedules a move procedure just after startup 
> and we go dead-in-the-water never going beyond schedule of the unassign.We 
> even schedule subsequent moves but they go no where either.
> {code}
> ...
> 2018-08-26 10:19:30,352 INFO  [PEWorker-14] procedure2.ProcedureExecutor: 
> Finished pid=286, ppid=260, state=SUCCESS; MoveRegionProcedure 
> hri=fb79d389cf324aaf3ad4b686d3ad21d5, 
> source=ve0530.halxg.cloudera.com,16020,1535303391139, 
> destination=ve0530.halxg.cloudera.com,16020,1535303391139 in 10.8100sec
> ...
> // MASTER KILLED
> ....
> 2018-08-26 10:21:10,585 INFO  [Thread-19] assignment.RegionStateStore: Load 
> hbase:meta entry region=fb79d389cf324aaf3ad4b686d3ad21d5, regionState=OPEN, 
> lastHost=ve0530.halxg.cloudera.com,16020,1535303391139, 
> regionLocation=ve0530.halxg.cloudera.com,16020,1535303391139, openSeqNum=42095
> ...
> 2018-08-26 10:26:15,531 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=46,queue=1,port=16000] master.HMaster: 
> Client=stack//10.17.240.20 move hri=fb79d389cf324aaf3ad4b686d3ad21d5, 
> source=ve0530.halxg.cloudera.com,16020,1535303391139, 
> destination=ve0534.halxg.cloudera.com,16020,1535303812180, running balancer
> 2018-08-26 10:26:16,635 INFO  [PEWorker-13] 
> procedure.MasterProcedureScheduler: pid=413, 
> state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
> hri=fb79d389cf324aaf3ad4b686d3ad21d5, 
> source=ve0530.halxg.cloudera.com,16020,1535303391139, 
> destination=ve0534.halxg.cloudera.com,16020,1535303812180 checking lock on 
> fb79d389cf324aaf3ad4b686d3ad21d5
> 2018-08-26 10:26:16,896 INFO  [PEWorker-14] procedure2.ProcedureExecutor: 
> Finished pid=405, state=SUCCESS; ServerCrashProcedure 
> server=ve0540.halxg.cloudera.com,16020,1535303391120, splitWal=true, 
> meta=true in 25.8450sec
> 2018-08-26 10:26:16,896 INFO  [PEWorker-13] 
> procedure.MasterProcedureScheduler: pid=414, ppid=413, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE; UnassignProcedure 
> table=IntegrationTestBigLinkedList, region=fb79d389cf324aaf3ad4b686d3ad21d5, 
> server=ve0530.halxg.cloudera.com,16020,1535303391139 checking lock on 
> fb79d389cf324aaf3ad4b686d3ad21d5
> 2018-08-26 10:26:29,308 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=0,queue=0,port=16000] 
> master.ServerManager: Registering 
> regionserver=ve0540.halxg.cloudera.com,16020,1535304386779                    
>                                                               2018-08-26 
> 10:26:29,342 INFO  [RegionServerTracker-0] master.RegionServerTracker: 
> RegionServer ephemeral node created, adding 
> [ve0540.halxg.cloudera.com,16020,1535304386779]                               
>                                                            2018-08-26 
> 10:27:03,820 INFO  
> [ReadOnlyZKClient-ve0524.halxg.cloudera.com:2222@0x1da09819] 
> zookeeper.ZooKeeper: Session: 0x16577368c840054 closed
> 2018-08-26 10:27:03,820 INFO  
> [ReadOnlyZKClient-ve0524.halxg.cloudera.com:2222@0x1da09819-EventThread] 
> zookeeper.ClientCnxn: EventThread shut down for session: 0x16577368c840054    
>                                                                               
>     2018-08-26 10:27:16,411 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=0,queue=0,port=16000] master.HMaster: 
> Client=stack//10.17.240.20 move hri=fb79d389cf324aaf3ad4b686d3ad21d5, 
> source=ve0530.halxg.cloudera.com,16020,1535303391139, 
> destination=ve0534.halxg.cloudera.com,16020,1535303812180, running balancer   
>                                                                               
>                                                                               
>                                                                2018-08-26 
> 10:27:17,001 INFO  [PEWorker-13] procedure.MasterProcedureScheduler: pid=415, 
> state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
> hri=fb79d389cf324aaf3ad4b686d3ad21d5, source=ve0530.halxg
> .cloudera.com,16020,1535303391139, 
> destination=ve0534.halxg.cloudera.com,16020,1535303812180 checking lock on 
> fb79d389cf324aaf3ad4b686d3ad21d5
> ...
> {code}
> Looking in UI, it says the below which has NO mention of pid=414 ... like it 
> disappeared!!!
> {code}
> 413           WAITING stack   MoveRegionProcedure 
> hri=fb79d389cf324aaf3ad4b686d3ad21d5, 
> source=ve0530.halxg.cloudera.com,16020,1535303391139, 
> destination=ve0534.halxg.cloudera.com,16020,1535303812180       Sun Aug 26 
> 10:26:15 PDT 2018    Sun Aug 26 10:26:16 PDT 2018            [ { state => [ 
> '1', '2' ] }, { regionId => '1535303851257', tableName => { namespace => 
> 'ZGVmYXVsdA==', qualifier => 'SW50ZWdyYXRpb25UZXN0QmlnTGlua2VkTGlzdA==' }, 
> startKey => 'mGGOysI+5tUhxwXKql8zRg==', endKey => 'nnnnnnnnnnA=', offline => 
> 'false', split => 'false', replicaId => '0' }, { sourceServer => { hostName 
> => 've0530.halxg.cloudera.com', port => '16020', startCode => '1535303391139' 
> }, destinationServer => { hostName => 've0534.halxg.cloudera.com', port => 
> '16020', startCode => '1535303812180' } } ]
> 415           RUNNABLE        stack   MoveRegionProcedure 
> hri=fb79d389cf324aaf3ad4b686d3ad21d5, 
> source=ve0530.halxg.cloudera.com,16020,1535303391139, 
> destination=ve0534.halxg.cloudera.com,16020,1535303812180       Sun Aug 26 
> 10:27:16 PDT 2018    Sun Aug 26 10:27:16 PDT 2018            [ {}, { regionId 
> => '1535303851257', tableName => { namespace => 'ZGVmYXVsdA==', qualifier => 
> 'SW50ZWdyYXRpb25UZXN0QmlnTGlua2VkTGlzdA==' }, startKey => 
> 'mGGOysI+5tUhxwXKql8zRg==', endKey => 'nnnnnnnnnnA=', offline => 'false', 
> split => 'false', replicaId => '0' }, { sourceServer => { hostName => 
> 've0530.halxg.cloudera.com', port => '16020', startCode => '1535303391139' }, 
> destinationServer => { hostName => 've0534.halxg.cloudera.com', port => 
> '16020', startCode => '1535303812180' } } ]
> {code}
> Looking at locks in UI, it says that pid=413 has the exclusive lock on this 
> region...
> {code}
> REGION: fb79d389cf324aaf3ad4b686d3ad21d5
> Lock type: EXCLUSIVE
> Owner procedure: { ID => '413', PARENT_ID => '-1', STATE => 'WAITING', OWNER 
> => 'stack', TYPE => 'MoveRegionProcedure 
> hri=fb79d389cf324aaf3ad4b686d3ad21d5, 
> source=ve0530.halxg.cloudera.com,16020,1535303391139, 
> destination=ve0534.halxg.cloudera.com,16020,1535303812180', START_TIME => 
> 'Sun Aug 26 10:26:15 PDT 2018', LAST_UPDATE => 'Sun Aug 26 10:26:16 PDT 
> 2018', PARAMETERS => [ { state => [ '1', '2' ] }, { regionId => 
> '1535303851257', tableName => { namespace => 'ZGVmYXVsdA==', qualifier => 
> 'SW50ZWdyYXRpb25UZXN0QmlnTGlua2VkTGlzdA==' }, startKey => 
> 'mGGOysI+5tUhxwXKql8zRg==', endKey => 'nnnnnnnnnnA=', offline => 'false', 
> split => 'false', replicaId => '0' }, { sourceServer => { hostName => 
> 've0530.halxg.cloudera.com', port => '16020', startCode => '1535303391139' }, 
> destinationServer => { hostName => 've0534.halxg.cloudera.com', port => 
> '16020', startCode => '1535303812180' } } ] }
> {code}
> Is it that the child is not picking up the lock? Let me look at the WALs to  
> see what story with pid=414 is....



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to