[ https://issues.apache.org/jira/browse/HBASE-21078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589370#comment-16589370 ]
stack commented on HBASE-21078:
-------------------------------

Not sure if this is my patch, but on the cluster I see a strangeness where a MoveRegionProcedure (MP) starts up its UnassignProcedure (UP) and then it hangs here and is never scheduled again:

{code}
2018-08-22 11:05:29,929 INFO [RpcServer.default.FPBQ.Fifo.handler=3,queue=3,port=16000] master.HMaster: Client=stack//10.17.240.20 move hri=e5ed5607d30e9813aa7206048d5a94fd, source=ve0530.halxg.cloudera.com,16020,1534960226278, destination=ve0540.halxg.cloudera.com,16020,1534960226357, running balancer
2018-08-22 11:05:30,144 INFO [PEWorker-1] procedure.MasterProcedureScheduler: pid=288, state=RUNNABLE:MOVE_REGION_UNASSIGN, hasLock=false; MoveRegionProcedure hri=e5ed5607d30e9813aa7206048d5a94fd, source=ve0530.halxg.cloudera.com,16020,1534960226278, destination=ve0540.halxg.cloudera.com,16020,1534960226357 checking lock on e5ed5607d30e9813aa7206048d5a94fd
2018-08-22 11:05:30,195 INFO [PEWorker-1] procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=289, ppid=288, state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; UnassignProcedure table=IntegrationTestBigLinkedList, region=e5ed5607d30e9813aa7206048d5a94fd, server=ve0530.halxg.cloudera.com,16020,1534960226278}]
2018-08-22 11:05:30,252 INFO [PEWorker-1] procedure.MasterProcedureScheduler: pid=289, ppid=288, state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; UnassignProcedure table=IntegrationTestBigLinkedList, region=e5ed5607d30e9813aa7206048d5a94fd, server=ve0530.halxg.cloudera.com,16020,1534960226278 checking lock on e5ed5607d30e9813aa7206048d5a94fd
{code}

The MP (pid=288) has the lock on the region and won't let go (from the UI):

{code}
REGION: e5ed5607d30e9813aa7206048d5a94fd
Lock type: EXCLUSIVE
Owner procedure: {
  ID => '288',
  PARENT_ID => '-1',
  STATE => 'WAITING',
  OWNER => 'stack',
  TYPE => 'MoveRegionProcedure hri=e5ed5607d30e9813aa7206048d5a94fd, source=ve0530.halxg.cloudera.com,16020,1534960226278, destination=ve0540.halxg.cloudera.com,16020,1534960226357',
  START_TIME => 'Wed Aug 22 11:05:29 PDT 2018',
  LAST_UPDATE => 'Wed Aug 22 11:05:30 PDT 2018',
  PARAMETERS => [
    { state => [ '1', '2' ] },
    { regionId => '1534960773088', tableName => { namespace => 'ZGVmYXVsdA==', qualifier => 'SW50ZWdyYXRpb25UZXN0QmlnTGlua2VkTGlzdA==' }, startKey => 'BhjRNw==', endKey => 'DDDDDDDDDDA=', offline => 'false', split => 'false', replicaId => '0' },
    { sourceServer => { hostName => 've0530.halxg.cloudera.com', port => '16020', startCode => '1534960226278' }, destinationServer => { hostName => 've0540.halxg.cloudera.com', port => '16020', startCode => '1534960226357' } }
  ]
}
{code}

> [amv2] CODE-BUG NPE in RTP doing Unassign
> -----------------------------------------
>
>                 Key: HBASE-21078
>                 URL: https://issues.apache.org/jira/browse/HBASE-21078
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>   Affects Versions: 2.0.1
>            Reporter: stack
>            Assignee: stack
>            Priority: Major
>             Fix For: 2.0.2
>
>         Attachments: HBASE-21078.branch-2.0.001.patch, HBASE-21078.branch-2.0.002.patch, HBASE-21078.branch-2.0.003.patch
>
>
> Saw this in a run against the tip of branch-2.0. The region had just finished
> being split when the move goes to run.
> {code}
> 2018-08-18 16:55:14,908 INFO [PEWorker-2] procedure2.ProcedureExecutor: Finished pid=2028, state=SUCCESS, hasLock=false; SplitTableRegionProcedure table=IntegrationTestBigLinkedList, parent=c3f199b5af62ae2ff8f8b6426b21d95d, daughterA=31ccbf098ae615ce30f28ec84c956b8f, daughterB=1890b4c96736f223f31efef11c817c90 in 9.0090sec
> 2018-08-18 16:55:14,908 INFO [PEWorker-16] procedure.MasterProcedureScheduler: pid=2038, ppid=2030, state=RUNNABLE:MOVE_REGION_UNASSIGN, hasLock=false; MoveRegionProcedure hri=c3f199b5af62ae2ff8f8b6426b21d95d, source=ve0540.halxg.cloudera.com,16020,1534632630737, destination=ve0540.halxg.cloudera.com,16020,1534632630737 checking lock on c3f199b5af62ae2ff8f8b6426b21d95d
> 2018-08-18 16:55:14,958 INFO [PEWorker-16] procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=2095, ppid=2038, state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=false; UnassignProcedure table=IntegrationTestBigLinkedList, region=c3f199b5af62ae2ff8f8b6426b21d95d, server=ve0540.halxg.cloudera.com,16020,1534632630737}]
> 2018-08-18 16:55:15,008 INFO [PEWorker-3] procedure.MasterProcedureScheduler: pid=2095, ppid=2038, state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=false; UnassignProcedure table=IntegrationTestBigLinkedList, region=c3f199b5af62ae2ff8f8b6426b21d95d, server=ve0540.halxg.cloudera.com,16020,1534632630737 checking lock on c3f199b5af62ae2ff8f8b6426b21d95d
> 2018-08-18 16:55:15,085 ERROR [PEWorker-3] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception: pid=2095, ppid=2038, state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure table=IntegrationTestBigLinkedList, region=c3f199b5af62ae2ff8f8b6426b21d95d, server=ve0540.halxg.cloudera.com,16020,1534632630737
> java.lang.NullPointerException
>         at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
>         at org.apache.hadoop.hbase.master.assignment.RegionStates.getOrCreateServer(RegionStates.java:1097)
>         at org.apache.hadoop.hbase.master.assignment.RegionStates.addRegionToServer(RegionStates.java:1125)
>         at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1477)
>         at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:204)
>         at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:345)
>         at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:97)
>         at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:873)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1556)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1344)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:76)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1854)
> {code}
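
For anyone reading the quoted stack trace: the top frame is ConcurrentHashMap.get, called from RegionStates.getOrCreateServer. ConcurrentHashMap does not permit null keys and throws NullPointerException on a null lookup, so the trace is at least consistent with a null server name reaching that map. The sketch below only illustrates that JDK behaviour. The class, the map, and the guard are hypothetical stand-ins; this is not HBase code and not what the attached patches do.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; none of these names come from HBase.
class NullKeyDemo {
    // Stand-in for a map keyed by server name (assumption for illustration).
    static final ConcurrentHashMap<String, String> SERVERS = new ConcurrentHashMap<>();

    // Returns true if looking up the given key throws NullPointerException.
    static boolean lookupThrowsNpe(String serverName) {
        try {
            SERVERS.get(serverName);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Assumed guard: fail with a descriptive message instead of a bare NPE
    // when no server name is available, otherwise create the entry on demand.
    static String getOrCreate(String serverName) {
        if (serverName == null) {
            throw new IllegalStateException("no server name recorded for region");
        }
        return SERVERS.computeIfAbsent(serverName, name -> "state-for-" + name);
    }

    public static void main(String[] args) {
        // A null key is rejected by ConcurrentHashMap.get.
        System.out.println("null key throws NPE: " + lookupThrowsNpe(null));
        // A non-null key is fine, whether or not it is present.
        String server = "ve0540.halxg.cloudera.com,16020,1534632630737";
        System.out.println("non-null key throws NPE: " + lookupThrowsNpe(server));
        System.out.println(getOrCreate(server));
    }
}
```

Whether a guard like this is the right response, as opposed to making sure the caller never arrives with a null location, depends on why the server name was missing after the split, which the trace alone does not show.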