[ https://issues.apache.org/jira/browse/HBASE-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-20202:
--------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Fixed the checkstyle and pushed. I've been running this patch over the last day at scale so it basically works (may have fixed the issue seen above... need more time to figure it).

> [AMv2] Don't move region if its a split parent or offlined
> ----------------------------------------------------------
>
>                 Key: HBASE-20202
>                 URL: https://issues.apache.org/jira/browse/HBASE-20202
>             Project: HBase
>          Issue Type: Sub-task
>          Components: amv2
>    Affects Versions: 2.0.0-beta-2
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-20202.branch-2.001.patch, HBASE-20202.branch-2.002.patch
>
>
> Found this one running ITBLLs. We'd just finished splitting a region
> 91655de06786f786b0ee9c51280e1ee6 and then a move for it comes in. The move
> fails in an interesting way. The location has been removed from the
> regionnode kept by the Master. HBASE-20178 adds macro checks on context. Need
> to add a few checks to the likes of MoveRegionProcedure so we don't try to
> move an offlined/split parent.
> {code}
> 2018-03-14 10:21:45,678 INFO [PEWorker-2] procedure2.ProcedureExecutor: Finished pid=3177, state=SUCCESS; SplitTableRegionProcedure table=IntegrationTestBigLinkedList, parent=91655de06786f786b0ee9c51280e1ee6, daughterA=b67bf6b79eaa83de788b0519f782ce8e, daughterB=99cf6ddb38cad08e3aa7635b6cac2e7b in 10.0210sec
> 2018-03-14 10:21:45,679 INFO [PEWorker-15] procedure.MasterProcedureScheduler: pid=3194, ppid=3193, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=IntegrationTestBigLinkedList, region=af198ca64b196fb3d2f5b3e815b2dad0, server=ve0530.halxg.cloudera.com,16020,1521007509855, IntegrationTestBigLinkedList,\xAA\xAA\xAA\xAA\xAA\xAA\xAA\xA0,1521047891276.af198ca64b196fb3d2f5b3e815b2dad0.
> 2018-03-14 10:21:45,680 INFO [PEWorker-5] procedure.MasterProcedureScheduler: pid=3187, state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure hri=IntegrationTestBigLinkedList,\x0C0\xC3\x0C0\xC3\x0C0,1521045713137.91655de06786f786b0ee9c51280e1ee6., source=ve0530.halxg.cloudera.com,16020,1521007509855, destination=ve0528.halxg.cloudera.com,16020,1521047890874, IntegrationTestBigLinkedList,\x0C0\xC3\x0C0\xC3\x0C0,1521045713137.91655de06786f786b0ee9c51280e1ee6.
> 2018-03-14 10:21:45,680 INFO [PEWorker-15] assignment.RegionStateStore: pid=3194 updating hbase:meta row=IntegrationTestBigLinkedList,\xAA\xAA\xAA\xAA\xAA\xAA\xAA\xA0,1521047891276.af198ca64b196fb3d2f5b3e815b2dad0., regionState=CLOSING
> 2018-03-14 10:21:45,680 INFO [PEWorker-5] procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=3195, ppid=3187, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=IntegrationTestBigLinkedList, region=91655de06786f786b0ee9c51280e1ee6, server=ve0530.halxg.cloudera.com,16020,1521007509855}]
> 2018-03-14 10:21:45,683 INFO [PEWorker-15] assignment.RegionTransitionProcedure: Dispatch pid=3194, ppid=3193, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=IntegrationTestBigLinkedList, region=af198ca64b196fb3d2f5b3e815b2dad0, server=ve0530.halxg.cloudera.com,16020,1521007509855; rit=CLOSING, location=ve0530.halxg.cloudera.com,16020,1521007509855
> 2018-03-14 10:21:45,752 INFO [PEWorker-15] procedure.MasterProcedureScheduler: pid=3195, ppid=3187, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=IntegrationTestBigLinkedList, region=91655de06786f786b0ee9c51280e1ee6, server=ve0530.halxg.cloudera.com,16020,1521007509855, IntegrationTestBigLinkedList,\x0C0\xC3\x0C0\xC3\x0C0,1521045713137.91655de06786f786b0ee9c51280e1ee6.
> 2018-03-14 10:21:45,753 ERROR [PEWorker-15] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception: pid=3195, ppid=3187, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=IntegrationTestBigLinkedList, region=91655de06786f786b0ee9c51280e1ee6, server=ve0530.halxg.cloudera.com,16020,1521007509855
> java.lang.NullPointerException
>         at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
>         at org.apache.hadoop.hbase.master.assignment.RegionStates.getOrCreateServer(RegionStates.java:934)
>         at org.apache.hadoop.hbase.master.assignment.RegionStates.addRegionToServer(RegionStates.java:962)
>         at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1548)
>         at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:197)
>         at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:304)
>         at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:86)
>         at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1452)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1221)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1740)
> 2018-03-14 10:21:45,835 DEBUG [RSProcedureDispatcher-pool3-t22] ipc.NettyRpcConnection: Connecting to ve0530.halxg.cloudera.com/10.17.240.23:16020
> {code}
> Will work on this after HBASE-20178
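
For readers following the trace: the NPE at the top of the stack is consistent with a null server location reaching a ConcurrentHashMap lookup, since that map type rejects null keys. The following is a rough, self-contained sketch of the kind of up-front check the description asks for. It is not the attached patch and does not use HBase classes; RegionNodeStub, REGIONS_PER_SERVER, whyNotMovable and move are all hypothetical names made up for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;

public class MoveGuardSketch {

  /** Hypothetical, simplified stand-in for the region state the Master tracks. */
  static final class RegionNodeStub {
    final String encodedName;
    final boolean splitParent;
    final boolean offline;
    final String location; // server name; null once the Master has cleared it

    RegionNodeStub(String encodedName, boolean splitParent, boolean offline, String location) {
      this.encodedName = encodedName;
      this.splitParent = splitParent;
      this.offline = offline;
      this.location = location;
    }
  }

  /** Hypothetical per-server region counts, keyed by server name. */
  static final ConcurrentHashMap<String, Integer> REGIONS_PER_SERVER = new ConcurrentHashMap<>();

  /** Returns the reason a move must be refused, or null if the move may proceed. */
  static String whyNotMovable(RegionNodeStub node) {
    if (node.splitParent) {
      return node.encodedName + " is a split parent";
    }
    if (node.offline) {
      return node.encodedName + " is offline";
    }
    if (node.location == null) {
      return node.encodedName + " has no known location";
    }
    return null;
  }

  /** Checks the region before touching any per-server bookkeeping. */
  static void move(RegionNodeStub node) {
    String reason = whyNotMovable(node);
    if (reason != null) {
      throw new IllegalStateException("Refusing to move: " + reason);
    }
    REGIONS_PER_SERVER.merge(node.location, 1, Integer::sum);
  }

  public static void main(String[] args) {
    RegionNodeStub parent =
        new RegionNodeStub("91655de06786f786b0ee9c51280e1ee6", true, true, null);

    // Without a guard, the null location reaches the map lookup and throws,
    // because ConcurrentHashMap does not permit null keys.
    try {
      REGIONS_PER_SERVER.get(parent.location);
    } catch (NullPointerException e) {
      System.out.println("unguarded lookup threw NullPointerException");
    }

    // With the guard, the request is rejected with a readable reason instead.
    try {
      move(parent);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The point of the sketch is only the ordering: reject a split parent or offlined region when the move is requested, rather than letting a child unassign discover the missing location later.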