[ https://issues.apache.org/jira/browse/HBASE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627437#comment-16627437 ]
stack commented on HBASE-21213:
-------------------------------

[~allan163] Thanks for bringing this up. I suppose we need to declare that hbck2 only works for 2.1.1 onward. 2.1.1 is when the HbckService shows up (I should verify that this stuff showing up on a minor version does not break upgrades). I need to add a version check to hbck2, one we can run to check that the remote cluster has the new facilities as we add them to hbck.

How does this sound, [~allan163]/[~Apache9]? (Can't wait till 2.2.x because 2.2.x has the awkward upgrade... that is my thinking at least).

> [hbck2] bypass leaves behind state in RegionStates when assign/unassign
> -----------------------------------------------------------------------
>
>                 Key: HBASE-21213
>                 URL: https://issues.apache.org/jira/browse/HBASE-21213
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2, hbck2
>            Reporter: stack
>            Assignee: stack
>            Priority: Major
>             Fix For: 2.1.1
>
>         Attachments: HBASE-21213.branch-2.1.001.patch,
> HBASE-21213.branch-2.1.002.patch, HBASE-21213.branch-2.1.003.patch,
> HBASE-21213.branch-2.1.004.patch, HBASE-21213.branch-2.1.005.patch,
> HBASE-21213.branch-2.1.006.patch
>
>
> This is a follow-on from HBASE-21083 which added the 'bypass' functionality.
> On bypass, there is more state to be cleared if we are to allow new Procedures
> to be scheduled.
> For example, here is a bypass:
> {code}
> 2018-09-20 05:45:43,722 INFO org.apache.hadoop.hbase.procedure2.Procedure:
> pid=100449, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true,
> bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace,
> region=37cc206fe9c4bc1c0a46a34c5f523d16,
> server=ve1233.halxg.cloudera.com,22101,1537397961664 bypassed, returning null
> to finish it
> 2018-09-20 05:45:44,022 INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=100449,
> state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace,
> region=37cc206fe9c4bc1c0a46a34c5f523d16,
> server=ve1233.halxg.cloudera.com,22101,1537397961664 in 2mins, 7.618sec
> {code}
> ... but then when I try to assign the bypassed region later, I get this:
> {code}
> 2018-09-20 05:46:31,435 WARN
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: There is
> already another procedure running on this region this=pid=100450,
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16,
> server=ve1233.halxg.cloudera.com,22101,1537397961664 pid=100450,
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16; rit=OPENING,
> location=ve1233.halxg.cloudera.com,22101,1537397961664
> 2018-09-20 05:46:31,510 INFO
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Rolled back pid=100450,
> state=ROLLEDBACK,
> exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via
> AssignProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException:
> There is already another procedure running on this region this=pid=100450,
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16,
> server=ve1233.halxg.cloudera.com,22101,1537397961664; AssignProcedure
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16
> exec-time=473msec
> {code}
> ... which is a long-winded way of saying the Unassign Procedure still exists
> in RegionStateNodes.
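To make the failure in the logs easier to follow, here is a toy model of it. This is only a sketch under assumptions: ToyProcedure, ToyRegionNode, bypassLeaky and bypassAndClear are hypothetical names invented for illustration and are not HBase classes or the actual patch. It shows a per-region node that remembers its owning procedure, a bypass that finishes the procedure without clearing that record, and the later assign being refused because the finished procedure is still recorded as owner.

```java
public class BypassLeakSketch {
    /** Hypothetical stand-in for a procedure; not an HBase class. */
    static final class ToyProcedure {
        final long pid;
        boolean finished;

        ToyProcedure(long pid) {
            this.pid = pid;
        }
    }

    /** Hypothetical per-region node that remembers which procedure owns it. */
    static final class ToyRegionNode {
        private ToyProcedure owner;

        /** Claims the region; fails if a different procedure is still recorded. */
        boolean setOwner(ToyProcedure proc) {
            if (owner != null && owner != proc) {
                return false;
            }
            owner = proc;
            return true;
        }

        /** Releases the region, but only if proc is the recorded owner. */
        void clearOwner(ToyProcedure proc) {
            if (owner == proc) {
                owner = null;
            }
        }
    }

    /** Bypass that only finishes the procedure: the node keeps the stale owner. */
    static void bypassLeaky(ToyProcedure proc) {
        proc.finished = true;
    }

    /** Bypass that also clears the per-region state. */
    static void bypassAndClear(ToyProcedure proc, ToyRegionNode node) {
        proc.finished = true;
        node.clearOwner(proc);
    }

    /** Runs unassign, bypass, then assign; returns whether the assign could claim the region. */
    static boolean assignAfterBypass(boolean clearOnBypass) {
        ToyRegionNode node = new ToyRegionNode();
        ToyProcedure unassign = new ToyProcedure(100449L);
        node.setOwner(unassign);
        if (clearOnBypass) {
            bypassAndClear(unassign, node);
        } else {
            bypassLeaky(unassign);
        }
        ToyProcedure assign = new ToyProcedure(100450L);
        return node.setOwner(assign);
    }

    public static void main(String[] args) {
        System.out.println("leaky bypass, assign accepted: " + assignAfterBypass(false));
        System.out.println("clearing bypass, assign accepted: " + assignAfterBypass(true));
    }
}
```

In this toy model the first line prints false and the second prints true: the assign only goes through once the bypass also clears the owner recorded on the region node.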