[ https://issues.apache.org/jira/browse/HBASE-21787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16752874#comment-16752874 ]
Duo Zhang commented on HBASE-21787:
-----------------------------------

There is a check at the beginning of TRSP: if it is not the one attached, it will quit immediately. So the question here is why a new TRSP is scheduled while the old one is still trying to open the region...

> proc WAL replaces a RIT that holds a lock with a RIT that doesn't
> -----------------------------------------------------------------
>
>                 Key: HBASE-21787
>                 URL: https://issues.apache.org/jira/browse/HBASE-21787
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Sergey Shelukhin
>            Priority: Critical
>
> This is not the same as HBASE-21786, but related - after master restart, 2
> RITs are both in proc WAL. According to the comment where RIT is restored,
> this is expected.
> However what happens is that master takes lock for the older RIT, and then
> replaces the older RIT with the newer RIT on the region.
> You can see two "to restore RIT" log lines.
> Both RITs are still active in procedures view (and stuck due to yet another
> bug that I will file later). However, it seems wrong that lock is held by one
> RIT but region points to the other RIT as the correct one.
> {noformat}
> 2019-01-25 11:26:54,616 INFO  [master/master:17000:becomeActiveMaster]
> procedure.MasterProcedureScheduler: Took xlock for pid=1738, ppid=3,
> state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, hasLock=false;
> TransitRegionStateProcedure table=table,
> region=27f7ab2a05d9d730b2ab2339d1531b8e, ASSIGN
> 2019-01-25 11:26:54,834 INFO  [master/master:17000:becomeActiveMaster]
> assignment.AssignmentManager: Attach pid=1738, ppid=3,
> state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, hasLock=false;
> TransitRegionStateProcedure table=table,
> region=27f7ab2a05d9d730b2ab2339d1531b8e, ASSIGN to rit=OFFLINE,
> location=null, table=table, region=27f7ab2a05d9d730b2ab2339d1531b8e to
> restore RIT
> 2019-01-25 11:26:54,853 INFO  [master/master:17000:becomeActiveMaster]
> assignment.AssignmentManager: Attach pid=4351,
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false;
> TransitRegionStateProcedure table=table,
> region=27f7ab2a05d9d730b2ab2339d1531b8e, ASSIGN to rit=OFFLINE,
> location=null, table=table, region=27f7ab2a05d9d730b2ab2339d1531b8e to
> restore RIT
> 2019-01-25 11:27:02,460 INFO  [master/master:17000:becomeActiveMaster]
> assignment.RegionStateStore: Load hbase:meta entry
> region=27f7ab2a05d9d730b2ab2339d1531b8e, regionState=OPENING,
> lastHost=server1,17020,1548290445704,
> regionLocation=server2,17020,1548442571056, openSeqNum=120108
> 2019-01-25 11:27:10,184 INFO  [PEWorker-11]
> procedure.MasterProcedureScheduler: Waiting on xlock for pid=4351,
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false;
> TransitRegionStateProcedure table=table,
> region=27f7ab2a05d9d730b2ab2339d1531b8e, ASSIGN held by pid=1738
> {noformat}
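
To make the state in the log easier to reason about, here is a toy replay of it. This is only a sketch and not HBase code: RitGuardSketch, RegionNode, Proc, attach, tryLock and step are hypothetical names, the lock is kept on the region object purely for simplicity, and the "not attached, quit" guard is an assumption based on the check described in the comment above. Only the two pids and the order of events are taken from the log.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT the HBase classes.
final class RitGuardSketch {

    /** Toy region: tracks the attached procedure and the exclusive lock owner. */
    static final class RegionNode {
        private Proc attached;   // procedure the region points to as its RIT
        private Proc lockOwner;  // procedure currently holding the xlock

        // Assumption: attaching simply overwrites, so the last caller wins.
        void attach(Proc p) { attached = p; }
        Proc attached() { return attached; }

        // Succeeds if the lock is free or already owned by p.
        boolean tryLock(Proc p) {
            if (lockOwner == null || lockOwner == p) {
                lockOwner = p;
                return true;
            }
            return false;
        }
        Proc lockOwner() { return lockOwner; }
    }

    /** Toy procedure identified only by its pid. */
    static final class Proc {
        final long pid;
        Proc(long pid) { this.pid = pid; }

        /** Runs one step and returns a description of what happened. */
        String step(RegionNode node) {
            if (!node.tryLock(this)) {
                return "pid=" + pid + " waiting on xlock held by pid=" + node.lockOwner().pid;
            }
            // Assumed guard: a procedure that is not the attached one quits.
            if (node.attached() != this) {
                return "pid=" + pid + " not attached, quit";
            }
            return "pid=" + pid + " continues";
        }
    }

    /** Replays the order of events seen in the log after master restart. */
    static List<String> replay() {
        RegionNode node = new RegionNode();
        Proc older = new Proc(1738);
        Proc newer = new Proc(4351);

        node.tryLock(older);  // "Took xlock for pid=1738"
        node.attach(older);   // "Attach pid=1738 ... to restore RIT"
        node.attach(newer);   // "Attach pid=4351 ... to restore RIT"

        List<String> out = new ArrayList<>();
        out.add(newer.step(node));
        out.add(older.step(node));
        return out;
    }

    public static void main(String[] args) {
        replay().forEach(System.out::println);
    }
}
```

In this toy model the lock owner (pid=1738) and the attached procedure (pid=4351) end up being different procedures, which is the mismatch the description calls out. It does not model lock release or the separate bug that leaves both procedures stuck.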