[ https://issues.apache.org/jira/browse/HBASE-21050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580710#comment-16580710 ]
stack commented on HBASE-21050:
-------------------------------

[~allan163] Let me give it a go then... will be back soon.

> Exclusive lock may be held by a SUCCESS state procedure forever
> ---------------------------------------------------------------
>
>                 Key: HBASE-21050
>                 URL: https://issues.apache.org/jira/browse/HBASE-21050
>             Project: HBase
>          Issue Type: Sub-task
>          Components: amv2
>    Affects Versions: 2.1.0, 2.0.1
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>         Attachments: HBASE-21050.branch-2.0.001.patch
>
>
> After HBASE-20846, we restore lock info for procedures. But there is a case
> where the lock can be held by a procedure that has already succeeded. Since
> that procedure won't execute again, it will hold the lock forever.
> 1. All children of pid=1208 had finished, but the master was killed before
> procedure 1208 woke up
> {code}
> 2018-08-05 02:20:14,465 INFO [PEWorker-8] procedure2.ProcedureExecutor(1659): Finished subprocedure(s) of pid=1208, ppid=1206, state=RUNNABLE, hasLock=true; MoveRegionProcedure hri=c2a23a735f16df57299dba6fd4599f2f, source=e010125050127.bja,60020,1533403109034, destination=e010125050127.bja,60020,1533403109034; resume parent processing.
> 2018-08-05 02:20:14,466 INFO [PEWorker-8] procedure2.ProcedureExecutor(1296): Finished pid=1232, ppid=1208, state=SUCCESS, hasLock=false; AssignProcedure table=IntegrationTestBigLinkedList, region=c2a23a735f16df57299dba6fd4599f2f, target=e010125050127.bja,60020,1533403109034 in 1.5060sec
> {code}
> 2. The master restarts; since procedure 1208 held the lock before the
> restart, the lock was restored for it
> {code}
> 2018-08-05 02:20:30,803 DEBUG [Thread-15] procedure2.ProcedureExecutor(456): Loading pid=1208, ppid=1206, state=SUCCESS, hasLock=false; MoveRegionProcedure hri=c2a23a735f16df57299dba6fd4599f2f, source=e010125050127.bja,60020,1533403109034, destination=e010125050127.bja,60020,1533403109034
> 2018-08-05 02:20:30,818 DEBUG [Thread-15] procedure2.Procedure(898): pid=1208, ppid=1206, state=SUCCESS, hasLock=false; MoveRegionProcedure hri=c2a23a735f16df57299dba6fd4599f2f, source=e010125050127.bja,60020,1533403109034, destination=e010125050127.bja,60020,1533403109034 held the lock before restarting, call acquireLock to restore it.
> 2018-08-05 02:20:30,818 INFO [Thread-15] procedure.MasterProcedureScheduler(631): pid=1208, ppid=1206, state=SUCCESS, hasLock=false; MoveRegionProcedure hri=c2a23a735f16df57299dba6fd4599f2f, source=e010125050127.bja,60020,1533403109034, destination=e010125050127.bja,60020,1533403109034 checking lock on c2a23a735f16df57299dba6fd4599f2f
> {code}
> 3. Since procedure 1208 is already in SUCCESS state, it won't execute again,
> so it will hold the lock forever
> We need to check the state of the procedure before restoring locks: if the
> procedure is already finished (success or rollback), we do not need to
> acquire the lock for it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
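
For illustration, the check proposed at the end of the issue description (skip lock restoration for procedures that are already finished) might be sketched roughly as below. This is only a simplified stand-in model and not the attached patch: `LockRestoreSketch`, `Proc`, `State`, `shouldRestoreLock` and `pidsToRelock` are hypothetical names made up for this example and are not the real HBase procedure2 classes or methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model for illustration only.
// These are NOT the real HBase procedure2 classes.
public class LockRestoreSketch {

    // Simplified subset of procedure states (assumption).
    enum State { RUNNABLE, WAITING, SUCCESS, ROLLEDBACK }

    // Minimal stand-in for a procedure loaded from the procedure store.
    static final class Proc {
        final long pid;
        final State state;
        // True if the procedure held its lock before the master restarted.
        final boolean heldLockBeforeRestart;

        Proc(long pid, State state, boolean heldLockBeforeRestart) {
            this.pid = pid;
            this.state = state;
            this.heldLockBeforeRestart = heldLockBeforeRestart;
        }

        // "Finished" here means success or rollback, as in the description.
        boolean isFinished() {
            return state == State.SUCCESS || state == State.ROLLEDBACK;
        }
    }

    // Restore the lock only for procedures that held it AND may still run.
    static boolean shouldRestoreLock(Proc p) {
        return p.heldLockBeforeRestart && !p.isFinished();
    }

    // Collect the pids whose locks would be re-acquired while loading.
    static List<Long> pidsToRelock(List<Proc> loaded) {
        List<Long> result = new ArrayList<>();
        for (Proc p : loaded) {
            if (shouldRestoreLock(p)) {
                result.add(p.pid);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Proc> loaded = Arrays.asList(
            new Proc(1208, State.SUCCESS, true),   // the case from the logs above
            new Proc(1300, State.RUNNABLE, true)); // made-up pid that still needs its lock
        System.out.println(pidsToRelock(loaded));
    }
}
```

With a check like this, a procedure such as pid=1208 from the logs (state=SUCCESS, but recorded as holding the lock) would be skipped during lock restoration, so nothing is left holding the region lock after the restart.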