[ 
https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660407#comment-16660407
 ] 

Duo Zhang commented on HBASE-21364:
-----------------------------------

It will not fail... Maybe that is because on branch-2+ we do not start the 
workers until all the procedures have been loaded...

But I think the problem still exists. If a procedure that wants the 
exclusive lock is placed first in the queue, and procedures that hold the 
shared lock are queued behind it, then we are likely to hang...

Let me dig more. Nice catch [~allan163].

> Procedure holds the lock should put to front of the queue after restart
> -----------------------------------------------------------------------
>
>                 Key: HBASE-21364
>                 URL: https://issues.apache.org/jira/browse/HBASE-21364
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.1.0, 2.0.2
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>         Attachments: HBASE-21364.branch-2.0.001.patch
>
>
> After restoring the procedures from the Procedure WALs, we put the runnable 
> procedures back into the queue to execute. The order was not a problem before 
> HBASE-20846, since the first procedure to execute would acquire the lock 
> itself. But since HBASE-20846 the locks are restored as well. If a procedure 
> without the lock executes before a procedure with the lock in the same queue, 
> there is a race condition in which we may never execute the procedures 
> in that queue.
> The race condition is:
> 1. A procedure that needs the table's exclusive lock is put into the 
> table's queue, but the table's shared lock is held by a Region Procedure. 
> Since no one holds the exclusive lock, the queue is added to the run queue to 
> execute. But soon the worker thread sees that the procedure can't execute 
> because it doesn't hold the lock, so it stops executing and removes the queue 
> from the run queue.
> 2. At the same time, the Region Procedure, which holds the table's shared 
> lock and the region's exclusive lock, is put into the table's queue. But since 
> the queue has already been added to the run queue, it is not added again.
> 3. Because of step 1, the table's queue is removed from the run queue.
> 4. After that, no one puts the table's queue back, so no worker will execute 
> the procedures inside.
> A test case in the patch shows how.
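
For illustration only, the ordering problem described above can be modeled with a minimal, single-threaded Java sketch. Every name here (RestartQueueSketch, Proc, requeue, drain) is hypothetical and is not an HBase class or API. The sketch does not reproduce the real thread race. It only assumes that a blocked head procedure leaves the queue unscheduled, and shows how placing restored lock holders at the front avoids that state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of one table queue after restart; not HBase code.
final class RestartQueueSketch {

    /** A restored procedure and the table lock state it came back with. */
    static final class Proc {
        final String name;
        final boolean needsExclusive; // true: table exclusive lock, false: table shared lock
        final boolean holdsLock;      // true if its lock was restored from the WAL

        Proc(String name, boolean needsExclusive, boolean holdsLock) {
            this.name = name;
            this.needsExclusive = needsExclusive;
            this.holdsLock = holdsLock;
        }
    }

    /** Re-queues restored procedures, optionally placing lock holders first. */
    static Deque<Proc> requeue(List<Proc> restored, boolean lockHoldersFirst) {
        Deque<Proc> queue = new ArrayDeque<>();
        if (lockHoldersFirst) {
            // Two passes keep the relative order inside each group stable.
            for (Proc p : restored) if (p.holdsLock) queue.addLast(p);
            for (Proc p : restored) if (!p.holdsLock) queue.addLast(p);
        } else {
            queue.addAll(restored);
        }
        return queue;
    }

    /** Runs procedures from the head until the head is blocked; returns names run. */
    static List<String> drain(Deque<Proc> queue) {
        int sharedHeld = 0; // restored shared locks still held by queued procedures
        for (Proc p : queue) {
            if (p.holdsLock && !p.needsExclusive) sharedHeld++;
        }
        List<String> executed = new ArrayList<>();
        while (!queue.isEmpty()) {
            Proc head = queue.peekFirst();
            if (head.needsExclusive && !head.holdsLock && sharedHeld > 0) {
                break; // assumption: nothing re-schedules the queue, so it hangs here
            }
            queue.pollFirst();
            executed.add(head.name);
            if (head.holdsLock && !head.needsExclusive) sharedHeld--; // lock released
        }
        return executed;
    }
}
```

In this model, a queue holding an exclusive-lock procedure followed by a region procedure that already holds the shared lock runs nothing when re-queued in the restored order, and runs both when lock holders are placed first.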



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
