Allan Yang created HBASE-21364:
----------------------------------
Summary: Procedure holds the lock should put to front of the queue
after restart
Key: HBASE-21364
URL: https://issues.apache.org/jira/browse/HBASE-21364
Project: HBase
Issue Type: Sub-task
Affects Versions: 2.0.2, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang
After restoring the procedures from the Procedure WALs, we put the runnable
procedures back into the queue to execute. The order was not a problem before
HBASE-20846, since the first one to execute would acquire the lock itself. But
locks are now restored after HBASE-20846, so if a procedure without the lock
is executed before a procedure that holds the lock in the same queue, there is
a race condition in which we may not be able to execute any of the procedures
in that queue.
The race condition is:
1. A procedure that needs the table's exclusive lock is put into the table's
queue while the table's shared lock is held by a Region procedure. Since no one
holds the exclusive lock, the queue is put into the run queue to execute. But
soon the worker thread sees that the procedure can't execute because it doesn't
hold the lock, so it stops executing and removes the queue from the run queue.
2. At the same time, the Region procedure, which holds the table's shared lock
and the region's exclusive lock, is put into the table's queue. But since the
queue has already been added to the run queue, it is not added again.
3. Because of step 1, the table's queue is removed from the run queue.
4. After that, no one puts the table's queue back, so no worker will execute
the procedures inside it.
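
The interleaving above can be replayed in a small single-threaded toy model.
This is only a sketch and not HBase code: the names StrandedQueueSketch,
TableQueue, Proc and replay are hypothetical, the run queue is reduced to one
boolean flag, and lock release is assumed to happen when the lock-holding
procedure finishes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the race; every name here is hypothetical, not an HBase API. */
class StrandedQueueSketch {

    /** A restored procedure: it either already holds its lock or still needs one. */
    static final class Proc {
        final String name;
        final boolean holdsLock;
        Proc(String name, boolean holdsLock) {
            this.name = name;
            this.holdsLock = holdsLock;
        }
    }

    /** One table queue plus a flag standing in for run-queue membership. */
    static final class TableQueue {
        final Deque<Proc> procs = new ArrayDeque<>();
        boolean inRunQueue = false;
        boolean sharedLockHeld = true; // restored for the region procedure

        void add(Proc p) {
            procs.addLast(p);
            inRunQueue = true; // no effect if the queue is already in the run queue
        }

        boolean canRun(Proc p) {
            // A procedure that needs the exclusive lock is blocked by the shared lock.
            return p.holdsLock || !sharedLockHeld;
        }
    }

    /** Replays the restart and returns how many procedures were executed. */
    static int replay(boolean lockHolderFirst) {
        TableQueue q = new TableQueue();
        Proc exclusive = new Proc("needs-table-exclusive-lock", false);
        Proc region = new Proc("region-proc-with-shared-lock", true);

        if (lockHolderFirst) {
            q.add(region);
            q.add(exclusive);
        } else {
            q.add(exclusive);                               // step 1: queue enters run queue
            boolean blocked = !q.canRun(q.procs.peekFirst()); // worker inspects the head
            q.add(region);                                  // step 2: queue is not re-added
            if (blocked) {
                q.inRunQueue = false;                       // step 3: worker removes the queue
            }
        }

        int executed = 0;
        // Workers only ever look at queues that are in the run queue (step 4).
        while (q.inRunQueue && !q.procs.isEmpty()) {
            Proc head = q.procs.peekFirst();
            if (!q.canRun(head)) {
                q.inRunQueue = false;
                break;
            }
            q.procs.pollFirst();
            if (head.holdsLock) {
                q.sharedLockHeld = false; // assumed: finishing releases the lock
            }
            executed++;
        }
        return executed;
    }

    public static void main(String[] args) {
        System.out.println("lock holder last:  executed " + replay(false));
        System.out.println("lock holder first: executed " + replay(true));
    }
}
```

In this model, replaying with the lock holder behind the blocked procedure
executes nothing, while putting the lock holder at the front lets both
procedures run, which is the ordering the summary of this issue asks for.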
A test case in the patch shows how.