[ 
https://issues.apache.org/jira/browse/HBASE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533239#comment-16533239
 ] 

Duo Zhang commented on HBASE-20828:
-----------------------------------

[~allan163] has found a problem when restarting the master: we do not restore 
the locks when loading procedures. We then found that the assumption in 
MasterProcedureScheduler.waitRegions is not correct, as the parent procedure of 
a RegionTransitionProcedure may not hold the table lock (think of SCP).

So I think there are two problems that need to be fixed.

First, we need to restore the locks when loading procedures. A first 
thought: after loading all the procedures and the procedure execution 
stacks, we scan all the procedures that have sub procedures. For every 
stack, we start from the root procedure and test its holdLock method; if it 
returns true, we call its acquireLock method to get the lock. 
I am not sure whether there are still corner cases. [~allan163] PTAL.
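
A rough sketch of the restore pass described above, to make the idea 
concrete. SketchProc and LockRestoreSketch are hypothetical, simplified 
stand-ins and not the real HBase procedure classes; walking each stack 
top-down from the root is one reading of the proposal, not a confirmed design.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a procedure; NOT the real HBase Procedure class.
class SketchProc {
  final String name;
  final boolean holdLock;   // stands in for the holdLock method mentioned above
  final List<SketchProc> children = new ArrayList<>();
  boolean lockHeld = false; // in-memory lock state, lost when the master restarts

  SketchProc(String name, boolean holdLock) {
    this.name = name;
    this.holdLock = holdLock;
  }

  SketchProc addChild(SketchProc child) {
    children.add(child);
    return this;
  }

  // Stands in for acquireLock; a real implementation would go through the scheduler.
  void acquireLock() {
    lockHeld = true;
  }
}

class LockRestoreSketch {
  // Meant to run after all procedures and their execution stacks have been loaded.
  // Returns the names of the procedures whose locks were restored, in visit order.
  static List<String> restoreLocks(List<SketchProc> roots) {
    List<String> restored = new ArrayList<>();
    for (SketchProc root : roots) {
      if (root.children.isEmpty()) {
        continue; // only procedures that have sub procedures are scanned
      }
      restoreFrom(root, restored);
    }
    return restored;
  }

  // Assumption: walk top-down so a parent re-acquires before its sub procedures.
  private static void restoreFrom(SketchProc proc, List<String> restored) {
    if (proc.holdLock) {
      proc.acquireLock();
      restored.add(proc.name);
    }
    for (SketchProc child : proc.children) {
      restoreFrom(child, restored);
    }
  }
}
```

Whether procedures without sub procedures also need their locks restored is 
one of the corner cases this sketch does not settle.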

As for the waitRegions method, I think we should apply the patch in 
HBASE-20846, i.e., always try to acquire the shared lock. But the 
implementation of the procedure lock needs a small modification. If the parent 
procedure already holds the exclusive lock, then instead of returning false to 
make the procedure wait, we should return true to let the procedure go on. 
Locks that are already held by parent procedures should also be considered as 
held by their sub procedures. This is OK because the parent procedure cannot 
release the lock before the sub procedures finish: it can only be executed 
again after all the sub procedures have finished.
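
A minimal sketch of the modified lock check, assuming procedures are 
identified by a long id and that the lock can look up each procedure's 
parent. ParentAwareLockSketch and its methods are hypothetical names for 
illustration, not the existing HBase lock implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lock sketch; NOT the real HBase procedure lock.
class ParentAwareLockSketch {
  private static final long NO_OWNER = -1L;

  private long exclusiveOwner = NO_OWNER;                   // procId holding the exclusive lock
  private int sharedCount = 0;                              // number of shared holders
  private final Map<Long, Long> parentOf = new HashMap<>(); // child procId -> parent procId

  void setParent(long childId, long parentId) {
    parentOf.put(childId, parentId);
  }

  // True if ownerId is procId itself or any ancestor of procId.
  private boolean isSelfOrAncestor(long ownerId, long procId) {
    Long current = procId;
    while (current != null) {
      if (current == ownerId) {
        return true;
      }
      current = parentOf.get(current);
    }
    return false;
  }

  // Always try the shared lock. If an ancestor already holds the exclusive
  // lock, return true so the sub procedure goes on instead of waiting.
  boolean tryShared(long procId) {
    if (exclusiveOwner == NO_OWNER) {
      sharedCount++;
      return true;
    }
    return isSelfOrAncestor(exclusiveOwner, procId);
  }

  boolean tryExclusive(long procId) {
    if (exclusiveOwner != NO_OWNER) {
      // A lock held by a parent also counts as held by its sub procedures.
      return isSelfOrAncestor(exclusiveOwner, procId);
    }
    if (sharedCount > 0) {
      return false;
    }
    exclusiveOwner = procId;
    return true;
  }
}
```

With procedure 1 as the parent of procedure 2, once procedure 1 holds the 
exclusive lock, tryShared(2) returns true while an unrelated procedure 3 
still gets false and has to wait. Release handling is left out on purpose.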

> Finish-up AMv2 Design/List of Tenets/Specification of operation
> ---------------------------------------------------------------
>
>                 Key: HBASE-20828
>                 URL: https://issues.apache.org/jira/browse/HBASE-20828
>             Project: HBase
>          Issue Type: Umbrella
>          Components: amv2
>            Reporter: stack
>            Priority: Major
>
> AMv2 is missing specification. There are too many grey-areas still. Also 
> missing are a concise listing of the tenets of AMv2 operation. Here are some 
> examples:
>  * HBASE-19529 "Handle null states in AM": Asks how we should treat null 
> state in hbase:meta. What does it 'mean'. We seem to treat it differently 
> dependent on context. Needs clarification. [~Apache9] recently asked similar 
> about the meaning of OFFLINE.
>  * Logging needs to have a particular form to help trace Procedure progress; 
> needs a write-up.
> Lets fill in items to address in this umbrella issue. Can address in 
> subissues and produce specification doc too. We have the below but these are 
> mostly (incomplete) description for devs on pv2 and amv2; the specification 
> is missing:
> http://hbase.apache.org/book.html#pv2
> http://hbase.apache.org/book.html#amv2
> (Other areas include addressing what is up w/ rollback -- when, how much, and 
> when it is not appropriate -- as well as recommendation on Procedures 
> coarseness, locking -- is it ok to lock table in alter table procedure for 
> the life of the procedure? -- and so on).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
