[ 
https://issues.apache.org/jira/browse/HBASE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508125#comment-13508125
 ] 

Jonathan Hsieh commented on HBASE-7212:
---------------------------------------

Thanks for taking a look.  I'll get another rev out with cleaned up 
documentation in a day or two.  Answers below.

bq. Do you have to write it yourself? Anything already available that you could 
use? Say we used the zk curator client, it has a few barriers implemented 
already: https://github.com/Netflix/curator/wiki/Recipes If we were using 
curator say, could you use these receipes as building blocks so you didn't have 
to write this yourself? (This feature has to be backportable to 0.94?)

This is a simplified version of Jesse's patch.  I just gave curator a quick it 
is similar to the double barrier 
(https://github.com/Netflix/curator/wiki/Double-barrier). If it is implemented 
as the recipe you pointed out, I think we'd still need to add in the ability 
for cancellation/abort to come from any of the members.

bq. Reading the diagram, I"m not sure what receivedreached is. Or sendReached. 
sendReached is the coordinator saying all participants responded/are 
participating?

Yes -- reached is sent when the coordinator figures out that it has "reached" 
the global barrier point because all members have taken their part of the 
global barrier.

Basically, zk is being used for its async notifications and as the RPC 
mechanism.  Arrows into the ZK column are calls writing to ZK, arrows out of ZK 
are callbacks being called at the target.  So the red coordinator writes to zk 
via sendStart, zk node creation triggers a startNewOpearion callback on the the 
blue member1, and similarly on the the green member2.  These names are short 
hand for the names in the review was posted -- now sendStart -> 
sendBarrierStart, sendReached -> sendBarrierReached, startNewOperation -> 
Subprocedure's consturctor + acquireBarrier,  receiveReached -> 
receiveReachedGlobalBarrier 

bq. On your barrier you say "...but does not support ACID semantics" and thats 
ok because the 'transactions' we'll be running over this mechanism do not 
require it? Because they can proceed and complete in any order and result will 
come off the same?

Previously, this code was called TwoPhaseCommit (2pc).  While it had two 
phases, the code did not implement true two phase commit.  The purpose of this 
explicit comparison is to make clear 2pc's purpose (distributed ACID 
guarantees), to point out that we don't have 2pc here, to point out that we 
don't need 2pc here, and to point out that we just need a global barrier.  

The online snapshot coordination does not need all of what 2pc provides. The 
first cut will  have "only on a sunny day" semantics -- e.g. it will only 
succeed if everything succeeds and if anything fails along the way whole 
attempt will be aborted.  This is ok because the durable work that snapshots 
does goes into tmp dir (/hbase/.snapshots/.tmp/xxx) that is "commited" at the 
end atomically via HDFS dir rename, and that durable intermediate operation 
(e.g. new files from forcing a hlog roll or hlog flush) don't need to be undone 
to remain correct.  

bq. You say "....Does not recover on failures" ... because the operation just 
FAILs. Right?

Yup.  

bq. Only one of these precedures can be ongoingn at any one time? Is that right?

True for this first cut implementation, but not a fundamental limitation.  This 
actually gets enforced at the snapshot manager level which may be visible in 
HBASE-7208 and definitely in HBASE-6866 when that gets posted.    I believe as 
implemented if we picked a different class we could have multiple different 
kinds of procedure concurrently running on a different znode dir hierarchy.

bq. How do I read these set of slides? There is a 'Barrier Procedure 
Coordination' and then there is 'Procedure Coordination'? So, the PC makes use 
of a BPC? BPC is the skeleton you hang PC on?

All those are synonymous -- I've bee using procedure as a shorthand.  The code 
implements one framework for a globally barriered procedure, and I've just 
tried to call it 'procedure' and 'subprocedure' everywhere (though from review 
I missed spots where it was called task, operation, or commit).  This 
'procedure' takes care of the global barrier coordination and cross process 
error propagation.  

bq. Why you say this 'If we aren’t doing proper 2PC do we need all this 
infrastructure?'? Are you making a case for our not needing 2PC given what is 
being implemented?

I could probably remove that line -- I'm now convinced why we need what this 
code does.  

The main questions I had when I was initially understanding the previous 
implementation was "Is this 2pc?" and "Do we need 2pc?".  The answers are: what 
we have implemented here has two phases but is *not* true two-phase commit.  
2pc, as defined in the literature 
(http://www.cs.berkeley.edu/~brewer/cs262/Aries.pdf), requires that once the 
coordinator says something is committed, any failures at a member or 
coordinator must be recover by failing forward and completing it.  The key 
point here is that while we will need a global barrier for one of the snapshot 
flavors (global), it don't need full 2PC because 1) the we don't need to undo 
work (like a log roll or flush) if some sub part of the first phase (our 
acquire/2pc's prepare) fails, and because 2) we don't need to recover failing 
forward if anything fails in the second phase (our release/2pc's commit).  In 
the latter case we just fail and delete .snapshot/.tmp reminants in the fs, and 
carry on with extra flushed/rolled hlogs.

bq. Coordinator can be any client? Does not have to be master?

It could be anywhere, but currently for snapshots the coordinator lives on the 
master.

bq. What is ProcedureCoordinateComms?

This is actually a layer that separate the zk code (the rpc communications or 
comms code) from specific execution (snapshotting specific code).  I could 
probably remove it, but the abstraction allows for testing the core pieces 
without zk.

bq. Does this barrier acquistion have any relation to zk barrier receipe? 
http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_eventHandles

Yes.  It is very similar to the double barrier.  The main thing different here 
is this code allows for any member or coordinator to abort/cancel the whole 
shebang while the recipe doesn't seem to.  From the recipe it seems that we 
could be a little bit more clever about how we use our znodes.  (we might have 
one extra set).

bq. What is 'class' in the zk node hierarchy? Class of procedure?

The online-snapshots is a 'class' (e.g. all online snapshots) while a procedure 
name is an actual name for a particular snapshotting request (snapshot121201, 
snapshot121202 etc).  Off the top of my head I can't think of any other HBase 
processes that are ok with the procedure mechanism's semantics (other 
operations like enabling, disabling, schema change, splitting, merging probably 
want 2pc and its recovery requirements).  I think this extra znode dir could 
probably get removed.
                
> Globally Barriered Procedure mechanism
> --------------------------------------
>
>                 Key: HBASE-7212
>                 URL: https://issues.apache.org/jira/browse/HBASE-7212
>             Project: HBase
>          Issue Type: Sub-task
>          Components: snapshots
>    Affects Versions: hbase-6055
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: hbase-6055
>
>         Attachments: 121127-global-barrier-proc.pdf, hbase-7212.patch, 
> pre-hbase-7212.patch
>
>
> This is a simplified version of what was proposed in HBASE-6573.  Instead of 
> claiming to be a 2pc or 3pc implementation (which implies logging at each 
> actor, and recovery operations) this is just provides a best effort global 
> barrier mechanism called a Procedure.  
> Users need only to implement a methods to acquireBarrier, to act when 
> insideBarrier, and to releaseBarrier that use the ExternalException 
> cooperative error checking mechanism.
> Globally consistent snapshots require the ability to quiesce writes to a set 
> of region servers before a the snapshot operation is executed.  Also if any 
> node fails, it needs to be able to notify them so that they abort.
> The first cut of other online snapshots don't need the fully barrier but may 
> still use this for its error propagation mechanisms.
> This version removes the extra layer incurred in the previous implementation 
> due to the use of generics, separates the coordinator and members, and 
> reduces the amount of inheritance used in favor of composition.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to