[ https://issues.apache.org/jira/browse/HBASE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508125#comment-13508125 ]
Jonathan Hsieh commented on HBASE-7212:
---------------------------------------

Thanks for taking a look. I'll get another rev out with cleaned up documentation in a day or two. Answers below.

bq. Do you have to write it yourself? Anything already available that you could use? Say we used the zk curator client, it has a few barriers implemented already: https://github.com/Netflix/curator/wiki/Recipes If we were using curator say, could you use these recipes as building blocks so you didn't have to write this yourself? (This feature has to be backportable to 0.94?)

This is a simplified version of Jesse's patch. I just gave curator a quick look; it is similar to the double barrier (https://github.com/Netflix/curator/wiki/Double-barrier). If it is implemented as the recipe you pointed out, I think we'd still need to add in the ability for cancellation/abort to come from any of the members.

bq. Reading the diagram, I'm not sure what receivedreached is. Or sendReached. sendReached is the coordinator saying all participants responded/are participating?

Yes -- reached is sent when the coordinator figures out that it has "reached" the global barrier point because all members have taken their part of the global barrier.

Basically, zk is being used for its async notifications and as the RPC mechanism. Arrows into the ZK column are calls writing to ZK; arrows out of ZK are callbacks being called at the target. So the red coordinator writes to zk via sendStart, and the zk node creation triggers a startNewOperation callback on the blue member1, and similarly on the green member2.

These names are shorthand for the names in the review that was posted. The mapping is now:

  sendStart -> sendBarrierStart
  sendReached -> sendBarrierReached
  startNewOperation -> Subprocedure's constructor + acquireBarrier
  receiveReached -> receiveReachedGlobalBarrier

bq.
On your barrier you say "...but does not support ACID semantics" and that's ok because the 'transactions' we'll be running over this mechanism do not require it? Because they can proceed and complete in any order and result will come off the same?

Previously, this code was called TwoPhaseCommit (2pc). While it had two phases, the code did not implement true two phase commit. The purpose of this explicit comparison is to make clear 2pc's purpose (distributed ACID guarantees), to point out that we don't have 2pc here, to point out that we don't need 2pc here, and to point out that we just need a global barrier.

The online snapshot coordination does not need all of what 2pc provides. The first cut will have "only on a sunny day" semantics -- i.e. it will only succeed if everything succeeds, and if anything fails along the way the whole attempt will be aborted. This is ok because the durable work that snapshots do goes into a tmp dir (/hbase/.snapshots/.tmp/xxx) that is "committed" at the end atomically via an HDFS dir rename, and because durable intermediate operations (e.g. new files from forcing a hlog roll or hlog flush) don't need to be undone to remain correct.

bq. You say "....Does not recover on failures" ... because the operation just FAILs. Right?

Yup.

bq. Only one of these procedures can be ongoing at any one time? Is that right?

True for this first cut implementation, but not a fundamental limitation. This actually gets enforced at the snapshot manager level, which may be visible in HBASE-7208 and definitely in HBASE-6866 when that gets posted. I believe that, as implemented, if we picked a different class we could have multiple different kinds of procedure running concurrently, each on a different znode dir hierarchy.

bq. How do I read these set of slides? There is a 'Barrier Procedure Coordination' and then there is 'Procedure Coordination'? So, the PC makes use of a BPC? BPC is the skeleton you hang PC on?

All those are synonymous -- I've been using procedure as a shorthand.
The code implements one framework for a globally barriered procedure, and I've just tried to call it 'procedure' and 'subprocedure' everywhere (though from the review I missed spots where it was called task, operation, or commit). This 'procedure' takes care of the global barrier coordination and cross process error propagation.

bq. Why you say this 'If we aren’t doing proper 2PC do we need all this infrastructure?'? Are you making a case for our not needing 2PC given what is being implemented?

I could probably remove that line -- I'm now convinced why we need what this code does. The main questions I had when I was initially understanding the previous implementation were "Is this 2pc?" and "Do we need 2pc?". The answer is: what we have implemented here has two phases but is *not* true two-phase commit. 2pc, as defined in the literature (http://www.cs.berkeley.edu/~brewer/cs262/Aries.pdf), requires that once the coordinator says something is committed, any failure at a member or the coordinator must be recovered by failing forward and completing it.

The key point here is that while we will need a global barrier for one of the snapshot flavors (global), we don't need full 2PC because 1) we don't need to undo work (like a log roll or flush) if some sub part of the first phase (our acquire/2pc's prepare) fails, and because 2) we don't need to recover by failing forward if anything fails in the second phase (our release/2pc's commit). In the latter case we just fail, delete the .snapshot/.tmp remnants in the fs, and carry on with extra flushed/rolled hlogs.

bq. Coordinator can be any client? Does not have to be master?

It could be anywhere, but currently for snapshots the coordinator lives on the master.

bq. What is ProcedureCoordinateComms?

This is actually a layer that separates the zk code (the rpc communications or comms code) from the specific execution (snapshotting specific code). I could probably remove it, but the abstraction allows for testing the core pieces without zk.
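To make the acquire / reached / release flow and the abort-from-any-member behavior concrete, here is a minimal in-memory sketch in Java. It is not the code in the patch: BarrierSketch, ErrorMonitor, RecordingMember and runProcedure are hypothetical names, the Subprocedure interface here only borrows the three method names from the issue description, latches stand in for znodes and their callbacks, and a shared flag stands in for cross process error propagation. Calling releaseBarrier on every member even after an abort is also an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical in-memory sketch; these are not the classes in the patch. */
class BarrierSketch {

    /** Per-member hooks, named after the three methods in the issue description. */
    interface Subprocedure {
        void acquireBarrier() throws Exception;
        void insideBarrier() throws Exception;
        void releaseBarrier();
    }

    /** Shared abort flag (assumption): the first reported error wins. */
    static final class ErrorMonitor {
        private final AtomicReference<Exception> cause = new AtomicReference<>();
        void abort(Exception e) { cause.compareAndSet(null, e); }
        boolean aborted() { return cause.get() != null; }
    }

    /** Helper member that records which phases ran and can fail on acquire. */
    static final class RecordingMember implements Subprocedure {
        final boolean failOnAcquire;
        volatile boolean ranInside;
        volatile boolean released;
        RecordingMember(boolean failOnAcquire) { this.failOnAcquire = failOnAcquire; }
        @Override public void acquireBarrier() throws Exception {
            if (failOnAcquire) throw new Exception("acquire failed");
        }
        @Override public void insideBarrier() { ranInside = true; }
        @Override public void releaseBarrier() { released = true; }
    }

    /** Coordinator: returns true only if every member got through every phase. */
    static boolean runProcedure(List<? extends Subprocedure> members, long timeoutMs) {
        ErrorMonitor monitor = new ErrorMonitor();
        CountDownLatch acquired = new CountDownLatch(members.size());
        CountDownLatch reached = new CountDownLatch(1); // the coordinator's "reached" signal
        CountDownLatch released = new CountDownLatch(members.size());
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, members.size()));
        try {
            for (Subprocedure m : members) {
                pool.execute(() -> {
                    try {
                        try {
                            m.acquireBarrier();
                        } catch (Exception e) {
                            monitor.abort(e);     // any member may abort the procedure
                        } finally {
                            acquired.countDown(); // report in, success or failure
                        }
                        reached.await();          // wait for the global barrier point
                        if (!monitor.aborted()) m.insideBarrier();
                    } catch (Exception e) {
                        monitor.abort(e);
                    } finally {
                        m.releaseBarrier();       // cleanup runs on success and on abort
                        released.countDown();
                    }
                });
            }
            if (!acquired.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                monitor.abort(new TimeoutException("acquire phase timed out"));
            }
            reached.countDown();
            if (!released.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                monitor.abort(new TimeoutException("release phase timed out"));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            monitor.abort(e);
        } finally {
            reached.countDown(); // never leave members blocked on the barrier
            pool.shutdownNow();
        }
        return !monitor.aborted();
    }

    public static void main(String[] args) {
        List<RecordingMember> sunny =
            Arrays.asList(new RecordingMember(false), new RecordingMember(false));
        System.out.println("all members healthy: " + runProcedure(sunny, 5000));
        List<RecordingMember> rainy =
            Arrays.asList(new RecordingMember(false), new RecordingMember(true));
        System.out.println("one member fails acquire: " + runProcedure(rainy, 5000));
    }
}
```

Nothing in the sketch tries to recover or fail forward: a failure anywhere just marks the whole attempt as aborted, which is the "only on a sunny day" behavior described above. Because the coordination is expressed against plain interfaces, the same flow can be exercised in a test without zk.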
bq. Does this barrier acquisition have any relation to zk barrier recipe? http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_eventHandles

Yes. It is very similar to the double barrier. The main difference is that this code allows any member or the coordinator to abort/cancel the whole shebang, while the recipe doesn't seem to. From the recipe it seems that we could be a little bit more clever about how we use our znodes (we might have one extra set).

bq. What is 'class' in the zk node hierarchy? Class of procedure?

The online-snapshots is a 'class' (e.g. all online snapshots) while a procedure name is an actual name for a particular snapshotting request (snapshot121201, snapshot121202 etc). Off the top of my head I can't think of any other HBase processes that are ok with the procedure mechanism's semantics (other operations like enabling, disabling, schema change, splitting, merging probably want 2pc and its recovery requirements). I think this extra znode dir could probably get removed.

> Globally Barriered Procedure mechanism
> --------------------------------------
>
>                 Key: HBASE-7212
>                 URL: https://issues.apache.org/jira/browse/HBASE-7212
>             Project: HBase
>          Issue Type: Sub-task
>          Components: snapshots
>   Affects Versions: hbase-6055
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: hbase-6055
>
>         Attachments: 121127-global-barrier-proc.pdf, hbase-7212.patch,
> pre-hbase-7212.patch
>
>
> This is a simplified version of what was proposed in HBASE-6573. Instead of
> claiming to be a 2pc or 3pc implementation (which implies logging at each
> actor, and recovery operations) this just provides a best effort global
> barrier mechanism called a Procedure.
> Users need only to implement methods to acquireBarrier, to act when
> insideBarrier, and to releaseBarrier that use the ExternalException
> cooperative error checking mechanism.
> Globally consistent snapshots require the ability to quiesce writes to a set
> of region servers before the snapshot operation is executed. Also if any
> node fails, it needs to be able to notify them so that they abort.
> The first cut of other online snapshots don't need the full barrier but may
> still use this for its error propagation mechanisms.
> This version removes the extra layer incurred in the previous implementation
> due to the use of generics, separates the coordinator and members, and
> reduces the amount of inheritance used in favor of composition.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira