[ 
https://issues.apache.org/jira/browse/HBASE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509465#comment-13509465
 ] 

Jonathan Hsieh commented on HBASE-7212:
---------------------------------------

bq. What happens when the coordinator dies (in this case hmaster). Does the new 
HMaster discover the prev procedure and abort?

The new HMaster deletes all znodes associated with the procedure class (that 
is, all znodes associated with snapshotting procedures). Any members still 
using them should time out and fail, and new operations need to be issued.  For 
snapshots in particular, there isn't really a chance of a partial snapshot 
being present, because all the snapshot work is done in a temp dir and 
atomically put into place with a dir rename op once the coordinator sees that 
all the members have released successfully.  Junk will be left over in these 
tmp dirs, but it gets cleaned up on the next take-snapshot attempt or when the 
new master starts.
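
To make the tmp-dir-then-rename idea concrete, here is a minimal sketch. It is not the HBase code: it uses java.nio on a local filesystem as a stand-in for the real filesystem rename, and the class name, method names, and the ".tmp" / "completed" directory names are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical illustration only; names and layout are assumptions, not HBase's.
class TmpDirSnapshotSketch {
    private final Path tmpRoot;   // working area; nothing in here counts as a snapshot
    private final Path doneRoot;  // only fully committed snapshots live here

    TmpDirSnapshotSketch(Path root) {
        tmpRoot = root.resolve(".tmp");
        doneRoot = root.resolve("completed");
        try {
            Files.createDirectories(tmpRoot);
            Files.createDirectories(doneRoot);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        cleanupTmp(); // models "when the new master starts"
    }

    // Convenience for trying the sketch out in a throwaway directory.
    static TmpDirSnapshotSketch inFreshTempRoot() {
        try {
            return new TmpDirSnapshotSketch(Files.createTempDirectory("snapshot-sketch"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Start an attempt: clear junk from earlier attempts, then hand out a work dir.
    Path begin(String name) {
        cleanupTmp();
        try {
            return Files.createDirectories(tmpRoot.resolve(name));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Called only after every member has released: one rename publishes the result.
    void commit(String name) {
        try {
            Files.move(tmpRoot.resolve(name), doneRoot.resolve(name),
                       StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Delete everything under the tmp root (children before parents).
    void cleanupTmp() {
        try (Stream<Path> paths = Files.walk(tmpRoot)) {
            paths.sorted(Comparator.reverseOrder())
                 .filter(p -> !p.equals(tmpRoot))
                 .forEach(p -> p.toFile().delete());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isCommitted(String name) { return Files.isDirectory(doneRoot.resolve(name)); }

    boolean hasLeftover(String name) { return Files.exists(tmpRoot.resolve(name)); }
}
```

If the coordinator dies between begin() and commit(), readers of the "completed" directory never see the half-written attempt; the leftover work dir is simply removed by the next begin() or by the next constructor call.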


                
> Globally Barriered Procedure mechanism
> --------------------------------------
>
>                 Key: HBASE-7212
>                 URL: https://issues.apache.org/jira/browse/HBASE-7212
>             Project: HBase
>          Issue Type: Sub-task
>          Components: snapshots
>    Affects Versions: hbase-6055
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: hbase-6055
>
>         Attachments: 121127-global-barrier-proc.pdf, hbase-7212.patch, 
> pre-hbase-7212.patch
>
>
> This is a simplified version of what was proposed in HBASE-6573.  Instead of 
> claiming to be a 2pc or 3pc implementation (which implies logging at each 
> actor, and recovery operations), this just provides a best-effort global 
> barrier mechanism called a Procedure.  
> Users need only implement methods to acquireBarrier, to act when 
> insideBarrier, and to releaseBarrier; these use the ExternalException 
> cooperative error checking mechanism.
> Globally consistent snapshots require the ability to quiesce writes to a set 
> of region servers before the snapshot operation is executed.  Also, if any 
> node fails, the mechanism needs to be able to notify the others so that they 
> abort.
> The first cut of other online snapshots doesn't need the full barrier but may 
> still use this for its error propagation mechanisms.
> This version removes the extra layer incurred in the previous implementation 
> due to the use of generics, separates the coordinator and members, and 
> reduces the amount of inheritance used in favor of composition.
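
For readers who want a picture of the three steps named in the description, the following is a rough single-JVM sketch using threads and latches. It is an assumption-laden illustration, not the API in the attached patch: the Member interface, the run method, and the shared error slot (standing in for the cooperative error checking) are all hypothetical, and there is no ZooKeeper involved.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; not the classes from hbase-7212.patch.
class BarrierProcedureSketch {
    // Member-side hooks, named after the three steps in the description.
    interface Member {
        void acquireBarrier() throws Exception;
        void insideBarrier() throws Exception;
        void releaseBarrier() throws Exception;
    }

    // Returns true only if every member acquired, acted inside the barrier,
    // and released within the timeouts. Best effort: no logging, no recovery.
    static boolean run(List<Member> members, long timeoutMs) {
        CountDownLatch acquired = new CountDownLatch(members.size());
        CountDownLatch released = new CountDownLatch(members.size());
        AtomicReference<Throwable> error = new AtomicReference<>(); // shared error slot

        for (Member m : members) {
            Thread t = new Thread(() -> {
                try {
                    m.acquireBarrier();
                    acquired.countDown();
                    // Wait until all members have acquired; abort on timeout
                    // or if another member has already reported an error.
                    if (!acquired.await(timeoutMs, TimeUnit.MILLISECONDS)
                            || error.get() != null) {
                        throw new IllegalStateException("global barrier not reached");
                    }
                    m.insideBarrier();
                    m.releaseBarrier();
                    released.countDown();
                } catch (Throwable e) {
                    error.compareAndSet(null, e); // first failure wins
                }
            });
            t.setDaemon(true);
            t.start();
        }

        try {
            // Coordinator side: wait for every member to release.
            boolean allReleased = released.await(2 * timeoutMs, TimeUnit.MILLISECONDS);
            return allReleased && error.get() == null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In this sketch a member that fails before acquiring never counts down, so the remaining members time out at the barrier and abort without running insideBarrier, and run returns false.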

