[ 
https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245806#comment-13245806
 ] 

Todd Lipcon commented on HDFS-3077:
-----------------------------------

Hi Bikas. Thanks for bringing up this scenario. I do need to add a section to 
the doc about failure handling and re-adding failed journals.

My thinking is that the granularity of "membership" is the log segment. This is 
similar to what we do on local disks today - when we roll the edit log, we 
attempt to re-add any disks that previously failed. Similarly, when we start a 
new log segment, we give all of the JNs a chance to pick back up following 
along with the quorum.

To try to map to your example, we'd have the following:

JN1: writing edits_inprogress_1 (@txn 100)
JN2: writing edits_inprogress_1 (@txn 100)
JN3: has been reformatted, comes back online

At this point, the QJM can try to write txns to all three, but JN3 won't accept 
transactions because it doesn't have an open log segment. For now it simply 
rejects them. I can imagine a future optimization in which it would return a 
special exception, and the QJM could use that to notify the NN that it would 
like to roll ASAP if possible.
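
To make the rejection path concrete, here is a toy Python model of it. This is 
only a sketch and not the actual JournalNode or QJM code: ToyJournalNode, 
NoOpenSegmentError and quorum_write are names made up for illustration, and the 
"special exception" is the possible future optimization, not current behavior.

```python
class NoOpenSegmentError(Exception):
    """Hypothetical error: this journal has no in-progress log segment."""


class ToyJournalNode:
    def __init__(self, name):
        self.name = name
        self.open_segment_start = None  # first txid of the open segment, if any
        self.last_txid = 0

    def start_log_segment(self, first_txid):
        self.open_segment_start = first_txid
        self.last_txid = first_txid - 1

    def journal(self, txid):
        # A reformatted node has no open segment, so it refuses the write.
        if self.open_segment_start is None:
            raise NoOpenSegmentError(self.name)
        self.last_txid = txid


def quorum_write(nodes, txid):
    """Send one txn to every node; succeed only if a majority accepts it."""
    acked, rejected = [], []
    for jn in nodes:
        try:
            jn.journal(txid)
            acked.append(jn.name)
        except NoOpenSegmentError:
            rejected.append(jn.name)
    if len(acked) <= len(nodes) // 2:
        raise RuntimeError("lost quorum")
    return acked, rejected


jn1, jn2, jn3 = (ToyJournalNode(n) for n in ("JN1", "JN2", "JN3"))
for jn in (jn1, jn2):
    jn.start_log_segment(1)
    jn.last_txid = 100  # both are at txn 100 in edits_inprogress_1

# JN3 was reformatted and never started a segment, so it rejects txn 101.
acked, rejected = quorum_write([jn1, jn2, jn3], 101)
print("acked:", acked, "rejected:", rejected)
```

In this model a non-empty rejected list is the signal the writer could use to 
ask for an early roll; the write itself still succeeds because two of three 
nodes acknowledged it.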

Let's say we write another 20 txns, and then roll logs. On the next 
startLogSegment call, we'd end up with the following:

JN1: edits_1-120, edits_inprogress_121
JN2: edits_1-120, edits_inprogress_121
JN3: edits_inprogress_121

So all nodes are now taking part in the quorum. We could optionally at this 
point have JN3 copy over the edits_1-120 segment from one of the other nodes, 
but that copy can be asynchronous. It's a repair operation, and given that we 
already have 2 valid replicas, we aren't in any imminent danger of data loss.
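
The roll and the optional repair step can be sketched the same way. Again this 
is a toy model under assumptions: ToyJournal, roll and missing_segments are 
hypothetical names, and the file names follow the shorthand used in this 
comment rather than the real on-disk naming.

```python
class ToyJournal:
    def __init__(self, name):
        self.name = name
        self.finalized = []     # list of (first_txid, last_txid) segments
        self.inprogress = None  # first txid of the open segment, or None

    def finalize_segment(self, last_txid):
        # Only a node that was actually writing has something to finalize.
        if self.inprogress is not None:
            self.finalized.append((self.inprogress, last_txid))
            self.inprogress = None

    def start_log_segment(self, first_txid):
        self.inprogress = first_txid

    def files(self):
        names = [f"edits_{a}-{b}" for a, b in self.finalized]
        if self.inprogress is not None:
            names.append(f"edits_inprogress_{self.inprogress}")
        return names


def roll(journals, last_txid):
    """Finalize the current segment and give every node a new one."""
    for jn in journals:
        jn.finalize_segment(last_txid)
        jn.start_log_segment(last_txid + 1)


def missing_segments(journal, peers):
    """Finalized segments some peer has that this journal lacks."""
    known = set()
    for peer in peers:
        known.update(peer.finalized)
    return sorted(known - set(journal.finalized))


jn1, jn2, jn3 = (ToyJournal(n) for n in ("JN1", "JN2", "JN3"))
jn1.start_log_segment(1)
jn2.start_log_segment(1)   # JN3 was reformatted and has no open segment

roll([jn1, jn2, jn3], 120)  # roll after txn 120
for jn in (jn1, jn2, jn3):
    print(jn.name, jn.files())

# Segments JN3 could copy from a peer in the background.
to_repair = missing_segments(jn3, [jn1, jn2])
print("JN3 repair candidates:", to_repair)
```

Because the repair list is computed from finalized segments only, the copy 
never has to touch a file that is still being written, which is part of why it 
can run asynchronously.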
                
> Quorum-based protocol for reading and writing edit logs
> -------------------------------------------------------
>
>                 Key: HDFS-3077
>                 URL: https://issues.apache.org/jira/browse/HDFS-3077
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: ha, name-node
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-3077-partial.txt, qjournal-design.pdf
>
>
> Currently, one of the weak points of the HA design is that it relies on 
> shared storage such as an NFS filer for the shared edit log. One alternative 
> that has been proposed is to depend on BookKeeper, a ZooKeeper subproject 
> which provides a highly available replicated edit log on commodity hardware. 
> This JIRA is to implement another alternative, based on a quorum commit 
> protocol, integrated more tightly in HDFS and with the requirements driven 
> only by HDFS's needs rather than more generic use cases. More details to 
> follow.


        
