[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245785#comment-13245785 ]
Bikas Saha commented on HDFS-3077:
----------------------------------

I have a question about syncing journal nodes and quorum-based writes. There will always be cases where a lost journal node comes back up and is syncing its state; the extreme example is the replacement of a broken journal node with a new node. While it is syncing, will it count as part of the quorum when a quorum of writes must succeed?

Say we have 3 journals with the following txids (JN3 just joined):
JN1-100, JN2-100, JN3-0

Now say some edits get written to JN2 and JN3 (a quorum commit, with JN1's records still in flight in the queue because JN1 is slow):
JN1-100, JN2-110, JN3-110+syncing_holes

At this point something terrible happens, and when we recover we can only access JN1 and JN3:
JN1-100, JN3-110+syncing_holes

At this point, how do we resolve the ground truth about the journal state and edit logs?

> Quorum-based protocol for reading and writing edit logs
> -------------------------------------------------------
>
>                 Key: HDFS-3077
>                 URL: https://issues.apache.org/jira/browse/HDFS-3077
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: ha, name-node
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-3077-partial.txt, qjournal-design.pdf
>
>
> Currently, one of the weak points of the HA design is that it relies on
> shared storage such as an NFS filer for the shared edit log. One alternative
> that has been proposed is to depend on BookKeeper, a ZooKeeper subproject
> which provides a highly available replicated edit log on commodity hardware.
> This JIRA is to implement another alternative, based on a quorum commit
> protocol, integrated more tightly in HDFS and with the requirements driven
> only by HDFS's needs rather than more generic use cases. More details to
> follow.
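To make the scenario in the comment concrete, here is a minimal Python sketch of the three journal states. It is only a toy model for reasoning about the question, not the HDFS-3077 design or any Hadoop API: the names `quorum`, `has_gap_free_log`, and `journals` are hypothetical, each journal is modeled as a plain set of txids, and the assumption that JN3 has backfilled txids 1..40 at the time of the failure is invented purely for illustration.

```python
def quorum(n_journals: int) -> int:
    """Smallest majority of n_journals."""
    return n_journals // 2 + 1


def has_gap_free_log(txids: set, upto: int) -> bool:
    """True if every txid from 1 through upto is present."""
    return all(t in txids for t in range(1, upto + 1))


# Initial state: JN1-100, JN2-100, JN3-0 (JN3 just joined).
journals = {
    "JN1": set(range(1, 101)),
    "JN2": set(range(1, 101)),
    "JN3": set(),
}

# Txids 101..110 are acknowledged by JN2 and JN3; JN1 is slow.
new_edits = set(range(101, 111))
acked_by = ["JN2", "JN3"]
for name in acked_by:
    journals[name] |= new_edits
committed = len(acked_by) >= quorum(len(journals))

# Assumption for illustration: JN3 has backfilled only 1..40 so far,
# so it still has holes in 41..100 ("110+syncing_holes").
journals["JN3"] |= set(range(1, 41))

# After the failure only JN1 and JN3 are reachable.
reachable = {name: journals[name] for name in ("JN1", "JN3")}
highest = max(max(txids) for txids in reachable.values())
gap_free = [n for n, txids in reachable.items() if has_gap_free_log(txids, highest)]
copies_of_highest = sum(1 for txids in reachable.values() if highest in txids)

print("writer saw txids 101..110 as committed:", committed)
print("highest txid among reachable journals:", highest)
print("reachable journals with a gap-free log:", gap_free)
print("reachable copies of the highest txid:", copies_of_highest)
```

In this toy model the writer counted txids 101..110 as committed, yet after the failure no reachable journal holds a gap-free log up to 110, and the newest edits exist on only one reachable node. That is the ambiguity the question asks the protocol to resolve.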