[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs

Todd Lipcon (Commented) (JIRA) Mon, 12 Mar 2012 11:59:00 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227780#comment-13227780
 ]


Todd Lipcon commented on HDFS-3077:
-----------------------------------

I plan to post a more thorough design doc in the next week or two, but my 
current rough thinking on the design is:

- Implement a standalone daemon which hosts the JournalProtocol interface, or 
something similar to it. This daemon is responsible for accepting edits from 
the active master and writing them locally. We will need to extend the 
interface to also allow reading from the logs, and a recovery operation (see 
below)
- The user would configure an odd number of these logger daemons (could be 
collocated with the NNs or even with DNs, given a dedicated disk)
- Create a JournalManager implementation which uses a quorum protocol to commit 
edits to the logger daemons.

As for the particular quorum protocol, I am leaning towards implementing ZAB -- 
its properties align very closely to the semantics we need.

I see the following advantages over the BookKeeper approach:
- Re-uses existing Hadoop subsystems like IPC, security, and the file-based 
edit logging code. This means that it will be easier to maintain for the Hadoop 
development community, and easier to deploy for Hadoop operations.
- Doesn't introduce a new dependency on an external project. If there is a bug 
discovered in this code, we can fix it with a new Hadoop release without having 
to wait on a new release of ZooKeeper. Since ZK and HDFS may be managed by 
different ops teams, this also simplifies upgrade.
- BookKeeper is a general system, whereas this is a specific system. Since BK 
tries to be quite general, it has extra complexity that we don't need. For 
example, it handles the interleaving of up to thousands of distinct edit logs 
into a single on-disk layout. These complexities are useful for a general 
"write-ahead log as a service" project, but not for our use case where even 
very large clusters have only a handful of distinct logs.
- BookKeeper's commit protocol waits for all replicas to commit. This means 
that, should one of the bookies fail, one must wait for a rather lengthy 
timeout before continuing. Additionally, the latency of a commit is the maximum 
of the latency of the bookies, meaning that it's much less feasible to 
collocate bookies with other machines under load like DataNodes. A quorum 
commit protocol instead has a latency equal to the median of its replicas' 
latencies, allowing it to ride over transient slowness on the part of one of 
its replicas.

The BK approach is still interesting for some deployments, and I'd advocate 
that we explore both options in parallel. If one approach as implemented turns 
out to have significant advantages, we can at that time choose to support only 
one, but in the meantime, I think it's worth developing both approaches.




                
> Quorum-based protocol for reading and writing edit logs
> -------------------------------------------------------
>
>                 Key: HDFS-3077
>                 URL: https://issues.apache.org/jira/browse/HDFS-3077
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: ha, name-node
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> Currently, one of the weak points of the HA design is that it relies on 
> shared storage such as an NFS filer for the shared edit log. One alternative 
> that has been proposed is to depend on BookKeeper, a ZooKeeper subproject 
> which provides a highly available replicated edit log on commodity hardware. 
> This JIRA is to implement another alternative, based on a quorum commit 
> protocol, integrated more tightly in HDFS and with the requirements driven 
> only by HDFS's needs rather than more generic use cases. More details to 
> follow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs

Reply via email to