[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229500#comment-13229500 ]
Ivan Kelly commented on HDFS-3077:
----------------------------------
{quote}
* Re-uses existing Hadoop subsystems like IPC, security, and the file-based
edit logging code. This means that it will be easier to maintain for the Hadoop
development community, and easier to deploy for Hadoop operations.
* Doesn't introduce a new dependency on an external project. If there is a
bug discovered in this code, we can fix it with a new Hadoop release without
having to wait on a new release of ZooKeeper. Since ZK and HDFS may be managed
by different ops teams, this also simplifies upgrade.
{quote}
These arguments read very much like a case of NIH (not-invented-here).
{quote}
* BookKeeper is a general system, whereas this is a specific system. Since BK
tries to be quite general, it has extra complexity that we don't need. For
example, it handles the interleaving of up to thousands of distinct edit logs
into a single on-disk layout. These complexities are useful for a general
"write-ahead log as a service" project, but not for our use case where even
very large clusters have only a handful of distinct logs.
{quote}
So the plan is to step around this complexity by implementing ZAB?
{quote}
* BookKeeper's commit protocol waits for all replicas to commit. This means
that, should one of the bookies fail, one must wait for a rather lengthy
timeout before continuing. Additionally, the latency of a commit is the maximum
of the latency of the bookies, meaning that it's much less feasible to
collocate bookies with other machines under load like DataNodes. A quorum
commit protocol instead has a latency equal to the median of its replicas'
latencies, allowing it to ride over transient slowness on the part of one of
its replicas.
{quote}
It would actually be very simple to change this within BookKeeper if needed.
Instead of sending to a quorum, you could send to the whole ensemble and wait
for responses from a quorum. None of BookKeeper's guarantees would be broken,
though throughput would obviously drop. Currently, with BookKeeper, we're able
to get higher throughput than when using a filer or a local file[1].
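To make the latency point concrete, here is a rough sketch of the arithmetic. The latency figures and function names are made up for illustration and are not measurements from BookKeeper or HDFS: waiting for every replica costs the maximum latency, while waiting for a majority costs the latency of the slowest replica inside the fastest majority (the median, for an odd-sized ensemble).

```python
def commit_latency_all(latencies):
    """Commit completes only when every replica has acknowledged."""
    return max(latencies)


def commit_latency_quorum(latencies, quorum_size):
    """Commit completes once the fastest `quorum_size` replicas have acknowledged."""
    return sorted(latencies)[quorum_size - 1]


# Hypothetical per-replica latencies in milliseconds; one replica is
# transiently slow (e.g. collocated with a loaded DataNode).
latencies_ms = [2.0, 3.0, 250.0]
majority = len(latencies_ms) // 2 + 1

print("wait for all:   ", commit_latency_all(latencies_ms))
print("wait for quorum:", commit_latency_quorum(latencies_ms, majority))
```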
Also, I don't think ZAB is the right tool for this in any case. You have a
single writer, which can therefore act as a sequencer on the entries. You just
need to broadcast to an ensemble and wait for a quorum of responses, as I
outlined above for BookKeeper.
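A minimal sketch of the single-writer scheme described above, assuming in-memory stand-ins for the replicas. The `Replica` and `SingleWriter` classes are hypothetical and are not BookKeeper or HDFS APIs; recovery, fencing, and per-replica ordering of in-flight writes are deliberately left out.

```python
import concurrent.futures
import time


class Replica:
    """Hypothetical in-memory stand-in for a bookie or journal node."""

    def __init__(self, name, delay_s=0.0):
        self.name = name
        self.delay_s = delay_s  # simulated write latency
        self.log = {}

    def append(self, seq, entry):
        time.sleep(self.delay_s)
        self.log[seq] = entry
        return self.name


class SingleWriter:
    """The sole writer assigns sequence numbers, so no agreement on order is needed."""

    def __init__(self, ensemble, quorum_size):
        self.ensemble = ensemble
        self.quorum_size = quorum_size
        self.next_seq = 0
        self.pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(ensemble))

    def write(self, entry):
        seq = self.next_seq
        self.next_seq += 1
        # Broadcast to the whole ensemble.
        futures = [self.pool.submit(r.append, seq, entry) for r in self.ensemble]
        acked = []
        # Return as soon as a quorum has acknowledged; stragglers finish in the background.
        for fut in concurrent.futures.as_completed(futures):
            try:
                acked.append(fut.result())
            except Exception:
                continue  # a failed replica simply does not count towards the quorum
            if len(acked) >= self.quorum_size:
                return seq, acked
        raise RuntimeError("entry %d was not acknowledged by a quorum" % seq)


replicas = [Replica("a"), Replica("b"), Replica("slow", delay_s=0.2)]
writer = SingleWriter(replicas, quorum_size=2)
print(writer.write(b"edit-1"))
```

The slow replica still receives every entry; it just no longer sits on the commit path.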
[1] http://people.apache.org/~ivank/tpt_mar14.pdf
> Quorum-based protocol for reading and writing edit logs
> -------------------------------------------------------
>
> Key: HDFS-3077
> URL: https://issues.apache.org/jira/browse/HDFS-3077
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: ha, name-node
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
>
> Currently, one of the weak points of the HA design is that it relies on
> shared storage such as an NFS filer for the shared edit log. One alternative
> that has been proposed is to depend on BookKeeper, a ZooKeeper subproject
> which provides a highly available replicated edit log on commodity hardware.
> This JIRA is to implement another alternative, based on a quorum commit
> protocol, integrated more tightly in HDFS and with the requirements driven
> only by HDFS's needs rather than more generic use cases. More details to
> follow.