Josh,

Thanks to you (and all the others working on this). I have read it once and
I think it sounds very sane. It answers questions that I face more and more
from customers. I have not looked at Ratis in detail, so I can't comment on
the challenge of adopting it, but I agree with the comments on avoiding the
complexity of requiring Kafka/DistributedLog. A nice and clean API would
give us the opportunity to leverage other services more easily in the
future as well.

This is a minor detail and I'm no expert here (and definitely haven't
thought through all ramifications), but do you still plan on having the WAL
hand out sequenceids, or should that be moved out of that implementation as
well?

Cheers,
Lars

On Thu, May 3, 2018 at 6:04 PM, Josh Elser wrote:

> Hi,
>
> I'm pleased to finally be able to share this design document with you all.
> It's the result of internal review by half a dozen or so people from
> within our community (Enis, Devaraj, Artem, and Clay easily come to mind)
> after multiple months of review and iteration.
>
> Abstract:
>
> <quote>
> Infrastructure as a service (IaaS) via public cloud infrastructure
> offerings (Cloud IaaS) has grown dramatically in popularity through
> services like Amazon EC2, Google Compute Engine, and Microsoft Azure
> Compute. Across Apache HBase users, the majority of new system
> architectures include some form of Cloud IaaS as a means to increase the
> capabilities and/or decrease the cost of operation of their system.
> However, deploying HBase on these platforms comes with difficulties as
> HBase has a non-optional dependency on Apache Hadoop HDFS to guarantee the
> durability of data written to HBase. This document outlines a proposal to
> remove HBase’s dependency on HDFS by replacing the current Write-Ahead-Log
> (WAL) implementation using Apache Ratis (incubating). It covers why the
> HDFS dependency is a problem on Cloud IaaS, how Ratis can be used to
> replace HDFS-based WALs, and a high-level development plan to effectively
> implement the replacement of this extremely critical HBase internal
> component without becoming tied to a single Cloud IaaS offering.
> </quote>
>
> The document is available on Google Docs [1] and there is also a PDF
> available [2] of the current version. I'm happy to assist those who do not
> want to use the copy on a Google service (e.g. transcribe mailing-list
> chatter onto the Doc).
>
> Thanks to some of the same folks who helped with this document, I also
> have a fairly in-depth analysis of what we think the required work will
> entail. For the HBase specific changes, I'd like to avoid the pitfall we
> commonly face and work towards frequent merges into master that do not
> destabilize the build (keep things "Green") to avoid stalling our forward
> momentum after 2.0. If people are curious/interested, I'm happy to delve
> some more into how I think we can implement this.
>
> - Josh
>
> [1] https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#
> [2] https://home.apache.org/~elserj/Effective%20HBase%20in%20the%20Cloud.pdf
>
