Thanks for the read, Chia-Ping!

I don't have a good answer for you at this point, but that's a good question.

I think a significant part of the core WAL refactoring work will require us to work with all of the various WAL impls we're accruing :)

* FSHWAL
* AsyncWAL
* WALLess
* Ratis-WAL

The general goal would be to make sure we design our interfaces to the fundamental requirements of HBase, and not to any one implementation. My hope is that if we can do this, we can let all of these systems implement their changes as they want.
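
To make that goal a little more concrete, here is a minimal sketch of what designing to requirements rather than to one implementation could look like. Everything in it is hypothetical and made up for illustration (the names `DurableLog`, `InMemoryLog`, `WritePath`, and the append/sync contract itself); it is not HBase's actual WAL API and does not come from the design doc.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical contract: states only what a write path might need
// (append an edit, wait until it is durable), not how any backend does it.
interface DurableLog {
    /** Buffers an edit and returns its sequence number. */
    long append(byte[] edit);

    /** Returns once every edit up to and including seqId is durable. */
    void sync(long seqId);

    /** Highest sequence number known to be durable, or -1 if none. */
    long lastSyncedSeqId();
}

// Toy in-memory stand-in for a real backend; it only tracks what was "synced".
final class InMemoryLog implements DurableLog {
    private final List<byte[]> edits = new ArrayList<>();
    private long synced = -1;

    @Override
    public synchronized long append(byte[] edit) {
        edits.add(edit.clone());
        return edits.size() - 1;
    }

    @Override
    public synchronized void sync(long seqId) {
        if (seqId < 0 || seqId >= edits.size()) {
            throw new IllegalArgumentException("unknown sequence id: " + seqId);
        }
        synced = Math.max(synced, seqId);
    }

    @Override
    public synchronized long lastSyncedSeqId() {
        return synced;
    }
}

// Caller written purely against the interface, so a different backend
// could be swapped in without touching this class.
final class WritePath {
    private final DurableLog log;

    WritePath(DurableLog log) {
        this.log = log;
    }

    /** Appends the edit and blocks until the log reports it durable. */
    long write(byte[] edit) {
        long seqId = log.append(edit);
        log.sync(seqId);
        return seqId;
    }
}
```

The point of the sketch is only that `WritePath` never names a concrete log class; whether append/sync is the right set of fundamental requirements is exactly the kind of thing the refactoring work would need to settle.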

On 5/4/18 7:32 AM, Chia-Ping Tsai wrote:
Thanks for sharing this, Josh. I have benefited greatly from the docs.

Just curious: there is another related issue about WALLess,
https://issues.apache.org/jira/browse/HBASE-20003. I can't quite picture
it, but how would we integrate both features into HBase? To be exact, do
they have a similar aim: removing the dependency on HDFS from HBase's write path?

Best regards,
Chia-Ping

On 2018/05/03 16:04:00, Josh Elser <els...@apache.org> wrote:
Hi,

I'm pleased to finally be able to share this design document with you
all. It's the result of multiple months of internal review and iteration
by half a dozen or so folks from within our community (Enis, Devaraj,
Artem, and Clay easily come to mind).

Abstract:

<quote>
Infrastructure as a service (IaaS) via public cloud infrastructure
offerings (Cloud IaaS) has grown dramatically in popularity through
services like Amazon EC2, Google Compute Engine, and Microsoft Azure
Compute. Across Apache HBase users, the majority of new system
architectures include some form of Cloud IaaS as a means to increase the
capabilities and/or decrease the cost of operation of their system.
However, deploying HBase on these platforms comes with difficulties as
HBase has a non-optional dependency on Apache Hadoop HDFS to guarantee
the durability of data written to HBase. This document outlines a
proposal to remove HBase’s dependency on HDFS by replacing the current
Write-Ahead-Log (WAL) implementation using Apache Ratis (incubating). It
covers why the HDFS dependency is a problem on Cloud IaaS, how Ratis can
be used to replace HDFS-based WALs, and a high-level development plan to
effectively implement the replacement of this extremely critical HBase
internal component without becoming tied to a single Cloud IaaS offering.
</quote>

The document is available on Google Docs [1], and a PDF of the current
version is also available [2]. I'm happy to assist those who do
not want to use the copy on a Google service (e.g. transcribe
mailing-list chatter onto the Doc).

Thanks to some of the same folks who helped with this document, I also
have a fairly in-depth analysis of what we think the required work will
entail. For the HBase specific changes, I'd like to avoid the pitfall we
commonly face and work towards frequent merges into master that do not
destabilize the build (keep things "Green") to avoid stalling our
forward momentum after 2.0. If people are curious/interested, I'm happy
to delve some more into how I think we can implement this.

- Josh

[1] https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#
[2] https://home.apache.org/~elserj/Effective%20HBase%20in%20the%20Cloud.pdf
