Hi folks,

I'd like to bring up the topic of the experience of new users as it
pertains to use of the `LocalFileSystem` and its associated (lack of) data
durability guarantees. By default, an unconfigured HBase runs with its root
directory on a `file:///` path. This path is resolved to an instance of
`LocalFileSystem`. Hadoop has long offered this class, but it has never
supported the `hsync` or `hflush` stream capabilities. Thus, when HBase runs
on this configuration, it is unable to ensure that WAL writes are durable,
and thus will ACK a write without this assurance. This is the case even
when running in a fully durable WAL mode.

This impacts a new user, someone kicking the tires on HBase following our
Getting Started docs. On Hadoop 2.8 and earlier, an unconfigured HBase will
WARN and carry on. On Hadoop 2.10+, HBase will refuse to start. The book
describes disabling stream capability enforcement as a first step. This
configuration is mandatory for running HBase directly out of our binary
distribution.
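(For context, the workaround the book describes amounts to a single property
in hbase-site.xml; I'm quoting the flag name from memory, so check the
reference guide for the exact wording:)

```xml
<!-- hbase-site.xml: disable the startup check that requires hsync/hflush
     support from the filesystem backing hbase.rootdir. Only sensible for
     non-production, kick-the-tires deployments. -->
<property>
  <name>hbase.unsafe.stream.capability.enforce</name>
  <value>false</value>
</property>
```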

HBASE-24086 restores the behavior on Hadoop 2.10+ to that of running on
2.8: log a warning and carry on. The critique of this approach is that it's
far too subtle, too quiet for a system operating in a state known to not
provide data durability.

I have two assumptions/concerns around the state of things, which prompted
my solution on HBASE-24086 and the associated doc update on HBASE-24106.

1. No one should be running a production system on `LocalFileSystem`.

The initial implementation checked both for `LocalFileSystem` and
`hbase.cluster.distributed`. When running on the former and the latter is
false, we assume the user is running a non-production deployment and carry
on with the warning. When the latter is true, we assume the user intended a
production deployment and the process terminates due to stream capability
enforcement. Subsequent code review resulted in skipping the
`hbase.cluster.distributed` check and simply warning, as was done on 2.8
and earlier.
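The initial check can be sketched roughly like this (a hypothetical,
simplified stand-in, not the actual HBase code; the real check reads these
values from the Hadoop `Configuration` and the resolved root filesystem):

```java
// Hypothetical sketch of the HBASE-24086 startup decision: only refuse to
// start when a non-durable LocalFileSystem is combined with a
// production-intent (distributed) configuration.
public class StreamCapabilityCheck {

    /**
     * @param isLocalFileSystem  true when hbase.rootdir resolves to LocalFileSystem
     * @param clusterDistributed the value of hbase.cluster.distributed
     * @return true to carry on (possibly with a warning), false to refuse to start
     */
    static boolean shouldCarryOn(boolean isLocalFileSystem, boolean clusterDistributed) {
        if (!isLocalFileSystem) {
            return true; // durable filesystem: nothing to check here
        }
        if (!clusterDistributed) {
            // standalone/dev deployment: warn and carry on, as on Hadoop 2.8
            System.out.println("WARN: LocalFileSystem does not support hsync/hflush;"
                + " WAL writes are not durable.");
            return true;
        }
        // production-intent deployment on a non-durable filesystem: fail fast
        System.out.println("ERROR: hbase.cluster.distributed=true but the root"
            + " filesystem cannot guarantee durability; refusing to start.");
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shouldCarryOn(true, false)); // standalone: carry on
        System.out.println(shouldCarryOn(true, true));  // distributed: refuse
    }
}
```

The post-review behavior then reduces to the `!clusterDistributed` branch
applying unconditionally whenever the filesystem is local.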

(As I understand it, we've long used the `hbase.cluster.distributed`
configuration to decide if the user intends this runtime to be a production
deployment or not.)

Is this a faulty assumption? Is there a use-case we support where we
condone running production deployment on the non-durable `LocalFileSystem`?

2. The Quick Start experience should require no configuration at all.

Our stack is difficult enough to run in a fully durable production
environment. We should make it a priority to ensure it's as easy as
possible to try out HBase. Forcing a user to make decisions about data
durability before they even launch the web ui is a terrible experience, in
my opinion, and should be a non-starter for us as a project.

(In my opinion, the need to configure either `hbase.rootdir` or
`hbase.tmp.dir` away from `/tmp` is equally bad for a Getting Started
experience. It is a second, more subtle question of data durability that we
should avoid out of the box. But I'm happy to leave that for another
thread.)

Thank you for your time,
Nick
