Hi folks, I'd like to bring up the topic of the new-user experience as it pertains to use of the `LocalFileSystem` and its associated (lack of) data durability guarantees. By default, an unconfigured HBase runs with its root directory on a `file:///` path. This path is picked up as an instance of `LocalFileSystem`. Hadoop has long offered this class, but it has never supported the `hsync` or `hflush` stream capabilities. When HBase runs on this configuration, it cannot ensure that WAL writes are durable, and so it ACKs writes without that assurance. This is the case even when running in a fully durable WAL mode.
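For anyone who wants to see this for themselves, here is a minimal sketch of probing a `file:///` stream for those capabilities. It assumes Hadoop 2.9+ on the classpath (where `FSDataOutputStream` exposes `hasCapability`); the class name and paths are invented for illustration.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.StreamCapabilities;

public class WalCapabilityProbe {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // An unconfigured hbase.rootdir resolves to a file:/// location, which
    // Hadoop serves through LocalFileSystem.
    FileSystem fs = FileSystem.get(URI.create("file:///tmp/hbase-probe"), conf);
    try (FSDataOutputStream out = fs.create(new Path("/tmp/hbase-probe/wal"))) {
      // LocalFileSystem streams report neither capability, so a WAL writer has
      // no way to promise that an ACKed edit has reached stable storage.
      System.out.println("hflush: " + out.hasCapability(StreamCapabilities.HFLUSH));
      System.out.println("hsync:  " + out.hasCapability(StreamCapabilities.HSYNC));
    }
  }
}
```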
This impacts new users, anyone kicking the tires on HBase by following our Getting Started docs. On Hadoop 2.8 and earlier, an unconfigured HBase will WARN and carry on. On Hadoop 2.10+, HBase will refuse to start. The book describes disabling stream capability enforcement as a first step; this is a mandatory configuration for running HBase directly out of our binary distribution. HBASE-24086 restores the behavior on Hadoop 2.10+ to that of running on 2.8: log a warning and carry on. The critique of this approach is that it's far too subtle, too quiet, for a system operating in a state known not to provide data durability.

I have two assumptions/concerns about the state of things, which prompted my solution on HBASE-24086 and the associated doc update on HBASE-24106.

1. No one should be running a production system on `LocalFileSystem`. The initial implementation checked both for `LocalFileSystem` and `hbase.cluster.distributed`. When running on the former and the latter is false, we assume the user is running a non-production deployment and carry on with a warning. When the latter is true, we assume the user intended a production deployment and the process terminates due to stream capability enforcement. (A rough sketch of this check appears in the P.S. below.) Subsequent code review resulted in skipping the `hbase.cluster.distributed` check and simply warning, as was done on 2.8 and earlier. (As I understand it, we've long used the `hbase.cluster.distributed` configuration to decide whether the user intends this runtime to be a production deployment or not.) Is this a faulty assumption? Is there a use case we support where we condone running a production deployment on the non-durable `LocalFileSystem`?

2. The Quick Start experience should require no configuration at all. Our stack is difficult enough to run in a fully durable production environment; we should make it a priority to ensure it's as easy as possible to try out HBase. Forcing a user to make decisions about data durability before they even launch the web UI is a terrible experience, in my opinion, and should be a non-starter for us as a project. (In my opinion, the need to configure either `hbase.rootdir` or `hbase.tmp.dir` away from `/tmp` is equally bad for a Getting Started experience. It is a second, more subtle question of data durability that we should avoid out of the box. But I'm happy to leave that for another thread.)

Thank you for your time,
Nick
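P.S. For concreteness, here is a rough, hypothetical sketch of the decision described in point 1, i.e. the initial HBASE-24086 approach before code review simplified it to a plain warning. The class and method names are invented for illustration; `hbase.cluster.distributed` and `LocalFileSystem` are the real names discussed above, and this is not the actual patch.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;

// Hypothetical standalone illustration, not the code shipped in HBASE-24086.
final class DurabilityGate {
  static void check(FileSystem walFs, Configuration conf) {
    if (!(walFs instanceof LocalFileSystem)) {
      return; // durable filesystems go through the usual capability enforcement
    }
    boolean distributed = conf.getBoolean("hbase.cluster.distributed", false);
    if (distributed) {
      // The user signalled production intent; refuse to run without durable WALs.
      throw new IllegalStateException(
          "hbase.cluster.distributed=true but the WAL is on LocalFileSystem, "
              + "which cannot provide hflush/hsync durability");
    }
    // Standalone / quick-start mode: warn loudly and carry on.
    System.err.println("WARNING: running on LocalFileSystem; WAL writes are not "
        + "durable. Do not use this deployment for production data.");
  }
}
```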