Re: [DISCUSS] New User Experience and Data Durability Guarantees on LocalFileSystem (HBASE-24086)

Andrew Purtell Wed, 15 Apr 2020 10:28:25 -0700

> We could start with a fork of
> RawLocalFileSystem (which will call OutputStream flush operations in
> response to hflush/hsync) that properly advertises its
> StreamCapabilities to say that it supports the operations we need.


This is a worthy option to pursue.

Nick's mail doesn't make a distinction between avoiding data loss via
typical tmp cleaner configurations, unfortunately adjacent to mention of
"durability", and real data durability, which implies more than what a
single system configuration can offer, no matter how many tweaks we make to
LocalFileSystem. Maybe I'm being pedantic but this is something to be
really clear about IMHO.


On Wed, Apr 15, 2020 at 10:05 AM Sean Busbey <[email protected]> wrote:

> I think the first assumption no longer holds. Especially with the move
> to flexible compute environments I regularly get asked by folks what
> the smallest HBase they can start with for production. I can keep
> saying 3/5/7 nodes or whatever but I guarantee there are folks who
> want to and will run HBase with a single node. Probably those
> deployments won't want to have the distributed flag set. None of them
> really have a good option for where the WALs go, and failing loud when
> they try to go to LocalFileSystem is the best option I've seen so far
> to make sure folks realize they are getting into muddy waters.
>
> I agree with the second assumption. Our quickstart in general is too
> complicated. Maybe if we include big warnings in the guide itself, we
> could make a quickstart specific artifact to download that has the
> unsafe disabling config in place?
>
> Last fall I toyed with the idea of adding an "hbase-local" module to
> the hbase-filesystem repo that could start us out with some
> optimizations for single node set ups. We could start with a fork of
> RawLocalFileSystem (which will call OutputStream flush operations in
> response to hflush/hsync) that properly advertises its
> StreamCapabilities to say that it supports the operations we need.
> Alternatively we could make our own implementation of FileSystem that
> uses NIO stuff. Either of these approaches would solve both problems.
>
> On Wed, Apr 15, 2020 at 11:40 AM Nick Dimiduk <[email protected]> wrote:
> >
> > Hi folks,
> >
> > I'd like to bring up the topic of the experience of new users as it
> > pertains to use of the `LocalFileSystem` and its associated (lack of)
> data
> > durability guarantees. By default, an unconfigured HBase runs with its
> root
> > directory on a `file:///` path. This patch is picked up as an instance of
> > `LocalFileSystem`. Hadoop has long offered this class, but it has never
> > supported `hsync` or `hflush` stream characteristics. Thus, when HBase
> runs
> > on this configuration, it is unable to ensure that WAL writes are
> durable,
> > and thus will ACK a write without this assurance. This is the case, even
> > when running in a fully durable WAL mode.
> >
> > This impacts a new user, someone kicking the tires on HBase following our
> > Getting Started docs. On Hadoop 2.8 and before, an unconfigured HBase
> will
> > WARN and cary on. Hadoop 2.10+, HBase will refuse to start. The book
> > describes a process of disabling enforcement of stream capability
> > enforcement as a first step. This is a mandatory configuration for
> running
> > HBase directly out of our binary distribution.
> >
> > HBASE-24086 restores the behavior on Hadoop 2.10+ to that of running on
> > 2.8: log a warning and cary on. The critique of this approach is that
> it's
> > far too subtle, too quiet for a system operating in a state known to not
> > provide data durability.
> >
> > I have two assumptions/concerns around the state of things, which
> prompted
> > my solution on HBASE-24086 and the associated doc update on HBASE-24106.
> >
> > 1. No one should be running a production system on `LocalFileSystem`.
> >
> > The initial implementation checked both for `LocalFileSystem` and
> > `hbase.cluster.distributed`. When running on the former and the latter is
> > false, we assume the user is running a non-production deployment and
> carry
> > on with the warning. When the latter is true, we assume the user
> intended a
> > production deployment and the process terminates due to stream capability
> > enforcement. Subsequent code review resulted in skipping the
> > `hbase.cluster.distributed` check and simply warning, as was done on 2.8
> > and earlier.
> >
> > (As I understand it, we've long used the `hbase.cluster.distributed`
> > configuration to decide if the user intends this runtime to be a
> production
> > deployment or not.)
> >
> > Is this a faulty assumption? Is there a use-case we support where we
> > condone running production deployment on the non-durable
> `LocalFileSystem`?
> >
> > 2. The Quick Start experience should require no configuration at all.
> >
> > Our stack is difficult enough to run in a fully durable production
> > environment. We should make it a priority to ensure it's as easy as
> > possible to try out HBase. Forcing a user to make decisions about data
> > durability before they even launch the web ui is a terrible experience,
> in
> > my opinion, and should be a non-starter for us as a project.
> >
> > (In my opinion, the need to configure either `hbase.rootdir` or
> > `hbase.tmp.dir` away from `/tmp` is equally bad for a Getting Started
> > experience. It is a second, more subtle question of data durability that
> we
> > should avoid out of the box. But I'm happy to leave that for another
> > thread.)
> >
> > Thank you for your time,
> > Nick
>


-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk

Re: [DISCUSS] New User Experience and Data Durability Guarantees on LocalFileSystem (HBASE-24086)

Reply via email to