On Wed, Apr 15, 2020 at 10:28 AM Andrew Purtell <[email protected]> wrote:

> Nick's mail doesn't make a distinction between avoiding data loss via
> typical tmp cleaner configurations, unfortunately adjacent to mention of
> "durability", and real data durability, which implies more than what a
> single system configuration can offer, no matter how many tweaks we make to
> LocalFileSystem. Maybe I'm being pedantic but this is something to be
> really clear about IMHO.
>

I'd prefer to keep this thread focused on the question of data
durability via `FileSystem` characteristics. I agree that there are
durability concerns (and others) around the use of a path under /tmp.
Let's keep that discussion in the other thread.
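
To make the two variants from my original message (quoted below) concrete,
here is a minimal sketch of the decision logic. It is not the HBASE-24086
patch: the class, enum, and method names are hypothetical, the booleans
stand in for whatever the real code derives from the `FileSystem` and the
configuration, and the handling of a non-local file system that lacks the
capabilities is my assumption.

```java
// Hypothetical sketch: none of these names come from HBase or Hadoop.
final class WalDurabilityPolicy {

    /** What the process does once it knows what the WAL file system supports. */
    enum Action { PROCEED, WARN_AND_CONTINUE, FAIL_STARTUP }

    private WalDurabilityPolicy() {}

    /**
     * Initial implementation as described in the thread: a non-durable
     * LocalFileSystem is tolerated only when hbase.cluster.distributed is false.
     */
    static Action initialPolicy(boolean supportsHflushAndHsync,
                                boolean isLocalFileSystem,
                                boolean clusterDistributed) {
        if (supportsHflushAndHsync) {
            return Action.PROCEED;
        }
        if (isLocalFileSystem && !clusterDistributed) {
            return Action.WARN_AND_CONTINUE;
        }
        // Assumption: every other case stays subject to enforcement.
        return Action.FAIL_STARTUP;
    }

    /**
     * Behavior after code review: the hbase.cluster.distributed check is
     * skipped, so LocalFileSystem always results in a warning.
     */
    static Action reviewedPolicy(boolean supportsHflushAndHsync,
                                 boolean isLocalFileSystem) {
        if (supportsHflushAndHsync) {
            return Action.PROCEED;
        }
        // Assumption: non-local file systems without the capabilities still fail.
        return isLocalFileSystem ? Action.WARN_AND_CONTINUE : Action.FAIL_STARTUP;
    }

    public static void main(String[] args) {
        // Standalone quickstart on LocalFileSystem.
        System.out.println(initialPolicy(false, true, false));
        // Same file system, but hbase.cluster.distributed=true.
        System.out.println(initialPolicy(false, true, true));
        // After review, the distributed flag no longer matters.
        System.out.println(reviewedPolicy(false, true));
    }
}
```

The only difference between the two methods is whether
`hbase.cluster.distributed` is consulted, which is the point I'd like
feedback on.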

On Wed, Apr 15, 2020 at 10:05 AM Sean Busbey <[email protected]> wrote:
>
> > I think the first assumption no longer holds. Especially with the move
> > > to flexible compute environments I regularly get asked by folks what
> > > the smallest HBase they can start with for production is. I can keep
> > saying 3/5/7 nodes or whatever but I guarantee there are folks who
> > want to and will run HBase with a single node. Probably those
> > deployments won't want to have the distributed flag set. None of them
> > really have a good option for where the WALs go, and failing loud when
> > they try to go to LocalFileSystem is the best option I've seen so far
> > to make sure folks realize they are getting into muddy waters.
> >
> > I agree with the second assumption. Our quickstart in general is too
> > complicated. Maybe if we include big warnings in the guide itself, we
> > > could make a quickstart-specific artifact to download that has the
> > unsafe disabling config in place?
> >
> > Last fall I toyed with the idea of adding an "hbase-local" module to
> > the hbase-filesystem repo that could start us out with some
> > > optimizations for single-node setups. We could start with a fork of
> > RawLocalFileSystem (which will call OutputStream flush operations in
> > response to hflush/hsync) that properly advertises its
> > StreamCapabilities to say that it supports the operations we need.
> > Alternatively we could make our own implementation of FileSystem that
> > uses NIO stuff. Either of these approaches would solve both problems.
> >
> > > On Wed, Apr 15, 2020 at 11:40 AM Nick Dimiduk <[email protected]> wrote:
> > >
> > > Hi folks,
> > >
> > > > I'd like to bring up the topic of the experience of new users as it
> > > > pertains to use of the `LocalFileSystem` and its associated (lack of)
> > > > data durability guarantees. By default, an unconfigured HBase runs with
> > > > its root directory on a `file:///` path. This path is picked up as an
> > > > instance of `LocalFileSystem`. Hadoop has long offered this class, but
> > > > it has never supported `hsync` or `hflush` stream characteristics. Thus,
> > > > when HBase runs on this configuration, it is unable to ensure that WAL
> > > > writes are durable, and thus will ACK a write without this assurance.
> > > > This is the case even when running in a fully durable WAL mode.
> > >
> > > > This impacts a new user, someone kicking the tires on HBase following
> > > > our Getting Started docs. On Hadoop 2.8 and before, an unconfigured
> > > > HBase will WARN and carry on. On Hadoop 2.10+, HBase will refuse to
> > > > start. The book describes disabling stream capability enforcement as a
> > > > first step. This is a mandatory configuration for running HBase
> > > > directly out of our binary distribution.
> > >
> > > > HBASE-24086 restores the behavior on Hadoop 2.10+ to that of running on
> > > > 2.8: log a warning and carry on. The critique of this approach is that
> > > > it's far too subtle, too quiet for a system operating in a state known
> > > > not to provide data durability.
> > >
> > > > I have two assumptions/concerns around the state of things, which
> > > > prompted my solution on HBASE-24086 and the associated doc update on
> > > > HBASE-24106.
> > >
> > > 1. No one should be running a production system on `LocalFileSystem`.
> > >
> > > > The initial implementation checked both for `LocalFileSystem` and
> > > > `hbase.cluster.distributed`. When running on the former and the latter
> > > > is false, we assume the user is running a non-production deployment and
> > > > carry on with the warning. When the latter is true, we assume the user
> > > > intended a production deployment and the process terminates due to
> > > > stream capability enforcement. Subsequent code review resulted in
> > > > skipping the `hbase.cluster.distributed` check and simply warning, as
> > > > was done on 2.8 and earlier.
> > >
> > > > (As I understand it, we've long used the `hbase.cluster.distributed`
> > > > configuration to decide if the user intends this runtime to be a
> > > > production deployment or not.)
> > >
> > > > Is this a faulty assumption? Is there a use-case we support where we
> > > > condone running a production deployment on the non-durable
> > > > `LocalFileSystem`?
> > >
> > > 2. The Quick Start experience should require no configuration at all.
> > >
> > > > Our stack is difficult enough to run in a fully durable production
> > > > environment. We should make it a priority to ensure it's as easy as
> > > > possible to try out HBase. Forcing a user to make decisions about data
> > > > durability before they even launch the web UI is a terrible experience,
> > > > in my opinion, and should be a non-starter for us as a project.
> > >
> > > > (In my opinion, the need to configure either `hbase.rootdir` or
> > > > `hbase.tmp.dir` away from `/tmp` is equally bad for a Getting Started
> > > > experience. It is a second, more subtle question of data durability
> > > > that we should avoid out of the box. But I'm happy to leave that for
> > > > another thread.)
> > >
> > > Thank you for your time,
> > > Nick
> >
>
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>    - A23, Crosstalk
>
