I was aiming for a crisp statement, but sure, there's nuance.

> using RawLocalFileSystem (and HBase changes to also avoid any unwrapping
> back into LocalFileSystem) would be a nice little improvement that should
> be relatively straightforward.

Agreed. To be clear, I am not arguing against that in any way.
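
For anyone who wants to experiment in the meantime, I imagine the shape of
it is roughly the following. This is an untested sketch on my part, not the
proposed HBase change; the toy class name is made up, and the "fs.file.impl"
key just follows Hadoop's fs.<scheme>.impl convention for selecting a
FileSystem implementation.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.RawLocalFileSystem;

    public class RawLocalSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Map the file:// scheme to the raw (non-checksummed) local
        // implementation, which passes flush/sync calls through to the
        // OS, instead of LocalFileSystem, which drops them.
        conf.setClass("fs.file.impl", RawLocalFileSystem.class,
            FileSystem.class);
        FileSystem fs = FileSystem.get(java.net.URI.create("file:///"), conf);
        System.out.println(fs.getClass().getName());
      }
    }

The catch, as the quote above notes, is that code paths which expect a
LocalFileSystem can re-wrap the raw instance, and that is the unwrapping
HBase itself would have to stop doing.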

> As people who work on distributed systems, obviously, we default to the
> line of thinking: "if you put your data on one node, you can't survive
> that node failing". Do our users also immediately come to that
> conclusion?

I have been fortunate to work at places, for 10+ years now, where new
users come to HBase from time to time. They usually surprise me in some
way with the conclusions they draw or the preconceptions they bring.

In my experience the problem with claiming a single system is suitable for
production (or even dev/test, really) is that invariably something happens,
and then the questions begin: Can I recover accidentally deleted data? Or
HFiles I rm -rf-ed externally? Or after a crash and fsck the trailer of
this HFile is full of zeros, what now? Or the last block of this WAL is
full of zeros, what now? Are HFiles redundantly stored on the local FS? Why
not? Is there a tool to replay WALs to make new/recovered HFiles? Why not?
Why aren't WALs retained indefinitely? Is there a tool to fix WALs? Crap,
can we set up two systems with primary-replica replication like postgres
or mysql?

Inevitably we alight on a discussion of a system architecture with more
than one server. Do we claim cross-site replication can be used in this
way? Or, since HDFS has better semantics with respect to durability,
availability, ordering, and visibility, do we recommend HDFS instead of
replication? And so on...
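
For anyone skimming the quoted thread below, the mechanics at issue are
roughly the following. A sketch, not the actual HBase check: the class name
and probe path are made up for illustration, and hasCapability needs a
newer Hadoop. StreamCapabilities is Hadoop's interface for asking a stream
what it can actually guarantee.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.StreamCapabilities;

    public class WalCapabilitySketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(java.net.URI.create("file:///"), conf);
        try (FSDataOutputStream out = fs.create(new Path("/tmp/wal-probe"))) {
          // Stream capability enforcement asks this question before
          // trusting a filesystem with the WAL. LocalFileSystem answers
          // false: it cannot promise hflush/hsync semantics.
          boolean durable = out.hasCapability(StreamCapabilities.HFLUSH)
              && out.hasCapability(StreamCapabilities.HSYNC);
          System.out.println("WAL-safe stream: " + durable);
        }
      }
    }

And to be clear about Vladimir's suggestion downthread: even if forcing the
channel on every sync makes the one local copy durable on its one disk, it
is still one copy on one disk, which is my point above.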

On Fri, Apr 17, 2020 at 11:17 AM Josh Elser <els...@apache.org> wrote:

> I think this is the core point to touch on.
>
> As people who work on distributed systems, obviously, we default to the
> line of thinking: "if you put your data on one node, you can't survive
> that node failing". Do our users also immediately come to that
> conclusion? If not, is it acceptable that we just do better in reminding
> them of this caveat?
>
> Like I think Busbey was alluding to earlier, I think we have room to be
> more prescriptive on how you can use the local filesystem to run HBase
> with the big-fat-warning that this is no longer a distributed system
> that can handle node failure. As far as I remember, using
> RawLocalFileSystem (and HBase changes to also avoid any unwrapping back
> into LocalFileSystem) would be a nice little improvement that should be
> relatively straightforward.
>
> Would that be acceptable in your eyes, Andrew? Or, is the issue more
> fundamental in your mind that we should not be telling users how they
> can run HBase in a manner that doesn't implicitly handle at least
> failure of one node?
>
> I think people are going to always come up with unexpected ways to try
> to run HBase. They're going to slap it on top of random filesystems. I
> don't think we can keep on top of every possible permutation (especially
> if we consider things like persistent volumes from K8s, AWS, Azure that
> try to make traditionally non-fault-tolerant volumes magically
> fault-tolerant).
>
> On 4/16/20 11:59 AM, Andrew Purtell wrote:
> > The data can not be said to be durable because there is one set of
> > files that can be irreversibly corrupted or lost.
> >
> >> On Apr 15, 2020, at 3:52 PM, Vladimir Rodionov <vladrodio...@gmail.com>
> >> wrote:
> >>
> >> FileOutputStream.getFileChannel().force(true) will get all durability
> >> we need. Just a simple code change?
> >>
> >>
> >>> On Wed, Apr 15, 2020 at 12:32 PM Andrew Purtell
> >>> <andrew.purt...@gmail.com> wrote:
> >>>
> >>> This thread talks of “durability” via filesystem characteristics but
> >>> also for single system Quick Start type deployments. For durability
> >>> we need multi server deployments. No amount of hacking a single
> >>> system deployment is going to give us durability as users will expect
> >>> (“don’t lose my data”). I believe my comments are on topic.
> >>>
> >>>
> >>>>> On Apr 15, 2020, at 11:03 AM, Nick Dimiduk <ndimi...@apache.org>
> >>>>> wrote:
> >>>>
> >>>> On Wed, Apr 15, 2020 at 10:28 AM Andrew Purtell
> >>>> <apurt...@apache.org> wrote:
> >>>>
> >>>>> Nick's mail doesn't make a distinction between avoiding data loss
> >>>>> via typical tmp cleaner configurations, unfortunately adjacent to
> >>>>> mention of "durability", and real data durability, which implies
> >>>>> more than what a single system configuration can offer, no matter
> >>>>> how many tweaks we make to LocalFileSystem. Maybe I'm being pedantic
> >>>>> but this is something to be really clear about IMHO.
> >>>>>
> >>>>
> >>>> I prefer to focus the attention of this thread on the question of
> >>>> data durability via `FileSystem` characteristics. I agree that there
> >>>> are concerns of durability (and others) around the use of the path
> >>>> under /tmp. Let's keep that discussion in the other thread.
> >>>>
> >>>>> On Wed, Apr 15, 2020 at 10:05 AM Sean Busbey <bus...@apache.org>
> >>>>> wrote:
> >>>>>
> >>>>>> I think the first assumption no longer holds. Especially with the
> >>>>>> move to flexible compute environments I regularly get asked by
> >>>>>> folks what the smallest HBase they can start with for production
> >>>>>> is. I can keep saying 3/5/7 nodes or whatever but I guarantee there
> >>>>>> are folks who want to and will run HBase with a single node.
> >>>>>> Probably those deployments won't want to have the distributed flag
> >>>>>> set. None of them really have a good option for where the WALs go,
> >>>>>> and failing loud when they try to go to LocalFileSystem is the best
> >>>>>> option I've seen so far to make sure folks realize they are getting
> >>>>>> into muddy waters.
> >>>>>>
> >>>>>> I agree with the second assumption. Our quickstart in general is
> >>>>>> too complicated. Maybe if we include big warnings in the guide
> >>>>>> itself, we could make a quickstart specific artifact to download
> >>>>>> that has the unsafe disabling config in place?
> >>>>>>
> >>>>>> Last fall I toyed with the idea of adding an "hbase-local" module
> >>>>>> to the hbase-filesystem repo that could start us out with some
> >>>>>> optimizations for single node set ups. We could start with a fork
> >>>>>> of RawLocalFileSystem (which will call OutputStream flush
> >>>>>> operations in response to hflush/hsync) that properly advertises
> >>>>>> its StreamCapabilities to say that it supports the operations we
> >>>>>> need. Alternatively we could make our own implementation of
> >>>>>> FileSystem that uses NIO stuff. Either of these approaches would
> >>>>>> solve both problems.
> >>>>>>
> >>>>>> On Wed, Apr 15, 2020 at 11:40 AM Nick Dimiduk <ndimi...@apache.org>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> Hi folks,
> >>>>>>>
> >>>>>>> I'd like to bring up the topic of the experience of new users as
> >>>>>>> it pertains to use of the `LocalFileSystem` and its associated
> >>>>>>> (lack of) data durability guarantees. By default, an unconfigured
> >>>>>>> HBase runs with its root directory on a `file:///` path. This path
> >>>>>>> is picked up as an instance of `LocalFileSystem`. Hadoop has long
> >>>>>>> offered this class, but it has never supported `hsync` or `hflush`
> >>>>>>> stream characteristics. Thus, when HBase runs on this
> >>>>>>> configuration, it is unable to ensure that WAL writes are durable,
> >>>>>>> and thus will ACK a write without this assurance. This is the case
> >>>>>>> even when running in a fully durable WAL mode.
> >>>>>>>
> >>>>>>> This impacts a new user, someone kicking the tires on HBase
> >>>>>>> following our Getting Started docs. On Hadoop 2.8 and before, an
> >>>>>>> unconfigured HBase will WARN and carry on. On Hadoop 2.10+, HBase
> >>>>>>> will refuse to start. The book describes a process of disabling
> >>>>>>> stream capability enforcement as a first step. This is a mandatory
> >>>>>>> configuration for running HBase directly out of our binary
> >>>>>>> distribution.
> >>>>>>>
> >>>>>>> HBASE-24086 restores the behavior on Hadoop 2.10+ to that of
> >>>>>>> running on 2.8: log a warning and carry on. The critique of this
> >>>>>>> approach is that it's far too subtle, too quiet for a system
> >>>>>>> operating in a state known to not provide data durability.
> >>>>>>>
> >>>>>>> I have two assumptions/concerns around the state of things, which
> >>>>>>> prompted my solution on HBASE-24086 and the associated doc update
> >>>>>>> on HBASE-24106.
> >>>>>>>
> >>>>>>> 1. No one should be running a production system on
> >>>>>>> `LocalFileSystem`.
> >>>>>>>
> >>>>>>> The initial implementation checked both for `LocalFileSystem` and
> >>>>>>> `hbase.cluster.distributed`. When running on the former and the
> >>>>>>> latter is false, we assume the user is running a non-production
> >>>>>>> deployment and carry on with the warning. When the latter is true,
> >>>>>>> we assume the user intended a production deployment and the
> >>>>>>> process terminates due to stream capability enforcement.
> >>>>>>> Subsequent code review resulted in skipping the
> >>>>>>> `hbase.cluster.distributed` check and simply warning, as was done
> >>>>>>> on 2.8 and earlier.
> >>>>>>>
> >>>>>>> (As I understand it, we've long used the
> >>>>>>> `hbase.cluster.distributed` configuration to decide if the user
> >>>>>>> intends this runtime to be a production deployment or not.)
> >>>>>>>
> >>>>>>> Is this a faulty assumption? Is there a use-case we support where
> >>>>>>> we condone running a production deployment on the non-durable
> >>>>>>> `LocalFileSystem`?
> >>>>>>>
> >>>>>>> 2. The Quick Start experience should require no configuration at
> >>>>>>> all.
> >>>>>>>
> >>>>>>> Our stack is difficult enough to run in a fully durable production
> >>>>>>> environment. We should make it a priority to ensure it's as easy
> >>>>>>> as possible to try out HBase. Forcing a user to make decisions
> >>>>>>> about data durability before they even launch the web UI is a
> >>>>>>> terrible experience, in my opinion, and should be a non-starter
> >>>>>>> for us as a project.
> >>>>>>>
> >>>>>>> (In my opinion, the need to configure either `hbase.rootdir` or
> >>>>>>> `hbase.tmp.dir` away from `/tmp` is equally bad for a Getting
> >>>>>>> Started experience. It is a second, more subtle question of data
> >>>>>>> durability that we should avoid out of the box. But I'm happy to
> >>>>>>> leave that for another thread.)
> >>>>>>>
> >>>>>>> Thank you for your time,
> >>>>>>> Nick
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Best regards,
> >>>>> Andrew
> >>>>>
> >>>>> Words like orphans lost among the crosstalk, meaning torn from
> >>>>> truth's decrepit hands
> >>>>>   - A23, Crosstalk
> >>>>>
> >>>
>


-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk
