This thread talks of “durability” via filesystem characteristics, but also for
single-system Quick Start type deployments. For durability we need multi-server
deployments. No amount of hacking a single-system deployment is going to give
us durability as users will expect it (“don’t lose my data”). I believe my
comments are on topic.


> On Apr 15, 2020, at 11:03 AM, Nick Dimiduk <ndimi...@apache.org> wrote:
> 
> On Wed, Apr 15, 2020 at 10:28 AM Andrew Purtell <apurt...@apache.org> wrote:
> 
>> Nick's mail doesn't make a distinction between avoiding data loss via
>> typical tmp cleaner configurations, unfortunately adjacent to mention of
>> "durability", and real data durability, which implies more than what a
>> single system configuration can offer, no matter how many tweaks we make to
>> LocalFileSystem. Maybe I'm being pedantic but this is something to be
>> really clear about IMHO.
>> 
> 
> I prefer to focus the attention of this thread on the question of data
> durability via `FileSystem` characteristics. I agree that there are
> concerns of durability (and others) around the use of the path under /tmp.
> Let's keep that discussion in the other thread.
> 
>> On Wed, Apr 15, 2020 at 10:05 AM Sean Busbey <bus...@apache.org> wrote:
>> 
>>> I think the first assumption no longer holds. Especially with the move
>>> to flexible compute environments I regularly get asked by folks what
>>> the smallest HBase they can start with for production is. I can keep
>>> saying 3/5/7 nodes or whatever but I guarantee there are folks who
>>> want to and will run HBase with a single node. Probably those
>>> deployments won't want to have the distributed flag set. None of them
>>> really have a good option for where the WALs go, and failing loud when
>>> they try to go to LocalFileSystem is the best option I've seen so far
>>> to make sure folks realize they are getting into muddy waters.
>>> 
>>> I agree with the second assumption. Our quickstart in general is too
>>> complicated. Maybe if we include big warnings in the guide itself, we
>>> could make a quickstart specific artifact to download that has the
>>> unsafe disabling config in place?
>>> 
>>> Last fall I toyed with the idea of adding an "hbase-local" module to
>>> the hbase-filesystem repo that could start us out with some
>>> optimizations for single node set ups. We could start with a fork of
>>> RawLocalFileSystem (which will call OutputStream flush operations in
>>> response to hflush/hsync) that properly advertises its
>>> StreamCapabilities to say that it supports the operations we need.
>>> Alternatively we could make our own implementation of FileSystem that
>>> uses NIO stuff. Either of these approaches would solve both problems.
>>> 
>>> On Wed, Apr 15, 2020 at 11:40 AM Nick Dimiduk <ndimi...@apache.org> wrote:
>>>> 
>>>> Hi folks,
>>>> 
>>>> I'd like to bring up the topic of the experience of new users as it
>>>> pertains to use of the `LocalFileSystem` and its associated (lack of)
>>>> data durability guarantees. By default, an unconfigured HBase runs with
>>>> its root directory on a `file:///` path. This path is picked up as an
>>>> instance of `LocalFileSystem`. Hadoop has long offered this class, but it
>>>> has never supported `hsync` or `hflush` stream characteristics. Thus, when
>>>> HBase runs on this configuration, it is unable to ensure that WAL writes
>>>> are durable, and thus will ACK a write without this assurance. This is the
>>>> case even when running in a fully durable WAL mode.
>>>> 
>>>> This impacts a new user, someone kicking the tires on HBase following our
>>>> Getting Started docs. On Hadoop 2.8 and before, an unconfigured HBase
>>>> will WARN and carry on. On Hadoop 2.10+, HBase will refuse to start. The
>>>> book describes a process of disabling stream capability enforcement as a
>>>> first step. This is a mandatory configuration for running HBase directly
>>>> out of our binary distribution.
>>>> 
>>>> HBASE-24086 restores the behavior on Hadoop 2.10+ to that of running on
>>>> 2.8: log a warning and carry on. The critique of this approach is that
>>>> it's far too subtle, too quiet for a system operating in a state known to
>>>> not provide data durability.
>>>> 
>>>> I have two assumptions/concerns around the state of things, which
>>>> prompted my solution on HBASE-24086 and the associated doc update on
>>>> HBASE-24106.
>>>> 
>>>> 1. No one should be running a production system on `LocalFileSystem`.
>>>> 
>>>> The initial implementation checked both for `LocalFileSystem` and
>>>> `hbase.cluster.distributed`. When running on the former and the latter is
>>>> false, we assume the user is running a non-production deployment and
>>>> carry on with the warning. When the latter is true, we assume the user
>>>> intended a production deployment and the process terminates due to stream
>>>> capability enforcement. Subsequent code review resulted in skipping the
>>>> `hbase.cluster.distributed` check and simply warning, as was done on 2.8
>>>> and earlier.
>>>> 
>>>> (As I understand it, we've long used the `hbase.cluster.distributed`
>>>> configuration to decide if the user intends this runtime to be a
>>>> production deployment or not.)
>>>> 
>>>> Is this a faulty assumption? Is there a use-case we support where we
>>>> condone running a production deployment on the non-durable
>>>> `LocalFileSystem`?
>>>> 
>>>> 2. The Quick Start experience should require no configuration at all.
>>>> 
>>>> Our stack is difficult enough to run in a fully durable production
>>>> environment. We should make it a priority to ensure it's as easy as
>>>> possible to try out HBase. Forcing a user to make decisions about data
>>>> durability before they even launch the web ui is a terrible experience,
>>>> in my opinion, and should be a non-starter for us as a project.
>>>> 
>>>> (In my opinion, the need to configure either `hbase.rootdir` or
>>>> `hbase.tmp.dir` away from `/tmp` is equally bad for a Getting Started
>>>> experience. It is a second, more subtle question of data durability that
>>>> we should avoid out of the box. But I'm happy to leave that for another
>>>> thread.)
>>>> 
>>>> Thank you for your time,
>>>> Nick
>>> 
>> 
>> 
>> --
>> Best regards,
>> Andrew
>> 
>> Words like orphans lost among the crosstalk, meaning torn from truth's
>> decrepit hands
>>   - A23, Crosstalk
>> 
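
For reference, the "NIO stuff" idea from Sean's mail can be sketched with
nothing but the JDK. This is only an illustration, not HBase or Hadoop code:
the class `DurableAppendSketch` and the method `appendDurably` are hypothetical
names, and the file format (one text record per line) is an assumption made for
the example. It shows an append that returns only after `FileChannel.force(true)`
has completed, which is roughly the kind of guarantee `hsync` is meant to give.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration only; not part of HBase or Hadoop. */
public class DurableAppendSketch {

    /** Appends one record and returns only after the OS has been asked to persist it. */
    static void appendDurably(Path wal, String record) throws IOException {
        try (FileChannel ch = FileChannel.open(wal,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));
            // A single write() may be partial, so loop until the buffer is drained.
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Request that file content and metadata be written to the storage device.
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path wal = Files.createTempFile("wal-sketch", ".log");
        appendDurably(wal, "put row1");
        appendDurably(wal, "put row2");
        System.out.println(Files.readAllLines(wal));
        Files.delete(wal);
    }
}
```

Even with a sync like this, the data lives on one machine's disk. It survives a
process crash, but not the loss of that machine, which is the distinction drawn
at the top of this mail.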

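The two variants of the startup check described under Nick's first assumption
can be sketched as follows. This is a simplified model, not the HBASE-24086
patch: the class, enum, and method names are hypothetical, and it assumes that
any filesystem other than `LocalFileSystem` supports `hflush`/`hsync` and so
starts normally.

```java
/** Hypothetical model of the check discussed in the thread; not HBase code. */
public class StreamCapabilityCheckSketch {

    enum Action { PROCEED, WARN_AND_CONTINUE, TERMINATE }

    /** Initial implementation: LocalFileSystem plus hbase.cluster.distributed decide the outcome. */
    static Action initialProposal(boolean onLocalFileSystem, boolean clusterDistributed) {
        if (!onLocalFileSystem) {
            // Assumption for this sketch: other filesystems advertise hflush/hsync.
            return Action.PROCEED;
        }
        // distributed=true is read as "production intended", so enforcement stops the process.
        return clusterDistributed ? Action.TERMINATE : Action.WARN_AND_CONTINUE;
    }

    /** After code review: the distributed flag is ignored and LocalFileSystem only warns. */
    static Action afterReview(boolean onLocalFileSystem) {
        return onLocalFileSystem ? Action.WARN_AND_CONTINUE : Action.PROCEED;
    }

    public static void main(String[] args) {
        System.out.println("local, standalone  (initial): " + initialProposal(true, false));
        System.out.println("local, distributed (initial): " + initialProposal(true, true));
        System.out.println("local              (review):  " + afterReview(true));
    }
}
```

The only difference between the two variants is the local, distributed case:
the initial proposal fails loudly there, while the reviewed version warns and
continues, which is the behavior the thread is debating.
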