Josh:
Please capture the following in the design doc.

Thanks

On Wed, Nov 2, 2016 at 3:28 PM, Enis Söztutar <enis....@gmail.com> wrote:

> Thanks Andrew,
>
> I forgot to mention that we have considered using the HDFS quota
> enforcement directly as well, but decided against it for a few
> reasons.
>  - Our current layout has files in the data directory, as well as the
> archive directory, WALs, etc. Since HDFS quotas cannot span multiple
> directories, we could only use them for the main data files, and not
> for snapshots, etc., unless we do major surgery on our file layout. This
> will get more complicated if we want to move to a flat layout later on.
>  - Since WALs would not be in any namespace unless we do wal-per-namespace,
> once a single NS's HDFS quota is reached, it might affect everybody else
> and potentially cause havoc on the cluster. The problem is that if a
> single NS is out of space, we cannot perform flushes at all. This would
> cause the WALs to back up and be kept forever, affecting all of the other
> regions from different tables / namespaces and causing unavailability for
> unrelated tables. Wal-per-namespace would also have to be implemented,
> and the WALs moved under a shared NS directory so that the data and WALs
> live together, which requires further layout changes. It also would not
> be optimal if there is a large number of namespaces.
>  - It would only work with HDFS, while HBase can use other file systems.
>
> Enis
>
> On Wed, Nov 2, 2016 at 3:01 PM, Andrew Purtell <apurt...@apache.org>
> wrote:
>
> > Another approach to hard limits could be pushing the quota down to the
> > HDFS level, because HDFS would have a very accurate assessment of quota
> > utilization at all times, but this would only work with HDFS and impose
> > limits on how HBase structures storage on the filesystem (e.g. all files
> > for a namespace must be under a common root). Still, implementation would
> > be "easy": over hard quota, all allocations would fail, and the bulk of
> > the effort would be hardening the response to allocation failures.
> >
> > On Wed, Nov 2, 2016 at 1:11 PM, Enis Söztutar <e...@apache.org> wrote:
> >
> > > Thanks Josh for the doc and pursuing this.
> > >
> > > I was involved with some of the design choices so consider me a +1 on
> > > the general approach. One topic which is not covered here is the other
> > > design decision that we could have pursued: a stricter control on the
> > > quota usage, so that we would always guarantee that the namespace /
> > > table cannot use more than its allocated disk space. This hard-limit
> > > approach would differ from the proposed "soft-limit" approach because
> > > the soft-limit approach can end up overusing the disk space by a small
> > > amount (because it takes time to detect that the quota limit is reached
> > > and to enforce the limit).
> > >
> > > The hard-limit approach may be built with a lease-like mechanism
> > > where the master gives away disk space leases to region servers from
> > > the remaining limit, and the regionservers make sure that they cannot
> > > allocate more space than the lease dictates. By ensuring that the space
> > > is pre-allocated via leases, we can always make sure that strict limits
> > > are applied. However, this approach would be harder to build and
> > > stabilize because it would need new mechanisms for distributing and
> > > managing these leases, and tuning the allocations so that regionservers
> > > never block flushes or compactions for lack of a timely lease would
> > > prove challenging to get right.
> > >
> > > We generally think that the "soft-limit" approach would be a good
> > > enough approximation and the error bounds on over-allocation would be
> > > minimal and negligible in production. Thus, the proposal is to
> > > implement the soft approach with good documentation about how much
> > > space can be over-allocated in a worst-case scenario.
> > >
> > > Enis
> > >
> > > On Wed, Nov 2, 2016 at 12:15 PM, Josh Elser <els...@apache.org> wrote:
> > >
> > > > Thanks for the reviews so far, Ted and Stack. The comments were great
> > > > and much appreciated.
> > > >
> > > > Interpreting consensus from lack of objection, I'm going to move
> > > > ahead in earnest starting to work on what was described in the doc.
> > > > Expect to see some work break-out happening under HBASE-16961 and
> > > > patches starting to land.
> > > >
> > > > I'm also happy to entertain more discussion if anyone hasn't found
> > > > the time to read/comment yet.
> > > >
> > > > Thanks!
> > > >
> > > > - Josh
> > > >
> > > >
> > > > Josh Elser wrote:
> > > >
> > > >> Sure thing, Ted.
> > > >>
> > > > >> https://docs.google.com/document/d/1VtLWDkB2tpwc_zgCNPE1ulZOeecF-YA2FYSK3TSs_bw/edit?usp=sharing
> > > >>
> > > >>
> > > > >> Let me open an umbrella issue for now. I can break up the work
> > > > >> later.
> > > >>
> > > >> https://issues.apache.org/jira/browse/HBASE-16961
> > > >>
> > > >> Ted Yu wrote:
> > > >>
> > > >>> Josh:
> > > > >>> Can you put the doc in a Google doc so that people can comment on
> > > > >>> it?
> > > > >>>
> > > > >>> Is there a JIRA open for this work?
> > > > >>> Please open one if there is none.
> > > >>>
> > > >>> Thanks
> > > >>>
> > > > >>> On Fri, Oct 28, 2016 at 9:00 AM, Josh Elser <els...@apache.org>
> > > > >>> wrote:
> > > > >>>
> > > > >>>> Hi folks,
> > > >>>>
> > > > >>>> I'd like to propose the introduction of FileSystem quotas to
> > > > >>>> HBase.
> > > > >>>>
> > > > >>>> Here's a design doc[1] which (hopefully) covers all of the
> > > > >>>> salient points of what I think an initial version of such a
> > > > >>>> feature would include.
> > > >>>>
> > > > >>>> tl;dr We can define quotas on tables and namespaces. Region size
> > > > >>>> is computed by RegionServers and sent to the Master. The Master
> > > > >>>> inspects the sizes of Regions, rolling up to table and namespace
> > > > >>>> sizes. Defined quotas in the quota table are evaluated given the
> > > > >>>> computed sizes, and, for those tables/namespaces violating the
> > > > >>>> quota, RegionServers are informed to take some action to limit
> > > > >>>> any further filesystem growth by that table/namespace.
> > > >>>>
> > > > >>>> I'd encourage you to give the document a read -- I tried to cover
> > > > >>>> as much as I could without getting unnecessarily bogged down in
> > > > >>>> implementation details.
> > > >>>>
> > > > >>>> Feedback is, of course, welcomed. I'd like to start sketching out
> > > > >>>> a breakdown of the work (all writing and no programming makes
> > > > >>>> Josh a sad boy). I'm happy to field any/all questions. Thanks in
> > > > >>>> advance.
> > > >>>>
> > > >>>> - Josh
> > > >>>>
> > > > >>>> [1] http://home.apache.org/~elserj/hbase/FileSystemQuotasforApacheHBase.pdf
> > > >>>>
> > > >>>>
> > > >>>
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>
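For reference, the soft-limit flow from Josh's tl;dr (RegionServers report region sizes, the Master rolls them up to table and namespace sizes, then compares against the defined quotas) could look roughly like the sketch below. It is written in Python for brevity and is not HBase code: the function names, the (namespace, table, region) key layout, and the plain dicts standing in for the quota table are all assumptions made for illustration.

```python
from collections import defaultdict


def roll_up_sizes(region_sizes):
    """Sum per-region sizes into per-table and per-namespace totals.

    region_sizes maps a hypothetical (namespace, table, region) key to the
    size in bytes that a RegionServer reported for that region.
    """
    table_sizes = defaultdict(int)
    namespace_sizes = defaultdict(int)
    for (namespace, table, _region), size in region_sizes.items():
        table_sizes[(namespace, table)] += size
        namespace_sizes[namespace] += size
    return dict(table_sizes), dict(namespace_sizes)


def find_violations(region_sizes, table_quotas, namespace_quotas):
    """Return the tables and namespaces whose rolled-up size exceeds a quota.

    table_quotas and namespace_quotas are plain dicts of byte limits here;
    in the proposal they would come from the quota table.
    """
    table_sizes, namespace_sizes = roll_up_sizes(region_sizes)
    violations = []
    for key, limit in sorted(table_quotas.items()):
        if table_sizes.get(key, 0) > limit:
            violations.append(("table", key))
    for namespace, limit in sorted(namespace_quotas.items()):
        if namespace_sizes.get(namespace, 0) > limit:
            violations.append(("namespace", namespace))
    return violations
```

Because the check only runs on sizes that have already been reported, anything written between two reports is not counted yet, which is where the small over-allocation of the soft-limit approach comes from.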

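The lease-based hard-limit alternative that Enis describes (the master hands out disk space leases from the remaining limit, and each regionserver refuses to allocate beyond its lease) can be illustrated with the toy model below. This is only a sketch in Python under assumed names: QuotaLeaseMaster, RegionServerLease, and the chunk parameter are hypothetical, and a real design would also need lease expiry, return of unused leases, and failure handling.

```python
class QuotaLeaseMaster:
    """Hands out byte leases from whatever is left of a quota (hypothetical)."""

    def __init__(self, limit_bytes, used_bytes=0):
        self.remaining = max(0, limit_bytes - used_bytes)

    def grant(self, requested_bytes):
        # Never grant more than what is left, so the sum of all leases
        # can never exceed the quota.
        granted = min(requested_bytes, self.remaining)
        self.remaining -= granted
        return granted


class RegionServerLease:
    """Tracks the unused lease held by one regionserver (hypothetical)."""

    def __init__(self, master):
        self.master = master
        self.leased = 0

    def try_allocate(self, nbytes, chunk):
        """Consume nbytes of lease, asking the master for more if needed.

        Returns False (the write must be blocked) when the lease cannot
        be grown enough to cover nbytes.
        """
        if nbytes > self.leased:
            wanted = max(chunk, nbytes - self.leased)
            self.leased += self.master.grant(wanted)
        if nbytes > self.leased:
            return False
        self.leased -= nbytes
        return True
```

In this model a regionserver can be refused while it still holds a partial lease that is too small for the write, so some of the quota sits unused; that is one concrete form of the tuning problem mentioned above, where flushes or compactions block for lack of a lease.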