From your first email:

bq. including some with many regions in the KB size

Do you know whether the above was the result of operation(s) by the
normalizer? Assuming you use the standard max hfile size, there shouldn't
be such small regions.

Cheers

On Sat, Apr 21, 2018 at 10:18 AM, Tim Robertson <timrobertson...@gmail.com>
wrote:

> Thanks Ted,
>
> I should have been explicit - for the cases I've been working with they can
> make their apps effectively go "read-only" for this housekeeping step.  At
> the end a change of app config or a couple of table name changes (short
> outage) would be needed.
>
> I've been using the SimpleNormalizer in 1.2.0 (CDH 5.12+) - I'll dig into
> the recent changes.  I had to run several iterations of small region
> merging, plus a few iterations of SimpleNormalization to get a decent
> result which took a long time (days). On Normalizer - I had wondered if an
> approach of determining a good set of splits up front might be portable
> into a Normalizer implementation.
>
> I suspect a one-time rewrite is cheaper than normalization when a table is
> in really bad shape.
>
> Thanks again,
> Tim
>
>
>
>
> On Sat, Apr 21, 2018 at 6:59 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > Looking at the proposed flow, have you considered the new data coming in
> > between steps #a and #d?
> >
> > Also, how would the client application switch between the original table
> > and the new table?
> >
> > BTW since you mentioned SimpleNormalizer, which release are you using (just
> > want to see if all recent fixes to SimpleNormalizer were in the version you
> > use)?
> >
> > Cheers
> >
> > On Sat, Apr 21, 2018 at 9:48 AM, Tim Robertson <timrobertson...@gmail.com>
> > wrote:
> >
> > > Hi folks
> > >
> > > Recently I've seen a few clusters with badly unbalanced tables, including
> > > some with many regions in the KB size. It seems it is easy to overlook this
> > > in ops.
> > >
> > > Understandably SimpleNormalizer does a fairly poor job at addressing this -
> > > takes a long time, doesn't aggressively merge small regions, eagerly splits
> > > well sized regions if many small ones exist etc. It works well if enabled
> > > on a well set up table though.
> > >
> > > I have been exploring approaches to tackle:
> > >   1) determining region splits for a one time bulk load into a presplit
> > > table[1] and
> > >   2) approaches to fixing really badly skewed tables.
> > >
> > > I was thinking of creating a Jira which I'd assign to myself to add a
> > > utility tool that would:
> > >
> > >   a) read the HFiles for a table (optionally performing a major
> > > compaction first to discard old edits)
> > >   b) analyze the block headers and determine splits that would take you
> > > back to regions at e.g. 80% hbase.hregion.max.filesize
> > >   c) create a new pre-split table
> > >   d) run a table copy (or bulkload?)
> > >
> > > Does such a thing exist anywhere and I'm just missing it, or does anyone
> > > know of a better approach please?
> > >
> > > Thoughts, criticism, requests very welcome.
> > >
> > > Thanks,
> > > Tim
> > >
> > > [1]
> > > https://github.com/opencore/hbase-bulk-load-balanced/blob/master/src/test/java/com/opencore/hbase/example/ExampleUsageTest.java
> > >
> >
>
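
For step (b) of the proposed flow quoted above, here is a minimal sketch of how
split points might be chosen so that each region lands near 80% of
hbase.hregion.max.filesize. It is an illustration only, not an existing HBase
tool. The function name choose_splits, the (first_key, size_bytes) input shape,
and the 10 GB figure in the usage lines are assumptions; the sketch also assumes
the block entries have already been read from the HFiles and merged into one
list sorted by row key.

```python
from typing import Iterable, List, Tuple


def choose_splits(
    blocks: Iterable[Tuple[bytes, int]],
    max_filesize: int,
    fill: float = 0.8,
) -> List[bytes]:
    """Return split keys so each region holds roughly fill * max_filesize bytes.

    blocks: hypothetical (first_row_key, size_in_bytes) pairs, one per HFile
            block, sorted by row key across all HFiles of the table.
    max_filesize: the value of hbase.hregion.max.filesize, in bytes.
    fill: target fraction of max_filesize per region (0.8 as in step b).
    """
    target = int(max_filesize * fill)
    splits: List[bytes] = []
    current = 0  # bytes accumulated in the region being filled
    for first_key, size in blocks:
        # Start a new region at this block's first key if adding the block
        # would push the current region past the target size.
        if current > 0 and current + size > target:
            splits.append(first_key)
            current = 0
        current += size
    return splits


if __name__ == "__main__":
    # Toy input: five 4 GB "blocks" and an assumed 10 GB max file size.
    gb = 1024 ** 3
    toy = [(k, 4 * gb) for k in (b"a", b"b", b"c", b"d", b"e")]
    print(choose_splits(toy, max_filesize=10 * gb))
```

The returned keys could then be passed as the split points when creating the
new pre-split table in step (c). A greedy pass like this only approximates the
target, since it cannot split inside a block, and on-disk block sizes are a
proxy for region size rather than an exact measure.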
