Looking at the proposed flow, have you considered new data arriving
between steps (a) and (d)?

Also, how would the client application switch between the original table
and the new table?

BTW, since you mentioned SimpleNormalizer, which release are you using? I
just want to check whether all the recent SimpleNormalizer fixes are in the
version you use.

Cheers

On Sat, Apr 21, 2018 at 9:48 AM, Tim Robertson <timrobertson...@gmail.com>
wrote:

> Hi folks
>
> Recently I've seen a few clusters with badly unbalanced tables, including
> some with many regions only a few KB in size. It seems this is easy to
> overlook in operations.
>
> Understandably, SimpleNormalizer does a fairly poor job of addressing this:
> it takes a long time, doesn't aggressively merge small regions, and eagerly
> splits well-sized regions if many small ones exist. It works well when
> enabled on a well-set-up table, though.
>
> I have been exploring approaches to two problems:
>   1) determining region splits for a one-time bulk load into a pre-split
> table[1], and
>   2) fixing really badly skewed tables.
>
> I was thinking of creating a Jira issue, assigned to myself, to add a
> utility tool that would:
>
>   a) read the HFiles for a table (optionally running a major compaction
> (MC) first to discard old edits)
>   b) analyze the block headers and determine splits that would take you
> back to regions of e.g. 80% of hbase.hregion.max.filesize
>   c) create a new pre-split table
>   d) run a table copy (or bulk load?)
>
> Does such a tool already exist and I'm just missing it? Or does anyone
> know of a better approach?
>
> Thoughts, criticism, requests very welcome.
>
> Thanks,
> Tim
>
> [1]
> https://github.com/opencore/hbase-bulk-load-balanced/blob/master/src/test/java/com/opencore/hbase/example/ExampleUsageTest.java
>
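
To make step (b) of the proposed tool a little more concrete, here is a minimal, hypothetical Java sketch of how split keys might be chosen once block sizes are known. It assumes the data blocks of all store files have already been read and merged into one list sorted by first row key, and that their sizes are on-disk sizes comparable to hbase.hregion.max.filesize. BlockInfo and SplitPlanner are illustrative names, not HBase classes, and actually reading HFile block headers is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of step (b): choose split keys from HFile block sizes.
 * Assumes blocks from every store file were merged into one list sorted by
 * first row key. This is not an HBase API.
 */
public class SplitPlanner {

    /** Hypothetical summary of one HFile data block (not an HBase class). */
    public static final class BlockInfo {
        public final byte[] firstRowKey; // row key of the first cell in the block
        public final long sizeBytes;     // assumed on-disk size of the block

        public BlockInfo(byte[] firstRowKey, long sizeBytes) {
            this.firstRowKey = firstRowKey;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Greedily groups consecutive blocks into regions of at most
     * targetFraction * maxFileSize bytes where possible, and returns the start
     * key of every region except the first, i.e. the split keys.
     */
    public static List<byte[]> planSplits(List<BlockInfo> blocksInKeyOrder,
                                          long maxFileSize,
                                          double targetFraction) {
        long target = (long) (maxFileSize * targetFraction);
        if (target <= 0) {
            throw new IllegalArgumentException("target region size must be positive");
        }
        List<byte[]> splits = new ArrayList<>();
        if (blocksInKeyOrder.isEmpty()) {
            return splits;
        }
        byte[] regionStart = blocksInKeyOrder.get(0).firstRowKey;
        long accumulated = 0;
        for (BlockInfo block : blocksInKeyOrder) {
            // Close the current region before this block would push it past the target.
            boolean wouldOverflow = accumulated > 0 && accumulated + block.sizeBytes > target;
            // A split must start a new row; skip it when one wide row spans several blocks.
            if (wouldOverflow && !Arrays.equals(block.firstRowKey, regionStart)) {
                splits.add(block.firstRowKey);
                regionStart = block.firstRowKey;
                accumulated = 0;
            }
            accumulated += block.sizeBytes;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<BlockInfo> blocks = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            byte[] key = String.format("row%02d", i).getBytes(StandardCharsets.UTF_8);
            blocks.add(new BlockInfo(key, 30 * mb)); // ten 30 MB blocks
        }
        // Target 80% of a 100 MB max file size, i.e. regions of at most ~80 MB.
        for (byte[] split : planSplits(blocks, 100 * mb, 0.8)) {
            System.out.println(new String(split, StandardCharsets.UTF_8));
        }
    }
}
```

With ten 30 MB blocks and an ~80 MB target, this yields the split keys row02, row04, row06 and row08, i.e. five regions of about 60 MB each. The resulting keys could then be handed to step (c) as the split keys of the new table, for example through the HBase Admin.createTable overload that accepts a byte[][] of split keys. A single block or row larger than the target still ends up in one region, since a split can only fall on a row boundary.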
