For #2, SimpleRegionNormalizer currently takes only region sizes into account. This means regions may still be chosen for split / merge even while they are receiving heavy read / write requests. It seems the normalizer should also consider read / write requests so that more intelligent decisions can be made. For example, if a (small) region A receives a heavy write load, a merge involving that region can be delayed so that the normalizer doesn't soon decide to split the merged region.
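To make the idea concrete, here is a rough sketch of what "defer the merge while a region is hot" could look like. It does not use any HBase classes: RegionLoadSketch, LoadAwareNormalizerSketch, the request-rate fields and the hotRequestsPerSec threshold are all made up for illustration, and the size rule (merge two neighbours when their combined size is below the average region size) is only an assumed stand-in for whatever size check the normalizer applies.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-region statistics; NOT an HBase class. */
final class RegionLoadSketch {
    final String name;
    final long sizeMb;
    final long readRequestsPerSec;
    final long writeRequestsPerSec;

    RegionLoadSketch(String name, long sizeMb, long readRequestsPerSec, long writeRequestsPerSec) {
        this.name = name;
        this.sizeMb = sizeMb;
        this.readRequestsPerSec = readRequestsPerSec;
        this.writeRequestsPerSec = writeRequestsPerSec;
    }
}

/** Illustrative only: a size rule combined with a request-rate check. */
final class LoadAwareNormalizerSketch {

    /** A region counts as hot when reads plus writes reach the (assumed) threshold. */
    static boolean isHot(RegionLoadSketch region, long hotRequestsPerSec) {
        return region.readRequestsPerSec + region.writeRequestsPerSec >= hotRequestsPerSec;
    }

    /**
     * Walks adjacent regions in key order and returns merge plans as "left+right".
     * A pair qualifies on size when its combined size is below the average region
     * size (assumed rule); the merge is deferred if either region is hot.
     */
    static List<String> mergeCandidates(List<RegionLoadSketch> regionsInKeyOrder,
                                        long avgRegionSizeMb,
                                        long hotRequestsPerSec) {
        List<String> plans = new ArrayList<>();
        for (int i = 0; i + 1 < regionsInKeyOrder.size(); i++) {
            RegionLoadSketch left = regionsInKeyOrder.get(i);
            RegionLoadSketch right = regionsInKeyOrder.get(i + 1);
            if (left.sizeMb + right.sizeMb >= avgRegionSizeMb) {
                continue; // not small enough to merge on size alone
            }
            if (isHot(left, hotRequestsPerSec) || isHot(right, hotRequestsPerSec)) {
                continue; // defer: a busy region would likely be split again soon
            }
            plans.add(left.name + "+" + right.name);
            i++; // skip the right-hand region so it is not planned twice
        }
        return plans;
    }
}
```

With four 10 MB regions A..D where only A is taking heavy writes, this sketch would leave A alone and still plan a merge of B and C. How the threshold should be chosen, and over what time window the request counts should be taken, is left open here.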
On Sat, Apr 21, 2018 at 7:35 PM, Lars Francke <lars.fran...@gmail.com> wrote:
> Thanks Tim!
>
> I've seen both problems myself (independent of Tim I should add, even
> though he helps us here) and continue to see them at customers almost every
> month.
>
> 1) Hard to determine proper pre-split points: Everyone I meet either just
> guesses or writes their own little program that basically does the same as
> the little program another customer wrote. So I definitely think having
> such a tool would be useful. We could probably make it pretty easy. For
> people that have already written a BulkLoad using the HFileOutputFormat2 we
> can just ignore the values and focus on keys only. Ideally (Tim, correct me
> if I'm wrong) there should be no code changes for the user.
>
> 2) Approaches to fix badly skewed tables:
> Definitely. My use-case here is different than Tim's. Tim can afford to
> write into a new table which my customers usually can't easily do. So for
> those kinds of users it would be great to have two things:
> a) An easier way (the current UI shows lots of stats etc. but doesn't help
> much with "insight" - so you need the knowledge of HBase's inner workings
> to understand what those numbers mean
> b) A defined way to get a HBase table into a good state. As you said: A
> more aggressive Normalizer sounds like a great solution to me. As the
> Normalizer is set on the cluster level I'm not sure what the best solution
> would be here. It'd be great if we could start a one-off Normalizer from
> the shell using a different class. I also don't see an obvious way to pass
> in options to an invocation of the Normalizer.
>
> The Javadoc for RegionNormalizer says: "Please note that overly aggressive
> normalization rules (attempting to make all regions perfectly equal in
> size) could potentially lead to "split/merge storms" and I agree with that.
> This "new" Normalizer shouldn't be the default.
>
> Cheers,
> Lars
>
> On Sun, Apr 22, 2018 at 1:35 AM, Tim <timrobertson...@gmail.com> wrote:
>
> > Definitely not normalizer - wasn’t enabled at that point. I believe it
> > happened in 2 places because of:
> >
> > 1) poorly implemented bulk load pre-split strategy
> > 2) integer keys and lots of row deletions
> >
> > Seem plausible?
> >
> > Thanks
> > Tim,
> > Sent from my iPhone
> >
> > > On 21 Apr 2018, at 19:29, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > From your first email:
> > >
> > > bq. including some with many regions in the KB size
> > >
> > > Do you know if the above was result of the operation(s) from normalizer ?
> > > Since assuming you use standard max hfile size, there shouldn't be such
> > > small regions.
> > >
> > > Cheers
> > >
> > > On Sat, Apr 21, 2018 at 10:18 AM, Tim Robertson <timrobertson...@gmail.com>
> > > wrote:
> > >
> > >> Thanks Ted,
> > >>
> > >> I should have been explicit - for the cases I've been working with they can
> > >> make their apps effectively go "read-only" for this house keeping step. At
> > >> the end a change of app config or a couple of table name changes (short
> > >> outage) would be needed.
> > >>
> > >> I've been using the SimpleNormalizer in 1.2.0 (CDH 5.12+) - I'll dig into
> > >> the recent changes. I had to run several iterations of small region
> > >> merging, plus a few iterations of SimpleNormalization to get a decent
> > >> result which took a long time (days). On Normalizer - I had wondered if an
> > >> approach of determining a good set of splits up front might be portable
> > >> into a Normalizer implementation.
> > >>
> > >> I suspect a one time rewrite is cheaper than normalization when a table is
> > >> in really bad shape.
> > >>
> > >> Thanks again,
> > >> Tim
> > >>
> > >>> On Sat, Apr 21, 2018 at 6:59 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >>>
> > >>> Looking at proposed flow, have you considered the new data coming in
> > >>> between steps #a and #d ?
> > >>>
> > >>> Also, how would client application switch between the original table and
> > >>> the new table ?
> > >>>
> > >>> BTW since you mentioned SimpleNormalizer, which release are you using (just
> > >>> want to see if all recent fixes to SimpleNormalizer were in the version you
> > >>> use) ?
> > >>>
> > >>> Cheers
> > >>>
> > >>> On Sat, Apr 21, 2018 at 9:48 AM, Tim Robertson <timrobertson...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> Hi folks
> > >>>>
> > >>>> Recently I've seen a few clusters with badly unbalanced tables, including
> > >>>> some with many regions in the KB size. It seems it is easy to overlook this
> > >>>> in ops.
> > >>>>
> > >>>> Understandably SimpleNormalizer does a fairly poor job at addressing this -
> > >>>> takes a long time, doesn't aggressively merge small regions, eagerly splits
> > >>>> well sized regions if many small ones exist etc. It works well if enabled
> > >>>> on a well set up table though.
> > >>>>
> > >>>> I have been exploring approaches to tackle:
> > >>>> 1) determining region splits for a one time bulk load into a presplit
> > >>>> table[1] and
> > >>>> 2) approaches to fixing really badly skewed tables.
> > >>>>
> > >>>> I was thinking of creating a Jira which I'd assign to myself to add a
> > >>>> utility tool that would:
> > >>>>
> > >>>> a) read the HFiles for a table (optionally performing a MC first to
> > >>>> discard old edits)
> > >>>> b) analyze the block headers and determine splits that would take you
> > >>>> back to regions at e.g. 80% hbase.hregion.max.filesize
> > >>>> c) create a new pre-split table
> > >>>> d) run a table copy (or bulkload?)
> > >>>>
> > >>>> Does such a thing exist anywhere and I'm just missing it, or does anyone
> > >>>> know of a better approach please?
> > >>>>
> > >>>> Thoughts, criticism, requests very welcome.
> > >>>>
> > >>>> Thanks,
> > >>>> Tim
> > >>>>
> > >>>> [1]
> > >>>> https://github.com/opencore/hbase-bulk-load-balanced/blob/master/src/test/java/com/opencore/hbase/example/ExampleUsageTest.java
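
On step b) of Tim's proposed flow (determine splits that bring regions back to e.g. 80% of hbase.hregion.max.filesize), the core of the calculation could be sketched roughly as below. This is only an illustration under assumptions: KeyedBlock and SplitPointSketch are hypothetical names, the per-block (first key, size) pairs are assumed to have been collected already and sorted by key, and keys are shown as String for readability even though HBase row keys are byte arrays. Reading the HFiles themselves is not shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical summary of one data block: its first row key and its size. */
final class KeyedBlock {
    final String firstKey;
    final long sizeBytes;

    KeyedBlock(String firstKey, long sizeBytes) {
        this.firstKey = firstKey;
        this.sizeBytes = sizeBytes;
    }
}

/** Illustrative only: greedy split-point selection over key-ordered blocks. */
final class SplitPointSketch {

    /**
     * Accumulates block sizes in key order and starts a new region at the first
     * block that would push the current region past the target size, where
     * target = maxFileSizeBytes * fillPercent / 100 (e.g. fillPercent = 80).
     */
    static List<String> splitPoints(List<KeyedBlock> blocksInKeyOrder,
                                    long maxFileSizeBytes,
                                    int fillPercent) {
        long targetBytes = maxFileSizeBytes * fillPercent / 100;
        List<String> splits = new ArrayList<>();
        long accumulated = 0;
        for (KeyedBlock block : blocksInKeyOrder) {
            if (accumulated > 0 && accumulated + block.sizeBytes > targetBytes) {
                splits.add(block.firstKey); // this key becomes a region start key
                accumulated = 0;
            }
            accumulated += block.sizeBytes;
        }
        return splits;
    }
}
```

The resulting keys would then be the split keys for the new pre-split table in step c). A greedy pass like this ignores compression ratios and per-column-family differences, so the numbers it gives are approximate.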