Tools for balancing a poorly distributed table

Tim Robertson Sat, 21 Apr 2018 09:48:47 -0700

Hi folks

Recently I've seen a few clusters with badly unbalanced tables, including
some with many regions only a few KB in size. It seems this is easy to
overlook in ops.


Understandably, SimpleNormalizer does a fairly poor job of addressing this:
it takes a long time, doesn't aggressively merge small regions, eagerly
splits well-sized regions if many small ones exist, etc. It works well if
enabled on a well set up table, though.

I have been exploring ways to tackle:
  1) determining region splits for a one-time bulk load into a pre-split
table[1], and
  2) fixing really badly skewed tables.

I was thinking of creating a Jira, assigned to myself, to add a utility
tool that would:

  a) read the HFiles for a table (optionally running a major compaction
first to discard old edits)
  b) analyze the block headers and determine splits that would bring the
regions back to e.g. 80% of hbase.hregion.max.filesize
  c) create a new pre-split table
  d) run a table copy (or a bulk load?)
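As a rough illustration of step b), here is a minimal Java sketch of the split selection only. It assumes the data block sizes and first row keys have already been extracted from the HFiles and merged into key order (e.g. a single column family with one HFile per store after a major compaction); BlockInfo, planSplits and targetBytes are hypothetical names, not HBase APIs. It greedily accumulates block sizes and, once the running total reaches the target, cuts at the first row key of the next block, so no block is divided between regions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: choose pre-split keys from ordered HFile block metadata. */
public class SplitPlanner {

    /** Hypothetical summary of one data block: its first row key and on-disk size. */
    public record BlockInfo(byte[] firstRowKey, long sizeBytes) {}

    /** Target region size as a percentage of hbase.hregion.max.filesize (integer math). */
    public static long targetBytes(long maxFileSize, int percent) {
        if (percent <= 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in (0, 100]");
        }
        return maxFileSize * percent / 100;
    }

    /**
     * Walks blocks in key order, summing sizes. When the sum reaches targetBytes,
     * the next block's first row key becomes a split point and the sum resets.
     */
    public static List<byte[]> planSplits(List<BlockInfo> blocks, long targetBytes) {
        if (targetBytes <= 0) {
            throw new IllegalArgumentException("targetBytes must be positive");
        }
        List<byte[]> splits = new ArrayList<>();
        byte[] lastBoundary = blocks.isEmpty() ? null : blocks.get(0).firstRowKey();
        long running = 0;
        for (int i = 0; i < blocks.size(); i++) {
            running += blocks.get(i).sizeBytes();
            // Only cut if another block follows; a split after the last block
            // would leave an empty trailing region.
            if (running >= targetBytes && i + 1 < blocks.size()) {
                byte[] candidate = blocks.get(i + 1).firstRowKey();
                // A row spanning several blocks repeats its key; skip duplicate
                // boundaries and keep accumulating instead.
                if (!Arrays.equals(candidate, lastBoundary)) {
                    splits.add(candidate);
                    lastBoundary = candidate;
                    running = 0;
                }
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<BlockInfo> blocks = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            byte[] key = String.format("row%02d", i).getBytes(StandardCharsets.UTF_8);
            blocks.add(new BlockInfo(key, 30));
        }
        // Toy numbers: pretend hbase.hregion.max.filesize is 100 bytes, aim for 80%.
        long target = targetBytes(100, 80);
        for (byte[] split : planSplits(blocks, target)) {
            System.out.println(new String(split, StandardCharsets.UTF_8));
        }
    }
}
```

With ten 30-byte blocks and an 80-byte target this prints row03, row06 and row09, i.e. four regions. The resulting keys are what step c) would use as the pre-split boundaries. A greedy cut is the simplest choice; it can leave the last region noticeably smaller than the others.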

Does such a thing already exist and I'm just missing it? Or does anyone
know of a better approach?

Thoughts, criticism, requests very welcome.

Thanks,
Tim

[1]
https://github.com/opencore/hbase-bulk-load-balanced/blob/master/src/test/java/com/opencore/hbase/example/ExampleUsageTest.java
