[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988329#action_12988329
 ] 

Matt Corgan commented on HBASE-3373:
------------------------------------

I can't really post my client code since it's intertwined with a bunch of other 
stuff, but I extracted the important parts into a JUnit test that I attached to 
this issue.  We run Java (Tomcat), so it's fairly easy to talk directly to HBase 
and integrate a few features into our admin console: printing friendly record 
names rather than escaped bytes, triggering backups, moving regions, etc.  
I don't think it requires knowing the keyspace ahead of time, just that you hash 
into a known output range, a 63-bit long in my example.
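
To make the idea concrete, here is a minimal sketch of hashing region names 
into a 63-bit range and looking up a "home" server on a hash ring.  It is not 
the attached JUnit test; the class name ConsistentHashRing, the use of MD5, the 
virtual-node count and the server names are all assumptions made for 
illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of HBase: maps region names to "home" servers.
final class ConsistentHashRing {
    // Assumption: 100 ring points per server to smooth out the distribution.
    private static final int VIRTUAL_NODES = 100;

    // Ring position (0 .. Long.MAX_VALUE) -> server name.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ConsistentHashRing(List<String> servers) {
        for (String server : servers) {
            addServer(server);
        }
    }

    // Hash any key into the known output range: a non-negative 63-bit long.
    static long hash63(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            // Take the first 8 bytes and clear the sign bit.
            return ByteBuffer.wrap(digest).getLong() & Long.MAX_VALUE;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    void addServer(String server) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash63(server + "#" + i), server);
        }
    }

    void removeServer(String server) {
        ring.values().removeIf(server::equals);
    }

    // The home server owns the first ring point at or after the region's hash,
    // wrapping around to the start of the ring if there is none.
    String homeOf(String regionName) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers on the ring");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash63(regionName));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

Nothing here needs to know the keyspace: only the region name is hashed.  
Removing a server moves only the regions that were homed on it; every other 
region keeps its home.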

I think the consistent hashing scheme may be a good out-of-the-box methodology. 
 Even with something smarter, I'd worry about the underlying algorithms getting 
off course and starting a death spiral as bad outputs are fed back in creating 
even worse outputs.  Something like consistent hashing could be a good beacon 
to always be steering towards so things don't get too far off course.

I have about 20 tables with many different access patterns and I can't envision 
an algorithm that balances them truly well.  Everything could be going fine 
until I kick off an MR job that randomly digs up 100 very cold regions and find 
that they're all on the same server.

I'm thinking of a system where each region is either at home (its consistent 
hash destination) or visiting another server because the balancer decided its 
home was too hot.  Each regionserver could identify its hotter regions, and 
the balancer could move these around in an effort to smooth out the load.  In 
the meantime, colder regions would stay well distributed based on how good the 
hashing mechanism is.  If a regionserver cools down, the master brings home 
its vacationing regions first, and if it's still cool, then it borrows someone 
else's hotter home regions.  Without an underlying scheme, I can envision 
things getting extremely chaotic, especially with regard to cold regions of a 
single table getting bundled up since they're being overlooked.  With this 
method, you're never too far from safely hitting the reset button.
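
A rough sketch of the home/visiting idea, under assumptions of my own choosing: 
the class HomeAwayBalancer, the single numeric load per region and the fixed 
hotThreshold are hypothetical, and the "borrow someone else's hotter regions" 
step is left out to keep it short.  Each region's home would come from the 
consistent hash.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HBase code.
final class HomeAwayBalancer {
    static final class Region {
        final String name;
        final String home;   // consistent-hash destination
        String current;      // server hosting the region right now
        double load;         // assumed per-region load metric

        Region(String name, String home, double load) {
            this.name = name;
            this.home = home;
            this.current = home;
            this.load = load;
        }

        boolean isVisiting() {
            return !current.equals(home);
        }
    }

    final List<Region> regions = new ArrayList<>();
    final List<String> servers;
    final double hotThreshold; // assumed: a server above this load is "too hot"

    HomeAwayBalancer(List<String> servers, double hotThreshold) {
        this.servers = servers;
        this.hotThreshold = hotThreshold;
    }

    double loadOf(String server) {
        double sum = 0;
        for (Region r : regions) {
            if (r.current.equals(server)) sum += r.load;
        }
        return sum;
    }

    // One balancing pass: bring visitors home first, then relieve hot servers.
    void balanceOnce() {
        for (Region r : regions) {
            // A visitor goes home once its home can absorb it without overheating.
            if (r.isVisiting() && loadOf(r.home) + r.load <= hotThreshold) {
                r.current = r.home;
            }
        }
        for (String server : servers) {
            if (loadOf(server) <= hotThreshold) continue;
            Region hottest = null;
            for (Region r : regions) {
                if (r.current.equals(server) && (hottest == null || r.load > hottest.load)) {
                    hottest = r;
                }
            }
            String coolest = server;
            for (String candidate : servers) {
                if (loadOf(candidate) < loadOf(coolest)) coolest = candidate;
            }
            // Send the hottest region visiting on the coolest server.
            if (hottest != null && !coolest.equals(server)) {
                hottest.current = coolest;
            }
        }
    }

    // The reset button: every region returns to its consistent-hash home.
    void reset() {
        for (Region r : regions) {
            r.current = r.home;
        }
    }
}
```

Cold regions are never touched by balanceOnce(), so their distribution stays 
whatever the hash gave them, and reset() always leads back to a known layout.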

...

Regarding your comment about moving the top or bottom child off the parent 
server after a split, I tend to prefer moving the bottom one.  With time series 
data it will keep writing to the bottom child, so if you don't move the bottom 
child then a single server will end up doing the appending forever.  I prefer 
to rotate the server that's doing the work even though it's not quite as 
efficient and may cause a longer split pause.  It makes for a more balanced 
cluster.
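
A toy simulation of why this matters for time series data.  It assumes the 
"bottom" child is the one that keeps receiving appends, and it picks the next 
server round-robin, which is a hypothetical placement rule used only for 
illustration; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HBase code.
final class SplitPlacement {
    // Returns which server hosts the appending (bottom) region: first before
    // any split, then after each split in turn.
    static List<String> tailServerHistory(List<String> servers, int splits,
                                          boolean moveBottomChild) {
        List<String> history = new ArrayList<>();
        int tail = 0; // index of the server hosting the region being appended to
        history.add(servers.get(tail));
        for (int i = 0; i < splits; i++) {
            if (moveBottomChild) {
                // The bottom child leaves the parent's server (round-robin here).
                tail = (tail + 1) % servers.size();
            }
            // Otherwise the top child moves and the appends stay where they were.
            history.add(servers.get(tail));
        }
        return history;
    }
}
```

With servers rs1, rs2, rs3 and three splits, moving the bottom child gives 
rs1, rs2, rs3, rs1, while moving the top child gives rs1, rs1, rs1, rs1.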

> Allow regions of specific table to be load-balanced
> ---------------------------------------------------
>
>                 Key: HBASE-3373
>                 URL: https://issues.apache.org/jira/browse/HBASE-3373
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.20.6
>            Reporter: Ted Yu
>             Fix For: 0.92.0
>
>
> From our experience, cluster can be well balanced and yet, one table's 
> regions may be badly concentrated on few region servers.
> For example, one table has 839 regions (380 regions at time of table 
> creation) out of which 202 are on one server.
> It would be desirable for load balancer to distribute regions for specified 
> tables evenly across the cluster. Each of such tables has number of regions 
> many times the cluster size.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
