[jira] [Commented] (HBASE-3373) Allow regions of specific table to be load-balanced

2012-01-02 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178588#comment-13178588
 ] 

Zhihong Yu commented on HBASE-3373:
---

The following data structure is introduced:
Map<String, Map<ServerName, List<HRegionInfo>>> tableRegionDistribution
The String key is the table name. The list holds the table's regions hosted on 
the given ServerName.

In HMaster.balance(), before calling balancer.balanceCluster(assignments), we 
translate (regroup) the Map<ServerName, List<HRegionInfo>> returned by 
assignmentManager.getAssignments() into a 
Map<String, Map<ServerName, List<HRegionInfo>>>. Then we iterate over the 
tables and call balancer.balanceCluster(assignments) for each table.

Additionally, a new HBaseAdmin method can be added to filter out tables that 
don't need to be balanced.

The ultimate goal is to distribute each table's regions evenly.
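A minimal sketch of the regrouping step described above, in plain Java so it runs on its own. Assumptions: server names are plain `String`s, and `TableRegion` is a hypothetical stand-in for `HRegionInfo` that carries its table name; the real code would work with `ServerName`, `HRegionInfo` and the map returned by `assignmentManager.getAssignments()`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for HRegionInfo, keeping only what this sketch needs.
final class TableRegion {
    final String tableName;
    final String startKey;

    TableRegion(String tableName, String startKey) {
        this.tableName = tableName;
        this.startKey = startKey;
    }
}

final class TableRegrouper {
    // Regroups server -> regions (the shape getAssignments() returns) into
    // table -> (server -> regions), so the balancer can be called once per table.
    static Map<String, Map<String, List<TableRegion>>> byTable(
            Map<String, List<TableRegion>> assignments) {
        Map<String, Map<String, List<TableRegion>>> result = new HashMap<>();
        for (Map.Entry<String, List<TableRegion>> e : assignments.entrySet()) {
            for (TableRegion r : e.getValue()) {
                result.computeIfAbsent(r.tableName, t -> new HashMap<>())
                      .computeIfAbsent(e.getKey(), s -> new ArrayList<>())
                      .add(r);
            }
        }
        // Every server appears in every table's view, with an empty list if it
        // hosts none of that table's regions; otherwise a per-table pass would
        // never consider it as a destination.
        for (Map<String, List<TableRegion>> perServer : result.values()) {
            for (String server : assignments.keySet()) {
                perServer.putIfAbsent(server, new ArrayList<>());
            }
        }
        return result;
    }
}
```

Each inner map could then be handed to `balanceCluster()` in turn. The padding step is the important design choice: a server that hosts none of a table's regions is exactly where that table's regions should go.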

 Allow regions of specific table to be load-balanced
 ---

 Key: HBASE-3373
 URL: https://issues.apache.org/jira/browse/HBASE-3373
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.20.6
Reporter: Ted Yu
 Attachments: 3373.txt, HbaseBalancerTest2.java


 From our experience, a cluster can be well balanced overall and yet one 
 table's regions may be badly concentrated on a few region servers.
 For example, one table has 839 regions (380 regions at table creation time), 
 202 of which are on one server.
 It would be desirable for the load balancer to distribute the regions of 
 specified tables evenly across the cluster. Each such table has many times 
 more regions than there are servers in the cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3373) Allow regions of specific table to be load-balanced

2011-11-30 Thread Ben West (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160177#comment-13160177
 ] 

Ben West commented on HBASE-3373:
-

We're running 0.94 and ran into this. With 4 region servers, we had one table 
with ~1800 regions, evenly balanced. We then used importtsv to import ~300 
regions of a new table and ended up with virtually all of its regions on one 
server. Looking at the master's log, there were 159 rebalances (which makes 
sense); 123 moved regions of the old table, and 26 moved regions of the new 
table. The result is that about 90% of the new table's regions are on one 
server.

When I look at DefaultLoadBalancer.balanceCluster, it has:

{CODE}
// fetch in alternate order if there is new region server
if (emptyRegionServerPresent) {
  fetchFromTail = !fetchFromTail;
}
{CODE}

So we're only doing the randomization from HBASE-3609 if there's a new region 
server? Is there a reason we don't do it all the time?





[jira] [Commented] (HBASE-3373) Allow regions of specific table to be load-balanced

2011-11-30 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160233#comment-13160233
 ] 

Ted Yu commented on HBASE-3373:
---

@Ben:
Thanks for trying out 0.94

The code snippet above deals with a region server that recently joined the 
cluster. Its goal is to avoid creating a hot region server that receives 
above-average load.
This is part of the changes from HBASE-3609. The randomization is done on this 
line:
{code}
Collections.shuffle(sns, RANDOM);
{code}
which shuffles the region servers randomly before regions are scheduled onto 
them.
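A standalone illustration of why shuffling helps (not the HBASE-3609 code itself): the candidate servers are shuffled before regions are dealt out round-robin, so no server is systematically first in line. Servers and regions are plain strings here, and `assignShuffled` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class ShuffledAssigner {
    // Shuffles a copy of the candidate servers, then deals regions out to them
    // round-robin, so the same server does not always receive the first region.
    static Map<String, List<String>> assignShuffled(
            List<String> regions, List<String> servers, Random random) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no region servers");
        }
        List<String> sns = new ArrayList<>(servers);
        Collections.shuffle(sns, random);
        Map<String, List<String>> plan = new HashMap<>();
        for (String sn : sns) {
            plan.put(sn, new ArrayList<>());
        }
        for (int i = 0; i < regions.size(); i++) {
            plan.get(sns.get(i % sns.size())).add(regions.get(i));
        }
        return plan;
    }
}
```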

Your observation about unbalanced table(s) in the cluster is valid. This is 
because the master doesn't pass the per-table region distribution to 
balanceCluster().
I have a patch in our internal repository in which the master calls 
balanceCluster() for each table.
Once we have tested it on our production cluster, I should be able to 
contribute it back.





[jira] [Commented] (HBASE-3373) Allow regions of specific table to be load-balanced

2011-11-30 Thread Ben West (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160277#comment-13160277
 ] 

Ben West commented on HBASE-3373:
-

@Ted: Thanks! I think I was looking at trunk instead of 0.94; I see that in 0.94 
it should be random:

{code}
List<HRegionInfo> regions = randomize(server.getValue());
{code}

Your per-table balance will be very useful for this case. Look forward to 
seeing it!
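For comparison, a hypothetical `randomize`-style helper, assuming it only needs to shuffle a list. This version shuffles a copy rather than the caller's list, and `pickRegionsToMove` (also hypothetical) takes the first `numToOffload` regions of the shuffled copy, so which regions leave an overloaded server no longer depends on list order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class OffloadPicker {
    // Hypothetical helper: returns a shuffled copy so the caller's list is untouched.
    static <T> List<T> randomize(List<T> regions, Random random) {
        List<T> copy = new ArrayList<>(regions);
        Collections.shuffle(copy, random);
        return copy;
    }

    // Takes the first numToOffload regions of the shuffled copy (clamped to
    // the list size), i.e. a uniformly random subset of the server's regions.
    static <T> List<T> pickRegionsToMove(List<T> regions, int numToOffload, Random random) {
        List<T> shuffled = randomize(regions, random);
        int n = Math.max(0, Math.min(numToOffload, shuffled.size()));
        return new ArrayList<>(shuffled.subList(0, n));
    }
}
```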





[jira] [Commented] (HBASE-3373) Allow regions of specific table to be load-balanced

2011-11-30 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13160283#comment-13160283
 ] 

Jean-Daniel Cryans commented on HBASE-3373:
---

Just to clear up some confusion: trunk is going to be 0.94, so what you're 
playing with is probably 0.90.4.





[jira] [Commented] (HBASE-3373) Allow regions of specific table to be load-balanced

2011-05-30 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041395#comment-13041395
 ] 

stack commented on HBASE-3373:
--

The need for this issue keeps coming up.  I'm not sure whether the requesters 
are running post-0.90.2 (and so have HBASE-3586).  I'd think the latter should 
have made our story better, but maybe not (we should get folks to note whether 
or not they are complaining post-0.90.2).



[jira] [Commented] (HBASE-3373) Allow regions of specific table to be load-balanced

2011-04-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019635#comment-13019635
 ] 

Ted Yu commented on HBASE-3373:
---

Suggestion from Stan Barton:
This JIRA can be generalized as a new policy for the load balancer: balance the 
number of regions per RS per table, rather than the total number of regions 
across all tables.
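One way to make "balanced per RS per table" concrete, under the assumption that balanced means every server holds either floor(n/s) or ceil(n/s) of a table's n regions over s servers. `PerTableBalance.isBalanced` is a hypothetical helper. It shows why the total-count view can hide the problem: if table A has 6 regions all on rs1 and table B has 6 regions all on rs2, both servers hold 6 regions in total, yet the check fails for each table on its own.

```java
import java.util.Map;

final class PerTableBalance {
    // For ONE table: counts maps each region server (including servers holding
    // none of this table's regions) to how many of the table's regions it hosts.
    // Balanced here means every server holds floor(n/s) or ceil(n/s) regions.
    static boolean isBalanced(Map<String, Integer> counts) {
        int servers = counts.size();
        if (servers == 0) {
            return true;
        }
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        int floor = total / servers;
        int ceil = (total + servers - 1) / servers;
        for (int c : counts.values()) {
            if (c < floor || c > ceil) {
                return false;
            }
        }
        return true;
    }
}
```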



[jira] [Commented] (HBASE-3373) Allow regions of specific table to be load-balanced

2011-04-01 Thread gaojinchao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014906#comment-13014906
 ] 

gaojinchao commented on HBASE-3373:
---

In HBase 0.20.6, contiguous (adjacent) regions are not assigned to the same 
region server. This breaks up the daughters of a split across region servers 
and avoids hot spots, which can improve performance.

In version 0.90.1, the daughters are opened on the region server where their 
parent was open.
When a region server has thousands of regions, random selection is unlikely to 
pick the contiguous regions, so that region server remains a hot spot.

Should the balancing method choose contiguous regions first and then pick at 
random, or use some other way to avoid hot spots? (e.g. add a configuration 
parameter to choose the balancing method based on the application?)
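A sketch of the "don't put adjacent regions on the same server" idea for a single table: sort the regions by start key and deal them out round-robin, so that with at least two servers, neighboring regions (including the two daughters of a split) always land on different servers. Start keys are modeled as `String`s here, while HBase actually compares row keys as raw bytes; `ContiguousSpreader` is a hypothetical name, and this sketch ignores current load entirely.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ContiguousSpreader {
    // Sorts a table's regions by start key and deals them round-robin, so
    // regions i and i+1 map to servers (i mod s) and (i+1 mod s), which differ
    // whenever s >= 2. Returns start key -> server.
    static Map<String, String> spread(List<String> startKeys, List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no servers");
        }
        List<String> sorted = new ArrayList<>(startKeys);
        sorted.sort(Comparator.naturalOrder()); // "" (first region) sorts first
        Map<String, String> regionToServer = new HashMap<>();
        for (int i = 0; i < sorted.size(); i++) {
            regionToServer.put(sorted.get(i), servers.get(i % servers.size()));
        }
        return regionToServer;
    }
}
```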



[jira] [Commented] (HBASE-3373) Allow regions of specific table to be load-balanced

2011-04-01 Thread zhoushuaifeng (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014927#comment-13014927
 ] 

zhoushuaifeng commented on HBASE-3373:
--

I agree with Gao's comments. A region that is splitting is usually receiving 
more write operations, so it's better to assign the daughters to different 
region servers to avoid a hot spot.



[jira] Commented: (HBASE-3373) Allow regions of specific table to be load-balanced

2011-02-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12991720#comment-12991720
 ] 

Ted Yu commented on HBASE-3373:
---

The loop in balanceCluster(Map<HServerInfo, List<HRegionInfo>>) that fills in 
destination servers for regionsToMove currently iterates over 
underloadedServers in one direction only (from index 0 up).
This works when the load metric is the number of regions.
If we use the number of requests as the load metric, we should iterate from 
index 0 up, then back down to index 0, and so forth. This would give us better 
load distribution.
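The difference between the two iteration orders can be shown with a small model, assuming the regions to move are already sorted heaviest first by request count. `SerpentineOrder` is hypothetical: with `snake == false` it always counts up from index 0, and with `snake == true` it goes up, then back down, like a snake draft.

```java
final class SerpentineOrder {
    // Index of the underloaded server that receives the k-th region to move.
    // snake == false: 0, 1, ..., n-1, 0, 1, ...        (always counting up)
    // snake == true:  0, 1, ..., n-1, n-1, ..., 1, 0, 0, 1, ... (up, then down)
    static int serverIndex(int k, int numServers, boolean snake) {
        int pass = k / numServers;
        int offset = k % numServers;
        return (snake && pass % 2 == 1) ? numServers - 1 - offset : offset;
    }

    // Total request load each server ends up with when regions are handed out
    // in the given order (loadsHeaviestFirst must already be sorted descending).
    static long[] totals(long[] loadsHeaviestFirst, int numServers, boolean snake) {
        long[] totals = new long[numServers];
        for (int k = 0; k < loadsHeaviestFirst.length; k++) {
            totals[serverIndex(k, numServers, snake)] += loadsHeaviestFirst[k];
        }
        return totals;
    }
}
```

For example, `totals(new long[] {9, 8, 7, 6, 5, 4}, 3, false)` gives `{15, 13, 11}`, while the snake order gives `{13, 13, 13}`: the server that got the heaviest region on one pass gets the lightest on the next.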





[jira] Commented: (HBASE-3373) Allow regions of specific table to be load-balanced

2011-01-28 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988214#action_12988214
 ] 

Matt Corgan commented on HBASE-3373:


Have you guys considered using a consistent hashing method to choose which 
server a region belongs to?  You would create ~50 buckets for each server by 
hashing serverName_port_bucketNum, and then hash the start key of each region 
into the buckets.

There are a few benefits:

* when you add a server it takes an equal load from all existing servers
* if you remove a server it distributes its regions equally to the remaining 
servers
* adding a server does not cause all regions to shuffle like round robin 
assignment would
* assignment is nearly random, but repeatable, so no hot spots
* when a region splits the front half will stay on the same server, but the 
back half will usually be sent to another server

And a few drawbacks:

* each server wouldn't end up with exactly the same number of regions, but they 
would be close
* if a hot spot does end up developing, you can't do anything about it, at 
least not unless it supported a list of manual overrides
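A minimal sketch of the scheme described above, assuming ~50 buckets per server, servers named "host_port", and MD5 as the hash (any well-mixed hash would do; the comment does not name one). Each bucket `host_port_bucketNum` is placed on a sorted ring, and a region goes to the first bucket at or after the hash of its start key, wrapping around at the end. `ConsistentHashRing` is a hypothetical class, not HBase code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ConsistentHashRing {
    // Ring position -> server name; each server owns bucketsPerServer positions.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ConsistentHashRing(List<String> servers, int bucketsPerServer) {
        for (String server : servers) { // server given as "host_port"
            for (int b = 0; b < bucketsPerServer; b++) {
                ring.put(hash(server + "_" + b), server);
            }
        }
    }

    // Owner of a region: the first bucket at or after the hash of its start key,
    // wrapping around to the first bucket when the hash is past the last one.
    String serverFor(String regionStartKey) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers on the ring");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(regionStartKey));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // First 8 bytes of the MD5 digest, read as a signed long ring position.
    private static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

Because adding a server only inserts new buckets, a region either keeps its server or moves to the newly added one, which is the "does not cause all regions to shuffle" property from the list above.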






[jira] Commented: (HBASE-3373) Allow regions of specific table to be load-balanced

2011-01-28 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988221#action_12988221
 ] 

Jonathan Gray commented on HBASE-3373:
--

I think consistent hashing would be a major step backwards for us and 
unnecessary because there is no cost of moving bits around in HBase.  The 
primary benefit of consistent hashing is that it reduces the amount of data you 
have to physically move around.  Because of our use of HDFS, we never have to 
move physical data around.

Of the items in your benefit list, we already implement almost all of them, and 
the rest are possible in the current architecture.  In addition, our 
architecture is extremely flexible and we can do all kinds of interesting load 
balancing techniques based on actual load profiles, not just the number of 
shards/buckets as we do today or as would be done with consistent hashing.




[jira] Commented: (HBASE-3373) Allow regions of specific table to be load-balanced

2011-01-28 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988229#action_12988229
 ] 

Matt Corgan commented on HBASE-3373:


Gotcha.  I guess I was thinking of it more as a quick upgrade to the current 
load balancer, which only looks at region count.  We store a lot of time-series 
data, and regions that split were left on the same server while the balancer 
moved cold regions off.  I wrote a little client-side consistent hashing 
balancer that solved the problem in our case, but there are definitely better 
ways.  Consistent hashing also binds regions to servers across cluster 
restarts, which helps keep regions near their last major-compacted HDFS files.

Whatever balancing scheme you do use, don't you need some starting point for 
randomly distributing the regions?  If no other data is available or you need a 
tie breaker, maybe consistent hashing is better than round robin or purely 
random placement.




[jira] Commented: (HBASE-3373) Allow regions of specific table to be load-balanced

2011-01-28 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988249#action_12988249
 ] 

Ted Yu commented on HBASE-3373:
---

This is what I added in HMaster:
{code}
  /**
   * Evenly distributes the regions of the tables (assuming the number of
   * regions is much bigger than the number of region servers)
   * @param tableNames tables to load balance
   * @param ttl Time-to-live for load balance request. If negative, request is withdrawn
   * @throws IOException e
   */
  public void loadBalanceTable(final byte [][] tableNames, long ttl) throws IOException {
{code}

Our production environment has 150 to 300 tables. We run flows sequentially, 
and each flow creates about 10 new tables.
The above API would allow the load balancer to move hot (recently split) 
regions off particular region server(s).
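The Javadoc only says that a negative `ttl` withdraws the request, so the following is a hypothetical sketch of how the master might track such requests, assuming the ttl is in milliseconds and tables are identified by `String` names rather than `byte[]`. Nothing here is taken from the actual patch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical bookkeeping for per-table balance requests; not HMaster code.
final class TableBalanceRequests {
    private final Map<String, Long> expiryMillis = new HashMap<>();

    // A non-negative ttl (re)registers the table until now + ttl;
    // a negative ttl withdraws any existing request, as the Javadoc describes.
    void request(String table, long ttlMillis, long nowMillis) {
        if (ttlMillis < 0) {
            expiryMillis.remove(table);
        } else {
            expiryMillis.put(table, nowMillis + ttlMillis);
        }
    }

    // Tables the balancer should still treat per table; expired requests are dropped.
    Set<String> activeTables(long nowMillis) {
        expiryMillis.values().removeIf(expiry -> expiry <= nowMillis);
        return new HashSet<>(expiryMillis.keySet());
    }
}
```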




[jira] Commented: (HBASE-3373) Allow regions of specific table to be load-balanced

2011-01-28 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988269#action_12988269
 ] 

Jonathan Gray commented on HBASE-3373:
--

Both of your solutions are rather specialized and I'm not sure they're 
generally applicable.  I would much prefer spending effort on improving our 
current load balancer, and it seems to me that it would be possible to 
implement similar behaviors in a more generalized way.

Also, the addition of an HBaseAdmin region move API makes it so you don't need 
to muck with HBase server code to do specialized balancing logic.  With the 
current APIs, it's possible to basically push the balancer out into your own 
client.

@Matt, I don't think I really understand how you would upgrade our load 
balancer with consistent hashing.

The fact that split regions open back up on the same server is actually an 
optimization in many cases: it reduces the amount of time the regions are 
offline, and when they come back online and do a compaction to drop 
references, all the files are more likely to be on the local DataNode rather 
than remote.  In some cases, like time-series, you may want the splits to move 
to different servers.  I could imagine some configurable logic in there to 
ensure the bottom half goes to a different server (or maybe moving the top 
half away would actually be more efficient, since most of the time you'll 
write more to the bottom half and thus want the data locality / quick 
turnaround).  There's likely going to be a bit of split rework in 0.92 to make 
it more like the ZK-based regions-in-transition.
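A sketch of the configurable split placement imagined here, assuming the only input is a per-server region count: one daughter stays on the parent's server, and the other goes to the least-loaded different server (or stays put if there is no other server). `DaughterPlacement` is hypothetical, and which half moves is left to the caller.

```java
import java.util.Map;

final class DaughterPlacement {
    // Returns {server for the daughter that stays, server for the daughter that moves}.
    // The staying daughter keeps the parent's server (locality, short offline time);
    // the moving one goes to the least-loaded other server, or stays if there is none.
    static String[] place(String parentServer, Map<String, Integer> regionCountByServer) {
        String target = parentServer;
        int fewest = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> e : regionCountByServer.entrySet()) {
            if (!e.getKey().equals(parentServer) && e.getValue() < fewest) {
                fewest = e.getValue();
                target = e.getKey();
            }
        }
        return new String[] {parentServer, target};
    }
}
```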

As far as binding regions to servers between cluster restarts, this is already 
implemented and on by default in 0.90.

Consistent hashing also requires a fixed keyspace (right?) and that's a 
mismatch for HBase's flexibility in this regard.

Do you have any code for this client-side consistent hashing balancer?  I'm 
confused about how that could be implemented without knowing a lot about your 
data, the regions, the servers available, etc.




[jira] Commented: (HBASE-3373) Allow regions of specific table to be load-balanced

2011-01-28 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988290#action_12988290
 ] 

Ted Yu commented on HBASE-3373:
---

Hive, for instance, may create new (intermediate) tables for its map/reduce 
jobs.
The add-on I propose would be beneficial for that scenario.

Also, in LoadBalancer.balanceCluster(), line 210:
{code}
  while (numTaken < numToTake && regionidx < regionsToMove.size()) {
    regionsToMove.get(regionidx).setDestination(server.getKey());
    numTaken++;
    regionidx++;
  }
{code}
The above code offloads regions from the most loaded server to the most 
underloaded server.
It would be more desirable to round-robin the regions from the overloaded 
server(s) across the underloaded server(s), so that new daughter regions don't 
stay on the same server.
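A minimal sketch of the round-robin alternative, assuming each underloaded server has a quota (its `numToTake`) and servers are visited in the iteration order of the map passed in. `RoundRobinDestinations` is a hypothetical name. Compared with the loop above, two consecutive entries in `regionsToMove`, such as a pair of daughters, go to different servers whenever at least two servers still have room.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RoundRobinDestinations {
    // Deals regionsToMove one at a time across the underloaded servers,
    // skipping a server once its quota is used up. Regions left over when
    // every quota is exhausted stay where they are (absent from the result).
    static Map<String, String> assign(List<String> regionsToMove,
                                      Map<String, Integer> quotaByServer) {
        Map<String, String> dest = new LinkedHashMap<>();
        Map<String, Integer> remaining = new LinkedHashMap<>(quotaByServer); // caller's map untouched
        Iterator<String> regions = regionsToMove.iterator();
        boolean progress = true;
        while (regions.hasNext() && progress) {
            progress = false;
            for (Map.Entry<String, Integer> e : remaining.entrySet()) {
                if (!regions.hasNext()) {
                    break;
                }
                if (e.getValue() > 0) {
                    dest.put(regions.next(), e.getKey());
                    e.setValue(e.getValue() - 1);
                    progress = true;
                }
            }
        }
        return dest;
    }
}
```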




[jira] Commented: (HBASE-3373) Allow regions of specific table to be load-balanced

2011-01-28 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988294#action_12988294
 ] 

Jonathan Gray commented on HBASE-3373:
--

Round-robin assignment at table creation is fine.  Bypassing the load balancer 
and doing your own thing is fine.  Adding intelligence into the balancer to get 
good balance of load is great.

I'm -1 on adding these kinds of specialized hooks into HBase proper.  They 
should either be an external component (seems that they can be) or we should 
make the balancer pluggable and you could provide alternative/configurable 
balancer implementations.

Assigning regions from overloaded servers to underloaded servers in a 
round-robin way does make sense.  I'd be happy to take a contribution to do 
that.  I'm not sure there's a very strong correlation between that and 
splitting up daughter regions.  It certainly could be the case, but the 
selection of which regions to move off an overloaded server is rather dumb, so 
there's no guarantee that recently split regions get reassigned.




[jira] Commented: (HBASE-3373) Allow regions of specific table to be load-balanced

2011-01-28 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988340#action_12988340
 ] 

Ted Yu commented on HBASE-3373:
---

We should sort regionsToMove by the creation time of regions. The rationale is 
that new regions tend to be the hot ones and should be round-robin assigned to 
underloaded servers.
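Combining the two ideas, a sketch that sorts `regionsToMove` newest first and then assigns them round-robin to the underloaded servers. `MovableRegion.createdMillis` is a hypothetical field standing in for whatever creation time the region metadata provides; the class names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical region record: createdMillis stands in for the region's creation time.
final class MovableRegion {
    final String name;
    final long createdMillis;

    MovableRegion(String name, long createdMillis) {
        this.name = name;
        this.createdMillis = createdMillis;
    }
}

final class NewestFirstAssigner {
    // Sorts regionsToMove newest first, then deals them round-robin across the
    // underloaded servers, so the most recently created (likely hottest) regions
    // are spread out first. The result preserves the newest-first order.
    static Map<String, String> assign(List<MovableRegion> regionsToMove, List<String> underloaded) {
        if (underloaded.isEmpty()) {
            throw new IllegalArgumentException("no underloaded servers");
        }
        List<MovableRegion> sorted = new ArrayList<>(regionsToMove);
        sorted.sort(Comparator.comparingLong((MovableRegion r) -> r.createdMillis).reversed());
        Map<String, String> dest = new LinkedHashMap<>();
        for (int i = 0; i < sorted.size(); i++) {
            dest.put(sorted.get(i).name, underloaded.get(i % underloaded.size()));
        }
        return dest;
    }
}
```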

 Allow regions of specific table to be load-balanced
 ---

 Key: HBASE-3373
 URL: https://issues.apache.org/jira/browse/HBASE-3373
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.20.6
Reporter: Ted Yu
 Fix For: 0.92.0

 Attachments: HbaseBalancerTest2.java





[jira] Commented: (HBASE-3373) Allow regions of specific table to be load-balanced

2011-01-28 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988396#action_12988396
 ] 

Ted Yu commented on HBASE-3373:
---

@Matt:
The following code can be improved by randomizing the choice when the tail is 
empty, to avoid clustering at consistentHashRing.firstKey():
{code}
regionHash = tail.isEmpty() ? consistentHashRing.firstKey() : tail.firstKey();
{code}
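A self-contained sketch of the suggested change, using a TreeMap as the ring (ring position to server). HashRingLookup, addServer and serverFor are hypothetical names, and ring positions are assumed to be precomputed hashes. When tailMap() is empty, the lookup picks a random ring position instead of always wrapping to firstKey().

```java
import java.util.*;

final class HashRingLookup {
  private final TreeMap<Long, String> ring = new TreeMap<>(); // ring position -> server
  private final Random random;

  HashRingLookup(Random random) {
    this.random = random;
  }

  void addServer(long position, String server) {
    ring.put(position, server);
  }

  /** Returns the server at the first position >= regionHash, or a random one if none exists. */
  String serverFor(long regionHash) {
    if (ring.isEmpty()) throw new IllegalStateException("no servers on ring");
    SortedMap<Long, String> tail = ring.tailMap(regionHash); // positions >= regionHash
    long key;
    if (!tail.isEmpty()) {
      key = tail.firstKey();
    } else {
      // Empty tail: randomize instead of always wrapping to ring.firstKey().
      List<Long> positions = new ArrayList<>(ring.keySet());
      key = positions.get(random.nextInt(positions.size()));
    }
    return ring.get(key);
  }
}
```

One trade-off: a lookup that wraps around is no longer deterministic, so the chosen server has to be recorded rather than recomputed later.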


 Allow regions of specific table to be load-balanced
 ---

 Key: HBASE-3373
 URL: https://issues.apache.org/jira/browse/HBASE-3373
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.20.6
Reporter: Ted Yu
 Fix For: 0.92.0

 Attachments: HbaseBalancerTest2.java





[jira] Commented: (HBASE-3373) Allow regions of specific table to be load-balanced

2011-01-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12976431#action_12976431
 ] 

Ted Yu commented on HBASE-3373:
---

Currently, regions offloaded from the most overloaded server are assigned to 
the most underloaded server first. When some regions on the most overloaded 
server are actively splitting, this arrangement is sub-optimal.
A better way is to assign regionsToMove round-robin across the underloaded 
servers.
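A minimal sketch of round-robin placement, assuming regions and servers are identified by plain strings (RoundRobinPlacement and assign are hypothetical names). Region i goes to underloaded server i mod n, so consecutive entries in regionsToMove land on different servers instead of all filling the most underloaded server first.

```java
import java.util.*;

final class RoundRobinPlacement {
  /** Deals regionsToMove across the underloaded servers in turn: region i -> server i mod n. */
  static Map<String, String> assign(List<String> regionsToMove, List<String> underloadedServers) {
    if (underloadedServers.isEmpty()) throw new IllegalArgumentException("no destination servers");
    Map<String, String> plan = new LinkedHashMap<>(); // region -> destination, in input order
    for (int i = 0; i < regionsToMove.size(); i++) {
      plan.put(regionsToMove.get(i), underloadedServers.get(i % underloadedServers.size()));
    }
    return plan;
  }
}
```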

 Allow regions of specific table to be load-balanced
 ---

 Key: HBASE-3373
 URL: https://issues.apache.org/jira/browse/HBASE-3373
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.20.6
Reporter: Ted Yu
 Fix For: 0.90.1





[jira] Commented: (HBASE-3373) Allow regions of specific table to be load-balanced

2010-12-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12973437#action_12973437
 ] 

Ted Yu commented on HBASE-3373:
---

The list of regions for the table can be given to AssignmentManager so that 
the regions can be evenly distributed.
We need to consider the table's regions that are actively splitting, though. 
These regions and their daughters would lead to imbalance after the round-robin 
assignment above.
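To check whether a table's regions are actually evenly distributed, the server-to-regions assignment can be regrouped by table. This sketch assumes, hypothetically, that a region name starts with its table name followed by a comma; PerTableView and its methods are illustrative names. A table counts as evenly spread when each of the s servers holds between floor(n/s) and ceil(n/s) of the table's n regions.

```java
import java.util.*;

final class PerTableView {
  /** Assumption: the table name is the part of a region name before the first comma. */
  static String tableOf(String regionName) {
    int comma = regionName.indexOf(',');
    return comma < 0 ? regionName : regionName.substring(0, comma);
  }

  /** Regroups server -> regions into table -> (server -> number of that table's regions). */
  static Map<String, Map<String, Integer>> byTable(Map<String, List<String>> assignments) {
    Map<String, Map<String, Integer>> result = new TreeMap<>();
    for (Map.Entry<String, List<String>> e : assignments.entrySet()) {
      for (String region : e.getValue()) {
        result.computeIfAbsent(tableOf(region), t -> new TreeMap<>())
            .merge(e.getKey(), 1, Integer::sum);
      }
    }
    return result;
  }

  /** True if every one of serverCount servers holds between floor(n/s) and ceil(n/s) regions. */
  static boolean isEven(Map<String, Integer> perServer, int serverCount) {
    int total = 0, max = 0;
    // Servers missing from perServer hold zero of this table's regions.
    int min = perServer.size() < serverCount ? 0 : Integer.MAX_VALUE;
    for (int c : perServer.values()) {
      total += c;
      max = Math.max(max, c);
      min = Math.min(min, c);
    }
    return max <= (total + serverCount - 1) / serverCount && min >= total / serverCount;
  }
}
```

Applied to the example in the issue description, 202 of 839 regions on one server exceeds the ceiling on any cluster of five or more servers (ceil(839/5) = 168).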

 Allow regions of specific table to be load-balanced
 ---

 Key: HBASE-3373
 URL: https://issues.apache.org/jira/browse/HBASE-3373
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.20.6
Reporter: Ted Yu
 Fix For: 0.90.1





[jira] Commented: (HBASE-3373) Allow regions of specific table to be load-balanced

2010-12-18 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972870#action_12972870
 ] 

Jonathan Gray commented on HBASE-3373:
--

On cluster startup in 0.90, regions are assigned in one of two ways.  By 
default, it attempts to retain the previous assignment of the cluster.  The 
other option, which I've also used, is round-robin, which evenly distributes 
each table.

That, plus the change to do round-robin on table creation, should probably 
cover per-table distribution fairly well.

I think the next step for the load balancer is a major effort to switch to 
a more cost-based approach.  Ideally you don't need an even distribution of 
each table; you want an even distribution of load.  If there is one hot table, 
it will get evenly balanced anyway.
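One simple way to balance load rather than region counts is a greedy heuristic: place the heaviest regions first, each on the currently least-loaded server (longest-processing-time-first). This only illustrates the cost-based direction and is not HBase's balancer; the per-region load figure (for example, requests per second) and all names are assumptions.

```java
import java.util.*;

final class LoadAwarePlacement {
  /** Greedy: heaviest region first, each onto the server with the lowest total load so far. */
  static Map<String, String> place(Map<String, Double> regionLoad, List<String> servers) {
    if (servers.isEmpty()) throw new IllegalArgumentException("no servers");
    Map<String, Double> serverLoad = new HashMap<>();
    for (String server : servers) serverLoad.put(server, 0.0);

    // Orders servers by current load, ties broken by name for determinism.
    PriorityQueue<String> leastLoaded = new PriorityQueue<>(
        Comparator.comparingDouble((String s) -> serverLoad.get(s)).thenComparing(s -> s));
    leastLoaded.addAll(servers);

    List<String> regions = new ArrayList<>(regionLoad.keySet());
    regions.sort(Comparator.comparingDouble((String r) -> regionLoad.get(r)).reversed()
        .thenComparing(r -> r));

    Map<String, String> plan = new LinkedHashMap<>();
    for (String region : regions) {
      String target = leastLoaded.poll(); // lightest server right now
      plan.put(region, target);
      serverLoad.merge(target, regionLoad.get(region), Double::sum);
      leastLoaded.add(target); // re-insert so the queue sees its updated load
    }
    return plan;
  }
}
```

With one hot region (load 10) and three warm ones (4, 3, 3) on two servers, the hot region ends up alone on one server and the warm ones share the other, so both servers carry a load of 10 even though their region counts differ.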

One thing we could do is get rid of all random assignments and always try to do 
some kind of quick load balance or round-robin.  It does seem like randomness 
always leads to one guy who gets an unfair share :)
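The intuition about randomness can be checked with a small simulation. The class and method names are hypothetical and the 10-server cluster is an assumption; 839 is the region count from the issue description. Round-robin never gives a server more than ceil(839/10) = 84 regions, while uniformly random placement can at best match that and will usually land above it.

```java
import java.util.*;

final class RandomVsRoundRobin {
  /** Largest per-server count when each region picks a server uniformly at random. */
  static int maxWithRandom(int regions, int servers, Random random) {
    int[] counts = new int[servers];
    for (int i = 0; i < regions; i++) counts[random.nextInt(servers)]++;
    return Arrays.stream(counts).max().orElse(0);
  }

  /** Largest per-server count under round-robin: region i goes to server i mod servers. */
  static int maxWithRoundRobin(int regions, int servers) {
    int[] counts = new int[servers];
    for (int i = 0; i < regions; i++) counts[i % servers]++;
    return Arrays.stream(counts).max().orElse(0);
  }
}
```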

 Allow regions of specific table to be load-balanced
 ---

 Key: HBASE-3373
 URL: https://issues.apache.org/jira/browse/HBASE-3373
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.20.6
Reporter: Ted Yu
 Fix For: 0.90.1





[jira] Commented: (HBASE-3373) Allow regions of specific table to be load-balanced

2010-12-18 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972908#action_12972908
 ] 

Ted Yu commented on HBASE-3373:
---

If a table is heavily written to, its regions split over a relatively long 
period of time, and the new daughter regions may not be well distributed.

For example, after a region server comes back online from a crash, it takes time 
for the balancer to assign regions from other servers to it. In the meantime, the 
new daughter regions tend to get assigned to this region server.



 Allow regions of specific table to be load-balanced
 ---

 Key: HBASE-3373
 URL: https://issues.apache.org/jira/browse/HBASE-3373
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.20.6
Reporter: Ted Yu
 Fix For: 0.90.1

