Hey this is pretty cool, I've been wanting something like that for a while. It's a little bit rough but the idea is there. Ultimately, I hope HBase will be able to provide fine-grained metrics on a per-region basis and use that to do load balancing.
The problem right now is that load balancing is very costly for clients, because it takes too long to move a region around in a real cluster. The region stays down for several seconds (!). This is really disruptive for high-throughput low-latency user-facing applications. So until we make the region migration process more seamless, we can't really have a very aggressive / proactive load balancer. The other day I suggested an idea to Stack to change the region migration process with virtually zero downtime. It basically involves telling the source region server where the region is going to land, and telling the destination region server to prepare to receive the region. The source RS would do a first flush and remember the point at which it is (there's a generation ID or something already, whatever is needed for ACID). The source RS would send an RPC to the destination RS to tell it to start loading whatever was flushed and then it would replicate all the edits to the destination RS. Once both RS are in sync, the source RS would block requests to the region (by locking it), tell the destination RS about it, and after getting the final ACK from it, would update META and send a special NSRE to all the clients of the blocked requests. The special NSRE would basically just be like a normal NSRE but it would also say "hint: I think the region is now on that RS". From the clients' point of view, the region downtime would be pretty minimal (almost unnoticeable). Also, this scheme allows for opportunities such as warming up the block cache in the destination RS before it starts serving. -- Benoit "tsuna" Sigoure Software Engineer @ www.StumbleUpon.com
