This doesn't answer your question per se, but this is how we dealt with
load on HBase at Lithium. We power klout.com with HBase. On a nightly
basis, we load user profile data and Klout scores for approx. 600 million
users into HBase. We also do maintenance on HBase such as major compactions
on a regular basis. When either a load or maintenance is being performed,
site performance on klout.com used to degrade pretty severely. In order to
mitigate this, we stood up 2 HBase clusters and now power klout.com off
both. We run these in a custom-built active/passive mode. The application
layer reads a ZooKeeper flag to find the active cluster and serves
from there. We load data or do maintenance on the passive cluster, then flip
the clusters and repeat the load/maintenance on the previously active one.
This mechanism of active/passive systems has been working pretty well for
us. It does, however, carry the significant cost of maintaining 2
clusters.
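
To make the flip concrete, here is a rough sketch of the logic in Python. It
is not our production code: FlagStore is an in-memory stand-in for the
ZooKeeper flag, and the cluster names, class names and nightly_cycle helper
are all made up for illustration.

```python
class FlagStore:
    """In-memory stand-in (assumption) for the ZooKeeper flag naming the active cluster."""

    def __init__(self, active):
        self._active = active

    def get_active(self):
        return self._active

    def set_active(self, name):
        self._active = name


class ClusterSwitch:
    """Tracks which of two clusters is active; cluster names are hypothetical."""

    def __init__(self, store, clusters=("cluster-a", "cluster-b")):
        if len(clusters) != 2 or store.get_active() not in clusters:
            raise ValueError("need exactly two clusters, one of them active")
        self.store = store
        self.clusters = clusters

    def active(self):
        # The application layer would read this before serving requests.
        return self.store.get_active()

    def passive(self):
        current = self.active()
        return next(c for c in self.clusters if c != current)

    def flip(self):
        # Promote the passive cluster; the old active one becomes passive.
        self.store.set_active(self.passive())


def nightly_cycle(switch, job):
    """Run job on the passive cluster, flip, then run it on the previously active one."""
    first = switch.passive()
    job(first)
    switch.flip()
    second = switch.passive()
    job(second)
    return [first, second]
```

The point of the ordering is that the job only ever touches a cluster that is
not serving traffic. After one cycle the roles are swapped, so the next
night's cycle starts on the other cluster.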

On Sun Dec 07 2014 at 9:35:00 AM gomes <sankarm...@gmail.com> wrote:

> Currently some system cleaning tasks do read all the rows, and then perform
> some operations on that. It impacts other users who are being served at the
> same time. System cleaning tasks are of lower priority, and I can delay the
> requests, but I am just wondering if there is any way I can hook into hbase
> system, and if I can continuously measure the load of the system, and based
> on that I can limit the lower priority tasks? How can I do that, if there
> are any pointers, or helpful suggestions, please provide them.
>
>
>
> --
> View this message in context: http://apache-hbase.679495.n3.
> nabble.com/Reduce-load-to-hbase-tp4066727.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
