This doesn't answer your question per se, but this is how we dealt with load on HBase at Lithium.

We power klout.com with HBase. On a nightly basis, we load user profile data and Klout scores for approx. 600 million users into HBase. We also run maintenance on HBase, such as major compactions, on a regular basis. Whenever a load or maintenance was in progress, site performance on klout.com used to degrade pretty severely.

To mitigate this, we stood up 2 HBase clusters and now power klout.com off both. We run these in a custom-built active/passive mode. The application layer reads a ZooKeeper flag to find the active cluster and serves from there. We load data or do maintenance on the passive cluster, then flip the clusters and repeat the load/maintenance on the previously active cluster.

This active/passive mechanism has been working pretty well for us. It does, however, come at a significant cost: we have to maintain 2 clusters.
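To make the flip sequence concrete, here is a minimal sketch in Python. It is not our production code: the names `ClusterRouter` and `run_on_both`, the cluster labels, and the `read_flag`/`write_flag` callables are all hypothetical. In a real deployment the two callables would read and write the ZooKeeper flag; here they are plain functions so the ordering logic can be followed on its own.

```python
from typing import Callable, Dict, List


class ClusterRouter:
    """Picks the cluster to serve from, based on a shared 'active' flag."""

    def __init__(self, clusters: Dict[str, str], read_flag: Callable[[], str]):
        # clusters maps a hypothetical cluster name to its connection string.
        if len(clusters) != 2:
            raise ValueError("expected exactly two clusters")
        self.clusters = clusters
        self.read_flag = read_flag  # assumption: stands in for a ZooKeeper read

    def active(self) -> str:
        name = self.read_flag()
        if name not in self.clusters:
            raise KeyError(f"flag names unknown cluster: {name!r}")
        return name

    def passive(self) -> str:
        current = self.active()
        return next(n for n in self.clusters if n != current)

    def serving_endpoint(self) -> str:
        # The application layer would connect here to serve traffic.
        return self.clusters[self.active()]


def run_on_both(
    router: ClusterRouter,
    write_flag: Callable[[str], None],
    job: Callable[[str], None],
) -> List[str]:
    """Run job on the passive cluster, flip, then run it on the other one."""
    order = []
    first = router.passive()
    job(first)            # load or maintenance while this cluster is idle
    order.append(first)
    write_flag(first)     # flip: the freshly processed cluster starts serving
    second = router.passive()
    job(second)           # repeat on the previously active cluster
    order.append(second)
    return order
```

With a flag that starts at cluster "a", `run_on_both` runs the job on "b", flips so that "b" serves, then runs the job on "a". Serving traffic never hits a cluster while the job is running on it.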

On Sun Dec 07 2014 at 9:35:00 AM gomes <sankarm...@gmail.com> wrote:

> Currently some system cleaning tasks do read all the rows, and then perform
> some operations on that. It impacts other users who are being served at the
> same time. System cleaning tasks are of lower priority, and I can delay the
> requests, but I am just wondering if there is anyway I can hook into hbase
> system, and if I can continuously measure the load of the system, and based
> on that I can limit the lower priority tasks? How can I do that, if there
> are any pointers, or helpful suggestions, please provide them.
>
> --
> View this message in context: http://apache-hbase.679495.n3.nabble.com/Reduce-load-to-hbase-tp4066727.html
> Sent from the HBase User mailing list archive at Nabble.com.