Hello,

I have been wanting to write a proposal on using Apache Helix for cluster
management in HBase but wanted to hear some thoughts/feedback on whether
such an exercise would be useful.

I understand that HBase has built its own cluster management solution, but
integrating with Helix can provide additional benefits such as:

   - Multiple replicas per region, with a role such as primary/secondary
   assigned to each replica.
   - Failover handling: promote a secondary to master.
   - Expansion: redistribute the load when new nodes are added.
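
To make the three points above concrete, here is a small toy model in
Python. It is only a sketch and does not use the Helix API: the function
names, the round-robin placement, and the "first holder is the primary"
convention are all assumptions made for illustration.

```python
# Toy model only: this is NOT the Helix API. All names here are hypothetical.
from collections import Counter


def assign(regions, nodes, replicas=2):
    """Place `replicas` copies of each region on distinct nodes, round-robin.

    The first node listed for a region holds the primary replica;
    the remaining nodes hold secondaries.
    """
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, region in enumerate(regions):
        placement[region] = [nodes[(i + k) % len(nodes)] for k in range(replicas)]
    return placement


def fail_over(placement, dead_node):
    """Drop `dead_node` from every region.

    If the dead node held the primary, the next surviving replica moves to
    the front of the list, i.e. a secondary is promoted.
    """
    return {
        region: [n for n in holders if n != dead_node]
        for region, holders in placement.items()
    }


def primaries_per_node(placement):
    """Count how many primary replicas each node holds."""
    return Counter(holders[0] for holders in placement.values() if holders)


regions = ["r0", "r1", "r2", "r3", "r4", "r5"]
before = assign(regions, ["n1", "n2", "n3"])          # 2 primaries per node
after_failure = fail_over(before, "n1")               # n1's regions are promoted elsewhere
expanded = assign(regions, ["n1", "n2", "n3", "n4", "n5", "n6"])  # load spreads out
```

In this sketch, expansion simply recomputes the whole placement. A real
system would also need to limit how much data moves at once, which is where
throttling comes in.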

In terms of architecture, Helix fits well with HBase. The Helix controller
is similar to the HBase Master and, like HBase, Helix uses ZooKeeper to
store the cluster state. We have built in many optimizations to make
failover fast and reliable.

The core philosophy behind Helix has been to abstract cluster management
out of core functionality and treat it as a first-class citizen. This
allows systems to benefit from features developed in Helix, such as
flapping detection, non-trivial fault detection, chaos monkey, and
throttling of data movement.

Apache Helix is currently used to power core back-end infrastructure
components (data store, search, and pub/sub) at LinkedIn and has been in
production for more than a year, managing more than 1k machines.

I would appreciate any feedback/thoughts on this topic.

thanks,
Kishore G


Additional info:
Helix - http://helix.incubator.apache.org
SOCC Paper <http://www.slideshare.net/KishoreGopalakrishna/helix-onecol>
Reading material: Systems that use Helix
Espresso <http://www.slideshare.net/amywtang/espresso-20952131>
Databus <https://915bbc94-a-62cb3a1a-s-sites.googlegroups.com/site/acm2012socc/s18-das.pdf>
