+1. This would be a great addition as a sub project for zookeeper and since its 
been used extensively in production at yahoo! I'd bet folks would be interested 
in deploying this in production setups.

Thanks
mahadev


On 1/11/11 3:42 PM, "Fournier, Camille F. [Tech]" <camille.fourn...@gs.com> 
wrote:

Is the code somewhere we can look at it right now?

C

-----Original Message-----
From: Avery Ching [mailto:ach...@yahoo-inc.com]
Sent: Tuesday, January 11, 2011 2:02 PM
To: dev@zookeeper.apache.org
Subject: Discussion - Clusterlib as a subproject for ZooKeeper

Hello,

We have been working on Clusterlib at Yahoo! and would like to contribute it as 
a subproject to ZooKeeper.  Clusterlib was developed as a next-generation 
platform for creating/coordinating search applications/services (including 
crawling, processing, indexing, and front end) at Yahoo!.  We suspect much of 
this work will be useful for others trying to build up large-scale/distributed 
applications that would like to coordinate and share the same semantics.

Here is a (relatively) short summary of why Clusterlib was developed:

Large-scale distributed applications are difficult and time-consuming to 
develop since a great deal of effort is spent solving the same
challenges (consistency, fault-tolerance, naming problems, etc.).  
Additionally, coordinating these applications is typically ad-hoc and
hard to maintain.  Clusterlib fills the gap by providing distributed 
application developers with an object-oriented data model,
asynchronous event handling system, well-defined consistency semantics, and 
methods for making coordination easy across
cooperating applications.  Some example applications might include a search 
engine, scalable file system, large-scale data cache, etc.

Clusterlib is a middleware library for building distributed applications. It 
was designed to simplify the job of application developers and provides a set 
of distributed objects that all inherit from the same Notifyable interface. The 
set of distributed objects includes: Root, Application, Group, 
DataDistribution, Node, ProcessSlot, PropertyList, and Queue. In order to give 
context, each object is described briefly.

* Root is a point-of-entry object at the top of the hierarchy in Clusterlib and 
manages its Applications. There is only one Root per Clusterlib instance.
* Applications are used as a namespace for managing Groups, Nodes, 
DataDistributions, Queues, and PropertyLists in a user-defined application. 
Using the application concept (as opposed to only having groups) makes 
accessing another Application's child objects explicit to developers.
* Groups are a logical association of Clusterlib objects that can be nested. 
Since large-scale applications often require hundreds or thousands of nodes to 
operate, there might a "node" Group that has an "alive" child Group and a 
"dead" child Group that are each populated with their respective sets of nodes.
* DataDistributions balance load and data across a set of objects. 
DataDistributions provide user-extensible key hashing to variable-sized hash 
ranges for user flexibility.
* Nodes typically represent a physical or virtual node in an application. It 
has child ProcessSlots that can be used to reserve system resources.
* ProcessSlots maintain an actual process running locally on the physical 
machine. It can also contain other information about the process, such as a PID 
or port array.
* PropertyLists may be created and maintained as a child of any Notifyable 
object. It is basically a key-value storage that can, for instance, be used to 
determine how long a timeout would be on a particular server or the number of 
retries to allow before giving up. PropertyLists are leafs in the Clusterlib 
hierarchy and cannot have any children.
* Queues are distributed FIFO queues. They can be used to synchronize threads, 
pass messages between threads, and for JSON-RPC.

Clusterlib objects are composed in a hierarchy and maintain ACID compliance. 
Distributed, non-blocking, fault-tolerant locks can be acquired on any 
Clusterlib object and asynchronous event handlers can be registered for 
object-specific changes. For example, if a ProcessSlot changed, an asynchronous 
event handler might check to see if the process is still running and if not, 
try to restart it. There are 3 types of Clusterlib-defined locks (child, 
notifyable, and ownership). Clusterlib internally uses a child lock on a parent 
object to access child objects, however users may also use this lock if 
desired. A notifyable lock is intended as a general-purpose lock on a 
Notifyable. Finally, ownership locks are intended to express concepts suchs as 
"leadership" in a Group or "reservation" of a Node. In order to allow more 
parallelism, Clusterlib locks can be accessed in shared or exclusive modes.

Since Clusterlib relies upon Zookeeper as a fault-tolerant, consensus service, 
it inherits many of its performance and fault-tolerance properties. As the 
number of Zookeeper servers increases, read performance scales up nearly 
linearly, however write performance scales inversely due to Zookeeper's 
internal atomic broadcast protocol. As long as the number of correctly 
functioning Zookeeper servers maintains a quorum, Zookeeper can continue to 
operate. The same is true for Clusterlib applications. The locks and leadership 
election algorithms in Clusterlib are fault-tolerant to client failure due to 
the use of Zookeeper ephemeral nodes.

In addition to being a library, Clusterlib comes with a http server to 
viewing/manipulating Clusterlib objects and/or ZooKeeper znodes directly.  I've 
linked some PNGs to illustrate this.  It also is bundled with a CLI that is 
extensible.  We have also developed a suite of over 90 unittests that simulate 
distributed event ordering using MPI to test for many of those hard-to-find 
distributed bugs.  It's been tested to build on flavors of Redhat Linux, Ubuntu 
Linux, and OSX.

We would like to see it as a subproject of ZooKeeper because its tightly 
integrated with ZooKeeper. What do folks think about Clusterlib as a subproject 
of ZooKeeper?

Thanks,

Avery

Clusterlib-UI snapshot link
http://users.eecs.northwestern.edu/~aching/clusterlib-ui.png

ZooKeeper-UI snapshot link
http://users.eecs.northwestern.edu/~aching/zookeeper-ui.png



Reply via email to