Re: [PROPOSAL] Whirr Project

Leo Simons Thu, 22 Apr 2010 15:47:21 -0700

On 4/15/10 10:42 PM, Tom White wrote:

I would like to propose Whirr as an incubator proposal.


Whirr will be a set of libraries for running cloud services, such as
Hadoop or Cassandra. The initial code (for Hadoop) is hosted as a
Hadoop contrib module, but I believe it would flourish as its own
project with its own community.

The proposal is on the incubator wiki at
http://wiki.apache.org/incubator/WhirrProposal.

...and pasted inline below (as is customary). The proposal looks fine to me. Like you mention, your initial group of committers is a bit small, which is a risk but hey, cloud is hot, go build community :)

You do know any ASF member can sign up to be an incubator mentor, right? If I count correctly you have two on your list :)


cheers,

Leo

--
= Whirr, a library of cloud services =

== Abstract ==
Whirr will be a set of libraries for running cloud services.

== Proposal ==

Whirr will provide code for running a variety of software services on cloud infrastructure. It will provide bindings in several languages (e.g. Python and Java) for popular cloud providers to make it easy to start and stop services like Hadoop clusters. The project will not be limited to a particular set of services; rather, it is expected that a range of services will be developed, as determined by the project contributors. Possible services include Hadoop, HBase, !ZooKeeper, Cassandra.


== Background ==

The ability to run services on cloud providers is very useful, particularly for proofs of concept, testing, and also ad hoc production work. Bringing up clusters in the cloud is non-trivial, since careful choreography is required. (Designing an interface that is convenient as well as secure is also a challenge in a cloud context.) Making services that run on a variety of cloud providers is harder, even with the availability of libraries like libcloud and jclouds, since each platform's quirks and extra features must be considered (and either worked around, or possibly taken advantage of, as appropriate). Whirr will facilitate sharing of best practices, both for a particular service (such as Hadoop configuration on a particular provider), and for common cloud operations (such as installation of dependencies across cloud providers). It will provide a space to share good configurations and will encode service-specific knowledge.


== Rationale ==

There are already scripts in the Hadoop project that allow users to run Hadoop clusters on Amazon EC2 and other cloud providers. While users have found these scripts useful, their current home as a Hadoop Common contrib project has the following limitations:
 * Tying the scripts' release cycle to Hadoop's means that it is difficult to distribute updates to the scripts which are changing fast (new features and bugfixes).
 * The scripts support multiple versions of Hadoop, so it makes more sense to distribute them separately from Hadoop itself.
 * They are general: people want to contribute code for non-Hadoop services like Cassandra (for example: http://github.com/johanoskarsson/cassandra-ec2).
 * Having a uniform approach to running services in the cloud, hosted in one project, makes launching sets of complementary services easier for the user. Today, the scripts and libraries hosted within each project (e.g. in Hadoop, HBase, Cassandra) have slightly different conventions and semantics, and are likely to diverge over time. Building a community around cloud infrastructure services will help enforce a common approach to running services in the cloud.


== Initial Goals ==
 * Provide a new home for the existing Hadoop cloud scripts.
 * Add more services (e.g. HBase)
 * Develop Java libraries for Hadoop clusters
 * Add new cloud providers by taking advantage of libcloud and jclouds.

 * (Future) Run on own hardware, so users can take advantage of the same interface to control services running locally or in the cloud.


== Current Status ==
=== Meritocracy ===

The Hadoop scripts were originally created by Tom White, and have had a substantial number of contributions from members of the Hadoop community. If Whirr becomes its own project, significant contributors would become committers, allowing the project to grow.


=== Community ===

The community interested in cloud service infrastructure is currently spread across many smaller projects, and one of the main goals of this project is to build a vibrant community to share best practices and build common infrastructure. For example, this project would provide a home to facilitate collaboration between the groups of Hadoop and HBase developers who are building cloud services.


=== Core developers ===

Tom White wrote most of the original code and is familiar with open source and Apache-style development, being a Hadoop committer and an ASF member. There have been a number of contributors who have provided patches to these scripts over time. Andrew Purtell, who created the HBase cloud scripts, is an HBase committer. Johan Oskarsson (Hadoop and Cassandra committer) ported the scripts to Cassandra.


=== Alignment ===

Whirr complements libcloud, currently in the Incubator. Libcloud provides multi-cloud provider support, while Whirr will provide multi-service support in the cloud. Whirr will build cloud components for several Apache projects, such as Hadoop, HBase, !ZooKeeper, Cassandra, and hopefully more.


== Known Risks ==
=== Orphaned products ===

There is a risk that Whirr will not gain adoption. However, the current Hadoop scripts seem to be fairly widely used. The small number of initial committers is also a risk, although by starting the project it is expected that new contributors will quickly be attracted to the project and help it grow.


=== Inexperience with Open Source ===

The initial code comes from Hadoop where it was developed in an open-source, collaborative way. All the initial committers are committers on other Apache projects, and are experienced in working with new contributors.


=== Homogenous Developers ===

The initial set of committers is from a diverse set of organizations and geographic locations. They are all experienced with developing in a distributed environment.


=== Reliance on Salaried Developers ===

It is expected that Whirr will be developed on salaried and volunteer time, although all of the initial developers will work on it mainly on salaried time.


=== Relationships with Other Apache Products ===

Whirr will depend on many other Apache Projects as already mentioned above (e.g. Hadoop, !ZooKeeper). If the project develops some common infrastructure then it is possible that it becomes a dependency on a project that wishes to use that infrastructure for running in the cloud.


=== An Excessive Fascination with the Apache Brand ===

We think that Whirr will benefit from the community sharing ideas and best practices for running cloud services. The ASF does a great job at building communities, which is why we want to build Whirr at Apache.


== Documentation ==
Information on the current scripts and general background can be found at
 * http://wiki.apache.org/hadoop/AmazonEC2
 * http://archive.cloudera.com/docs/ec2.html
 * http://hbase.s3.amazonaws.com/hbase/HBase-EC2-HUG9.pdf
 * http://www.slideshare.net/steve_l/new-roles-for-the-cloud

== Initial Source ==
http://svn.apache.org/viewvc/hadoop/common/trunk/src/contrib/cloud/

== Source and Intellectual Property Submission Plan ==

The initial source is already in an Apache project's SVN repository (Hadoop), so there should be no action required here.


== External Dependencies ==

The existing external dependencies all have Apache compatible licenses: boto (MIT), libcloud (Apache 2.0), simplejson (MIT). Jclouds is not a dependency of the current source, but it is Apache 2.0 licensed, so it will be possible to use it in the future if required.


== Cryptography ==
Whirr uses standard APIs and tools for SSH and SSL.

== Required Resources ==
=== Mailing lists ===
 * whirr-private (with moderated subscriptions)
 * whirr-dev
 * whirr-commits
 * whirr-user

=== Subversion Directory ===
 * https://svn.apache.org/repos/asf/incubator/whirr

=== Issue Tracking ===
 * JIRA Whirr (WHIRR)

=== Other Resources ===

The existing code already has unit and integration tests so we would like a Hudson instance to run them whenever a new patch is submitted. This can be added after project creation.


== Initial Committers ==
 * Tom White (tomwhite at apache dot org)
 * Andrew Purtell (apurtell at apache dot org)
 * Johan Oskarsson (johan at apache dot org)
 * Steve Loughran (stevel at apache dot org)

== Affiliations ==
 * Tom White, Cloudera
 * Andrew Purtell, Trend Micro
 * Johan Oskarsson, Twitter
 * Steve Loughran, HP Labs


== Sponsors ==
=== Champion ===
 * Tom White

=== Nominated Mentors ===
 * TBD

=== Sponsoring Entity ===
 * Incubator PMC

