Dharmesh, see my comments inline.

On Apr 30, 2013, at 5:34 AM, Dharmesh Kakadia <dhkaka...@gmail.com> wrote:

> Hi,
> 
> I am Dharmesh Kakadia and I am interested in the project "Integration project to
> deploy and use Mesos on a CloudStack based cloud" (
> https://issues.apache.org/jira/browse/CLOUDSTACK-1784)
> 
> I am working on a proposal and would like to get feedback. Please provide
> suggestions :)
> 
> 
> Abstract:
> 
> The project aims to bring a cloudformation-like [1] service to cloudstack. One
> of the prime use cases is cluster computing frameworks on cloudstack. A
> cloudformation service will give users and administrators of cloudstack the
> ability to manage and control a set of resources easily. The service
> will boot and configure a set of VMs to form a cluster. A simple
> example would be a LAMP stack. More complex clusters, such as a mesos or hadoop
> cluster, require somewhat more advanced configuration. There is already
> some work done by Chiradeep Vittal on this front [5] using route and

it's using ruote: http://ruote.rubyforge.org

> sinatra. In this project, I will implement a cloudformation service and
> demonstrate how to run a mesos cluster using it.

You will create CloudFormation templates that describe a Mesos cluster.
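
As a rough illustration of what such a template could contain, here is a
Python sketch that builds a CloudFormation-style JSON document for one master
and N slaves. The function name, the resource names, the "mesos-role" tag and
the image/instance values are all hypothetical, and how a given implementation
maps ImageId and InstanceType onto CloudStack templates and service offerings
is an assumption.

```python
import json


def mesos_cluster_template(slave_count, image_id, instance_type):
    """Build a CloudFormation-style template: one master plus N slaves."""

    def instance(role):
        # One VM resource; the tag is only a hint for later configuration.
        return {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": image_id,
                "InstanceType": instance_type,
                "Tags": [{"Key": "mesos-role", "Value": role}],
            },
        }

    resources = {"MesosMaster": instance("master")}
    for index in range(slave_count):
        resources[f"MesosSlave{index}"] = instance("slave")

    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Mesos cluster: 1 master, {slave_count} slaves",
        "Resources": resources,
    }


if __name__ == "__main__":
    # Placeholder ids; real values depend on the target cloud.
    template = mesos_cluster_template(2, "placeholder-image", "placeholder-size")
    print(json.dumps(template, indent=2))
```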

> 
> Mesos:
> 
> Mesos is a resource management platform for clusters [2]. It aims to
> increase resource utilization of clusters by sharing cluster resources
> among multiple processing frameworks (like MapReduce, MPI, Graph Processing)
> or multiple instances of the same framework. It provides efficient resource
> isolation through the use of containers. It uses zookeeper for state
> maintenance and fault tolerance.
> 
> What can run on mesos?
> 
> Spark: A cluster computing framework based on the Resilient Distributed
> Datasets (RDDs) abstraction. RDDs are more general than MapReduce and can
> support iterative and interactive computation while retaining fault
> tolerance, scalability, data locality, etc.
> 
> Hadoop: Hadoop is a fault-tolerant and scalable distributed computing
> framework based on the MapReduce abstraction.
> 
> Bagel: A graph processing framework based on Pregel.
> 
> and other frameworks like MPI, Hypertable.
> 
> How to deploy mesos
> 
> Mesos provides cluster installation scripts [7] for cluster deployment.
> There are also scripts available to deploy a cluster on Amazon EC2 [8].

It would be nice to see if these scripts can be used as is with the CloudStack 
EC2 service.

> 
> Deliverables:
> 
> 1. Cloudformation service implementation on cloudstack.
> 
> 2. Integration of cloudformation with cloudmonkey, the CLI tool.

2. is a little confusing. I believe that Chiradeep's prototype runs on the 
client side. What is needed is a server-side implementation.
That way we could use existing CloudFormation CLI tools to talk to it.
I don't understand where CloudMonkey comes into play. CloudMonkey is a CLI for 
the CloudStack API. Unless you plan to integrate the CloudFormation API 
directly into the CloudStack source code, the integration you propose is not 
clear to me.


> 
> 3. Proof of concept of running mesos on top of cloudstack using the service.
> 
> 4. Related documentation.
> 
> Architecture and Tools:
> 
> The high level architecture I propose is as follows:
> 
> It includes the following components:
> 
> 1. CloudFormation ReST server:
> 
> This acts as the point of contact and exposes CloudFormation functionality
> as a ReST service.

I believe CloudFormation is really a Query API.
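
To illustrate the difference: in a Query API the operation is selected by an 
Action parameter (CreateStack, DescribeStacks, DeleteStack, ...) sent to a 
single endpoint, rather than by HTTP verb and resource path. A minimal Python 
sketch of the request body that existing CloudFormation tools would send; the 
stack name and template body are placeholders, the helper name is made up, and 
the authentication and signature parameters of a real request are left out.

```python
from urllib.parse import urlencode


def create_stack_request(stack_name, template_body):
    """Return the form-encoded body of a Query-style CreateStack call."""
    return urlencode({
        "Action": "CreateStack",      # the operation is a parameter, not a verb
        "Version": "2010-05-15",      # CloudFormation API version string
        "StackName": stack_name,
        "TemplateBody": template_body,
    })


if __name__ == "__main__":
    print(create_stack_request("mesos-demo", '{"Resources": {}}'))
```

A server that accepts requests of this shape should work with existing 
clients; a custom ReST layout would need its own client.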

> This can be accessed directly or through cloudmonkey. I
> will add this functionality to cloudmonkey. I plan to use dropwizard [3]
> to start with. Later, the API server could perhaps be merged with the
> management server. I plan to use mysql for storing details of clusters.

At first, you could do a prototype that is decoupled from CloudStack. You need 
to clarify the integration with CloudMonkey.

> 
> 2. Provisioning:
> 
> The provisioning module is responsible for handling the booting process of
> the VMs through cloudstack. It uses the cloudstack APIs for launching VMs. I
> plan to use preconfigured templates/images with the required dependencies
> installed, which will make the cluster creation process much faster even for
> large clusters. Error handling is a very important part of this module. For
> example, what do you do if a few VMs fail to boot in a cluster?
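
One possible answer to that question, as a sketch only: retry each failed boot 
a few times, and if too few VMs come up, report the ones that did boot so the 
caller can destroy them. Here launch_vm is a hypothetical callable standing in 
for a CloudStack deployVirtualMachine call; the names, the exception type and 
the retry policy are assumptions, not part of the proposal.

```python
class ClusterBootError(Exception):
    """Raised when too few VMs booted; carries the ids that did boot."""

    def __init__(self, booted, wanted):
        super().__init__(f"only {len(booted)} of {wanted} VMs booted")
        self.booted = booted


def provision_cluster(launch_vm, count, min_required, max_attempts=3):
    """Launch `count` VMs, retrying each failed boot up to max_attempts times."""
    booted = []
    for index in range(count):
        for _attempt in range(max_attempts):
            try:
                booted.append(launch_vm(index))
                break  # this VM is up, move on to the next one
            except RuntimeError:
                continue  # boot failed, try again
    if len(booted) < min_required:
        # Let the caller decide how to clean up the partial cluster.
        raise ClusterBootError(booted, count)
    return booted
```

Whether a partial cluster is acceptable (min_required lower than count) 
depends on the framework; a Mesos cluster can probably run with fewer slaves, 
but not without its master.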
> 
> 3. Configuration:
> 
> This module deals with configuring the VMs to form a cluster. This can be
> done via manual scripts/code or via configuration management tools like
> chef. I plan to use workflow automation tools like rundeck [4].

knife-cloudstack provides chef/cloudstack provisioning. You may want to have a 
look at this.
I would prefer seeing chef or puppet recipes for Mesos (they probably already 
exist), rather than rundeck.
However, if you do want to use rundeck, check the Apache incubator project 
Provisionr; I know they use it to provision Hadoop.

> 
> In general, I want to use Java-based tools as much as possible, since
> cloudstack is mostly written in java. This will make the project easier to
> maintain and develop.
> 
> Why ReST?
> 
> I believe the decoupling provided by the ReST architecture makes the service
> easy to extend in the future. For example, one might want to extend the
> cloudformation service with features like auto-scaling of clusters
> based on user criteria (rule-based, monitoring, etc.).
> 

We need to clarify why you want a REST service. If I understand correctly, you 
want to provide a server-side implementation of what Chiradeep has started with 
stackmate. However, I believe that CloudFormation is really a Query API, so 
REST may not be needed and could cause interoperability issues with existing 
CloudFormation tools. We also need to clarify how you plan to integrate with 
cloudmonkey.

If you integrate your CloudFormation API tightly with the mgt server, then 
cloudmonkey will be able to discover the new API calls automatically, but 
otherwise I don't see the link.

It might be easier to create a server-side implementation of stackmate (we 
would need Chiradeep's input on that; I have cc'd him), then create mesos 
cluster CF templates. This server would talk directly to an unmodified 
CloudStack mgt server.
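
As a rough sketch of what talking to an unmodified mgt server could look like, 
such a standalone server would only need the regular signed CloudStack API. 
The Python below builds a signed deployVirtualMachine URL. It assumes the 
usual CloudStack signing scheme (URL-encode the values, sort the parameters, 
lowercase the query string, HMAC-SHA1 with the secret key, Base64), which 
should be checked against the developer guide. The helper name, endpoint, keys 
and ids are placeholders.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def signed_cloudstack_url(endpoint, api_key, secret_key, command, **params):
    """Build a signed CloudStack API URL for `command` with extra params."""
    params = dict(params, command=command, apikey=api_key, response="json")
    # URL-encode each value and sort the pairs by (lowercased) field name.
    query = "&".join(
        f"{key}={quote(str(value), safe='')}"
        for key, value in sorted(params.items(), key=lambda kv: kv[0].lower())
    )
    # Sign the lowercased query string with HMAC-SHA1, then Base64-encode.
    digest = hmac.new(
        secret_key.encode(), query.lower().encode(), hashlib.sha1
    ).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return f"{endpoint}?{query}&signature={signature}"


if __name__ == "__main__":
    # All values below are placeholders.
    print(signed_cloudstack_url(
        "http://localhost:8080/client/api", "my-api-key", "my-secret-key",
        "deployVirtualMachine",
        serviceofferingid="offering-id", templateid="template-id",
        zoneid="zone-id",
    ))
```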


Thanks, this is very exciting.

-sebastien

> Services:
> 
> 1. POST : create a cluster
>    - accepts : cluster configuration json
>    - produces : clusterId
> 
> 2. GET : get the current status of the request
>    - accepts : clusterId
>    - produces : json describing the current status of the cluster
> 
> 3. DELETE : remove a cluster
>    - accepts : clusterId
>    - produces : result (success/failure)
> 
> 4. UPDATE : add a node to a cluster
>    - accepts : cluster configuration json and clusterId
>    - produces : result (success/failure)
> 
> 
> Timeline:
> 
> 1-1.5 weeks : project design (architecture, tool selection, API design).
> 
> 1-1.5 weeks : getting familiar with the cloudstack codebase and architecture
> details.
> 
> 1-1.5 weeks : getting familiar with mesos internals.
> 
> 1-1.5 weeks : setting up the dev environment
> 
> 2-3 weeks : build the provisioning and configuration modules
> 
> Midterm evaluation: provisioning module, configuration module
> 
> 1-2 weeks : develop the ReST server
> 
> 2-3 weeks : test and integrate
> 
> About me:
> 
> I am an MS by Research student at the International Institute of Information
> Technology Hyderabad (IIIT-H), Hyderabad, India. I operate our small lab
> cluster, which runs on OpenStack, and I am working on a similar project,
> HadoopStack [6], which aims to bring data processing to a multi-cloud
> environment (work in progress). My area of research is scheduling in large
> scale distributed systems. I have experience with related tools like
> Hadoop, Mesos, OpenStack, Chef, Ironfan and jClouds.
> 
> Email-contact : dhkaka...@gmail.com
> 
> More info: http://researchweb.iiit.ac.in/~dharmesh.kakadia/
> 
> Why me?
> 
> I love open-source projects. I am fascinated by distributed computing and
> interested in building and optimizing large scale systems and data
> processing frameworks.
> 
> References
> 
> [1] http://aws.amazon.com/cloudformation/
> 
> [2] http://incubator.apache.org/mesos/
> 
> [3] http://dropwizard.codahale.com/
> 
> [4] http://rundeck.org/
> 
> [5] https://github.com/chiradeep/stackmate
> 
> [6] http://siel-iiith.github.io/HadoopStack/
> 
> [7] https://github.com/apache/mesos/blob/trunk/docs/Deploy-Scripts.textile
> 
> [8] https://github.com/apache/mesos/blob/trunk/docs/EC2-Scripts.textile
> 
> In case you have trouble reading this, a Google Docs version of the above is here:
> 
> https://docs.google.com/document/d/1ocoBmyHDtOVnBhCELVt1QcgkubSzCyksls2MCTuDPL0
> 
