I would be interested in seeing the objectives, constraints,
inputs, etc. guiding this effort.  Depending on these, metascheduling
can be dead simple or impossible.

Kenneth

On Fri, Sep 23, 2016 at 11:00:57PM +0000, Shenoy, Gourav Ganesh wrote:
> My understanding is that we will provide Airavata with the capability to run
> and manage jobs across different clusters. Suresh can confirm that.
> 
> Thanks and Regards,
> Gourav Shenoy 
> 
> On 9/23/16, 4:47 PM, "K Yoshimoto" <kenn...@sdsc.edu> wrote:
> 
>     What do you mean by "meta-scheduler" here?  Are you trying to
>     coordinate running of jobs across or amongst a number of different
>     clusters?
>     
>     On Fri, Sep 23, 2016 at 08:43:19PM +0000, Shenoy, Gourav Ganesh wrote:
>     > Hi Dev,
>     > 
>     > I am working on a project to build a Mesos-based meta-scheduler for
>     > Airavata, along with Shameera and Mangirish. Here is the JIRA link:
>     > https://issues.apache.org/jira/browse/AIRAVATA-2082.
>     > 
>     > 
>     > * We have identified the tasks needed to achieve this. At a high
>     >   level, they are:
>     > 
>     > 1. Resource provisioning: We need to provision resources on cloud
>     >    and HPC infrastructures such as EC2, Jetstream, Comet, etc.
>     > 
>     > 2. Building a cluster: Deploying a Mesos cluster on the set of
>     >    nodes obtained in (1) for task management.
>     > 
>     > 3. Selecting a scheduler: We need to investigate which scheduler to
>     >    use with the Mesos cluster. Options include Marathon and Aurora,
>     >    but we need one that suits our need to run both serial and
>     >    parallel (MPI) jobs.
>     > 
>     > 4. Installing and running applications on this cluster: Once the
>     >    cluster is deployed and a scheduler chosen, we need to be able to
>     >    install and run applications on it using Airavata.
>     > 
>     > 
>     > * So far, we have looked into the following:
>     > 
>     >   - Resource provisioning:
>     > 
>     >     - We explored several ways of provisioning resources, using cloud
>     >       libraries as well as Ansible scripts.
>     > 
>     >     - We built an OpenStack4J Java module that provisions instances on
>     >       OpenStack-based clouds (e.g., Jetstream).
>     > 
>     >     - We also built a CloudBridge Python module for provisioning EC2
>     >       instances on Amazon. CloudBridge can also provision instances on
>     >       OpenStack.
>     > 
>     >     - We wrote Ansible scripts for bringing up instances on both AWS
>     >       and OpenStack-based clouds.
>     > 
>     > 
>     >     - Key point: CloudBridge and OpenStack4J are powerful libraries for
>     >       resource provisioning, but currently they provision one instance
>     >       at a time and do not support template-based provisioning such as
>     >       CloudFormation (for AWS) and Heat (for OpenStack).
>     > 
>     > 
>     >   - Building a cluster:
>     > 
>     >     - We wrote an Ansible script for deploying a Mesos-Marathon cluster
>     >       on a set of nodes. The script also installs the necessary
>     >       dependencies, such as ZooKeeper.
>     > 
>     >     - We tested this on OpenStack-based clouds and on EC2.
>     > 
>     >     - OpenStack Magnum provides excellent support for provisioning
>     >       resources and deploying a Mesos cluster, but we are running into
>     >       some problems while trying it.
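The Ansible-based cluster deployment above needs an inventory that maps the freshly provisioned nodes to roles. Below is a minimal Python sketch of that glue step, assuming the provisioning step returns plain lists of IP addresses. The group names `mesos_masters`, `zookeeper`, and `mesos_agents`, the default SSH user, and co-locating ZooKeeper on the master nodes are all hypothetical choices; the actual playbook may expect different groups.

```python
from typing import List


def render_inventory(masters: List[str], agents: List[str],
                     ssh_user: str = "ubuntu") -> str:
    """Render an INI-style Ansible inventory for a Mesos-Marathon cluster.

    Group names and ZooKeeper placement are hypothetical; adjust them to
    whatever groups the deployment playbook actually targets.
    """
    if not masters:
        raise ValueError("at least one master node is required")

    lines: List[str] = []

    def add_group(name: str, hosts: List[str]) -> None:
        lines.append(f"[{name}]")
        for host in hosts:
            # ansible_user is a standard inventory variable for the SSH login user.
            lines.append(f"{host} ansible_user={ssh_user}")
        lines.append("")  # blank line between groups

    add_group("mesos_masters", masters)
    add_group("zookeeper", masters)  # assumption: ZooKeeper runs on the masters
    add_group("mesos_agents", agents)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_inventory(["10.0.0.10"], ["10.0.0.11", "10.0.0.12"]))
```

The result could be written to a file and passed to the deployment playbook, e.g. `ansible-playbook -i hosts.ini mesos-marathon.yml` (the playbook file name here is hypothetical).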
>     > 
>     > 
>     >   - Installing a scheduler:
>     > 
>     >     - Our Ansible script currently installs Marathon as the scheduler
>     >       on Mesos. We have not yet submitted jobs through Marathon.
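Since no jobs have been submitted through Marathon yet, here is a minimal sketch of what a first submission could look like, using Marathon's REST endpoint `POST /v2/apps` with a JSON app definition (`id`, `cmd`, `cpus`, `mem`, `instances`). The app ID, command, resource sizes, and the Marathon host are assumptions for illustration (Marathon listens on port 8080 by default); only the Python standard library is used.

```python
import json
import urllib.request
from typing import Any, Dict


def marathon_app(app_id: str, cmd: str, cpus: float = 0.5, mem: int = 256,
                 instances: int = 1) -> Dict[str, Any]:
    """Build a minimal Marathon app definition (mem is in MiB)."""
    return {"id": app_id, "cmd": cmd, "cpus": cpus, "mem": mem, "instances": instances}


def submit_app(marathon_url: str, app: Dict[str, Any]) -> Dict[str, Any]:
    """POST the app definition to Marathon's /v2/apps endpoint; return the parsed reply."""
    req = urllib.request.Request(
        marathon_url.rstrip("/") + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError if Marathon rejects the request.
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical smoke-test app; the ID and command are placeholders.
    app = marathon_app("/airavata-smoke-test", "echo hello; sleep 30", cpus=0.1, mem=32)
    print(json.dumps(app))
    # submit_app("http://MESOS_MASTER_HOST:8080", app)  # hypothetical host; needs a live cluster
```

One caveat relevant to the scheduler choice: Marathon is designed for long-running services. It keeps `instances` copies of an app running and relaunches tasks that exit, so a one-shot serial job submitted this way would be restarted after it completes. That behaviour is worth weighing when comparing Marathon with batch-oriented options for serial and MPI jobs.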
>     > 
>     > 
>     > * Although this is not finalized, we are inclined towards the Ansible
>     >   approach for the above, since Ansible also provides Python APIs,
>     >   which will allow us to integrate it with Airavata via Thrift. We
>     >   will then be able to invoke the Ansible scripts easily from code,
>     >   without going through the command-line interface.
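For invoking playbooks from code rather than by hand on the command line, one option (an assumption, not necessarily what will be chosen) is the separate `ansible-runner` package, whose `ansible_runner.run()` accepts `private_data_dir`, `playbook`, `inventory`, and `extravars`, and returns an object exposing `status` and `rc`. Note that ansible-runner still launches `ansible-playbook` under the hood; it simply removes the need to build shell commands. The sketch keeps argument building separate from the call so it can be exercised without Ansible installed. The paths, playbook name, and variables are hypothetical, and a Thrift service handler in Airavata would presumably wrap `run_playbook`.

```python
from typing import Any, Dict, Optional


def build_run_kwargs(private_data_dir: str, playbook: str, inventory: str,
                     extravars: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Collect the keyword arguments to pass to ansible_runner.run()."""
    return {
        "private_data_dir": private_data_dir,  # ansible-runner's working directory
        "playbook": playbook,
        "inventory": inventory,
        "extravars": dict(extravars or {}),  # copy so callers' dicts are not shared
    }


def run_playbook(kwargs: Dict[str, Any]) -> int:
    """Run a playbook through ansible-runner; raise if it did not succeed."""
    import ansible_runner  # third-party: pip install ansible-runner

    result = ansible_runner.run(**kwargs)
    if result.status != "successful":
        raise RuntimeError(f"playbook failed: status={result.status}, rc={result.rc}")
    return result.rc
```

A hypothetical call would be `run_playbook(build_run_kwargs("/tmp/airavata-run", "mesos-marathon.yml", "hosts.ini", {"marathon_port": 8080}))`, where every argument value is a placeholder.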
>     > 
>     > 
>     > * We are also progressively working on the following work items:
>     > 
>     >   - Exploring options to provision and deploy a Mesos-Marathon cluster
>     >     on HPC systems such as Comet. The challenge will be to use Ansible
>     >     to provision resources and deploy the cluster. Once we have a
>     >     cluster, we can try running applications.
>     > 
>     >   - Exploring different scheduler options for running serial and
>     >     parallel (MPI) jobs on such heterogeneous clusters.
>     > 
>     >   - Exploring orchestration options such as OpenStack Heat, AWS
>     >     CloudFormation, OpenStack Magnum, etc.
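To make the serial-versus-MPI distinction concrete: a serial job needs a single slot on any node, whereas an MPI job typically needs all of its ranks started together, so a partial allocation is useless. The Python sketch below shows all-or-nothing (gang) placement over a snapshot of free CPUs per node, loosely mirroring how a Mesos framework could decide whether the resource offers it holds are enough to launch a job. The host names, the largest-node-first heuristic, and the CPU-only resource model are simplifying assumptions; it ignores memory, network locality, and interconnect, which matter a great deal when mixing cloud and HPC nodes.

```python
from typing import Dict, Optional


def place_job(free_cpus: Dict[str, int], ranks: int,
              cpus_per_rank: int = 1) -> Optional[Dict[str, int]]:
    """All-or-nothing placement of `ranks` processes.

    Returns a host -> rank-count plan, or None if the whole job cannot be
    placed right now. A serial job is simply ranks=1. free_cpus is not modified.
    """
    if ranks < 1 or cpus_per_rank < 1:
        raise ValueError("ranks and cpus_per_rank must be positive")

    remaining = ranks
    plan: Dict[str, int] = {}
    # Heuristic (assumption): fill the largest nodes first to keep ranks on few hosts.
    for host, cpus in sorted(free_cpus.items(), key=lambda kv: (-kv[1], kv[0])):
        fit = min(cpus // cpus_per_rank, remaining)
        if fit > 0:
            plan[host] = fit
            remaining -= fit
        if remaining == 0:
            return plan
    # Not enough capacity: launch nothing rather than strand a partial MPI job.
    return None


if __name__ == "__main__":
    free = {"comet-1": 24, "jetstream-1": 4, "ec2-1": 2}  # hypothetical nodes
    print(place_job(free, ranks=28))                   # {'comet-1': 24, 'jetstream-1': 4}
    print(place_job(free, ranks=32))                   # None
    print(place_job(free, ranks=1, cpus_per_rank=2))   # {'comet-1': 1}
```

Returning `None` instead of a partial plan is the key design choice: a serial scheduler can happily start whatever fits, but an MPI scheduler has to wait (or decline offers) until the whole job fits.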
>     > 
>     > Any suggestions and comments are highly appreciated.
>     > 
>     > Thanks and Regards,
>     > Gourav Shenoy
>     > 
>     > 
>     
> 
