Hello Devs,

Thanks Gourav and Shameera for all the work w.r.t. setting up the Mesos-Marathon cluster on Jetstream.
I am currently evaluating MPICH (http://www.mpich.org/about/overview/) for launching MPI jobs on top of Mesos. MPICH version 1.2 supports Mesos-based MPI scheduling. I have also been trying to submit jobs to the cluster through Marathon. However, in either case I am currently facing issues, which I am working to resolve.

I am compiling my notes in the following Google doc. Please review it and let me know your comments and suggestions.

https://docs.google.com/document/d/1p_Y4Zd4I4lgt264IHspXJli3la25y6bcPcmrTD6nR8g/edit?usp=sharing

Thanks and Regards,
Mangirish Wagle

On Wed, Sep 21, 2016 at 3:20 PM, Shenoy, Gourav Ganesh <goshe...@indiana.edu> wrote:

> Hi Mangirish,
>
> I have set up a Mesos-Marathon cluster for you on Jetstream. I will share
> the cluster details with you in a separate email. Kindly note that
> there are 3 masters & 2 slaves in this cluster.
>
> I am also working on automating this process for Jetstream (similar to
> Shameera’s ansible script for EC2) and when that is ready, we can create
> clusters or add/remove slave machines from the cluster.
>
> Thanks and Regards,
> Gourav Shenoy
>
> *From: *Mangirish Wagle <vaglomangir...@gmail.com>
> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Date: *Wednesday, September 21, 2016 at 2:36 PM
> *To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Subject: *Running MPI jobs on Mesos based clusters
>
> Hello All,
>
> I would like to make everybody aware of the study that I am
> undertaking this fall, i.e. to evaluate various frameworks that
> would facilitate MPI jobs on Mesos based clusters for Apache Airavata.
>
> Some of the options that I am looking at are:
>
> 1. MPI support framework bundled with Mesos
> 2. Apache Aurora
> 3. Marathon
> 4. Chronos
>
> Some of the evaluation criteria on which I am planning to base my
> investigation are:
>
> - Ease of setup
> - Documentation
> - Reliability features like HA
> - Scaling and Fault recovery
> - Performance
> - Community Support
>
> Gourav and Shameera are working on ansible based automation to spin up a
> Mesos based cluster and I am planning to use it to set up a cluster for
> experimentation.
>
> Any suggestions or information about prior work on this would be highly
> appreciated.
>
> Thank you.
>
> Best Regards,
> Mangirish Wagle