One suggestion: this approach requires that the job be executed using “mpirun”. 
Another approach would be to integrate PMIx into Kubernetes, thus allowing any 
job to call MPI_Init regardless of how it was started. The advantage would be 
that it enables the use of MPI by workflow-based applications that really 
aren’t supported by mpirun and require their own application manager.
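
To make the PMIx idea a little more concrete: whatever starts the processes (mpirun today, or a Kubernetes-side PMIx server in the suggestion above) has to give each process its identity within the job, at minimum its rank and the job size. The Python sketch below is not PMIx code and not part of any real API. It is a hypothetical helper (`ProcInfo` and `proc_info_from_pod` are invented names) that derives that information from a pod name. It assumes the workers run as a Kubernetes StatefulSet, whose pods are named `<statefulset-name>-<ordinal>`, with a known replica count.

```python
import re
from dataclasses import dataclass

# Hypothetical sketch: the per-process job information a launcher must supply
# so that MPI_Init can succeed without mpirun. Not PMIx code.


@dataclass(frozen=True)
class ProcInfo:
    rank: int  # this process's rank within the job
    size: int  # total number of processes in the job


# Assumption: StatefulSet pods end in "-<ordinal>", e.g. "mpi-worker-3".
_ORDINAL = re.compile(r"-(\d+)$")


def proc_info_from_pod(pod_name: str, replicas: int) -> ProcInfo:
    """Map a StatefulSet pod name to (rank, size), using the ordinal as the rank."""
    match = _ORDINAL.search(pod_name)
    if match is None:
        raise ValueError(f"no ordinal suffix in pod name {pod_name!r}")
    rank = int(match.group(1))
    if not 0 <= rank < replicas:
        raise ValueError(f"ordinal {rank} out of range for {replicas} replicas")
    return ProcInfo(rank=rank, size=replicas)
```

Using the StatefulSet ordinal keeps rank assignment stable across pod restarts, which is one reason it is a tempting source of identity; a real PMIx integration would carry far more than these two values.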

See https://pmix.org for more info.

Ralph


> On May 24, 2018, at 9:02 PM, Rong Ou <rong...@gmail.com> wrote:
> 
> Hi guys,
> 
> Thanks for all the suggestions! It's been a while, but we finally got it 
> approved for open sourcing. I've submitted a proposal to Kubeflow: 
> https://github.com/kubeflow/community/blob/master/proposals/mpi-operator-proposal.md
> In this version we've managed to avoid ssh, relying on `kubectl exec` 
> instead. It's still pretty "ghetto", but at least we've managed to train some 
> TensorFlow models with it. :) Please take a look and let me know what you 
> think.
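For readers unfamiliar with the `kubectl exec` trick: Open MPI's rsh launcher lets you choose the remote agent (for example via the `plm_rsh_agent` MCA parameter), so anything that behaves enough like ssh can stand in for it. The Python sketch below only builds the `kubectl exec` command line such a stand-in might run. It assumes the agent is called with the pod name first and the remote command words after it; `kubectl_exec_argv` is a hypothetical helper, not code from the proposal.

```python
from typing import List, Optional, Sequence


def kubectl_exec_argv(
    pod: str,
    remote_args: Sequence[str],
    container: Optional[str] = None,
    namespace: Optional[str] = None,
) -> List[str]:
    """Build a `kubectl exec` command line that runs remote_args inside a pod."""
    argv = ["kubectl", "exec"]
    if namespace:
        argv += ["-n", namespace]
    argv.append(pod)
    if container:
        argv += ["-c", container]
    # Like ssh, join the remote words with spaces and let a shell in the pod
    # parse them, so any quoting supplied by the launcher is interpreted remotely.
    argv += ["--", "/bin/sh", "-c", " ".join(remote_args)]
    return argv
```

A small wrapper script could then call `os.execvp("kubectl", kubectl_exec_argv(sys.argv[1], sys.argv[2:]))` and be configured as the launch agent in place of ssh.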
> 
> Thanks,
> 
> Rong
> 
> On Fri, Mar 16, 2018 at 11:38 AM r...@open-mpi.org wrote:
> I haven’t really spent any time with Kubernetes, but it seems to me you could 
> just write a Kubernetes plm (and maybe an odls) component and bypass the ssh 
> stuff completely, given that, as you say, there is a launcher API.
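One way to picture what a Kubernetes plm would do differently: instead of ssh-ing into each node to start an Open MPI daemon, it would ask the Kubernetes API to create one pod per target node. The Python sketch below only builds those Pod manifests as plain dictionaries, pinning each pod with `spec.nodeName`. The function name, job id, image, daemon command and the `mpi-job` label key are hypothetical placeholders. A real plm component would be written in C inside Open MPI and would also have to submit the pods and track their status.

```python
from typing import Any, Dict, List, Sequence


def daemon_pod_manifests(
    job_id: str, nodes: Sequence[str], image: str, daemon_cmd: Sequence[str]
) -> List[Dict[str, Any]]:
    """Build one Pod manifest per node, each running the (hypothetical) daemon command."""
    pods = []
    for index, node in enumerate(nodes):
        pods.append({
            "apiVersion": "v1",
            "kind": "Pod",
            "metadata": {
                "name": f"{job_id}-daemon-{index}",
                "labels": {"mpi-job": job_id},  # hypothetical label for lookup/cleanup
            },
            "spec": {
                "nodeName": node,  # bind directly to the node the plm selected
                "restartPolicy": "Never",
                "containers": [
                    {"name": "daemon", "image": image, "command": list(daemon_cmd)}
                ],
            },
        })
    return pods
```

Setting `nodeName` bypasses the Kubernetes scheduler, which mirrors how the rsh plm targets specific hosts; a design that let Kubernetes place the pods would need to feed the chosen nodes back to Open MPI's mapper instead.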
> 
> > On Mar 16, 2018, at 11:02 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> > 
> > On Mar 16, 2018, at 10:01 AM, Gilles Gouaillardet 
> > <gilles.gouaillar...@gmail.com> wrote:
> >> 
> >> By default, Open MPI uses the rsh PLM in order to start a job.
> > 
> > To clarify one thing here: the name of our plugin is "rsh" for historical 
> > reasons, but it defaults to looking for "ssh" first.  If it finds ssh, 
> > it uses it.  Otherwise, it tries to find rsh and uses that.
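The lookup order described above can be sketched in a few lines of Python. This illustrates the behavior only; it is not Open MPI's actual implementation (which is in C), and the function name and the injectable `which` parameter are assumptions made so the logic can be tested without touching PATH.

```python
import shutil
from typing import Callable, Optional, Sequence


def find_remote_agent(
    candidates: Sequence[str] = ("ssh", "rsh"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the path of the first launcher found, trying candidates in order."""
    for name in candidates:
        path = which(name)
        if path is not None:
            return path  # e.g. ssh wins whenever it is installed
    return None  # neither ssh nor rsh is available
```

With the default arguments this searches the real PATH, so on most systems it returns the path to ssh.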
> > 
> > -- 
> > Jeff Squyres
> > jsquy...@cisco.com
> > 
> > _______________________________________________
> > devel mailing list
> > devel@lists.open-mpi.org <mailto:devel@lists.open-mpi.org>
> > https://lists.open-mpi.org/mailman/listinfo/devel 
> > <https://lists.open-mpi.org/mailman/listinfo/devel>
> 
