Hi,

I've implemented a Kubernetes operator
<https://coreos.com/blog/introducing-operators.html> for running OpenMPI
jobs on Kubernetes. An operator is just a Custom Resource Definition (CRD)
for a new type of MPI-aware batch job, plus a custom controller that
manages the state of those jobs. Unfortunately the code can't be open
sourced just yet, but I've put together a demo that roughly matches the
current design (https://github.com/rongou/k8s-openmpi).
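
For concreteness, the custom resource might be declared with Go types along
these lines; `MPIJob` and all of its fields are hypothetical stand-ins
rather than the actual (unreleased) design:

// Hypothetical Go types for an "MPIJob" custom resource. Names and fields
// are illustrative only; the real operator may differ.
package v1alpha1

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// MPIJobSpec is the desired state: how many workers to run and what pod
// template (image, command, GPU requests) each worker uses.
type MPIJobSpec struct {
	Replicas      int32                  `json:"replicas"`
	GPUsPerWorker int32                  `json:"gpusPerWorker,omitempty"`
	Template      corev1.PodTemplateSpec `json:"template"`
}

// MPIJobStatus is written by the custom controller as the job progresses.
type MPIJobStatus struct {
	Phase     string   `json:"phase,omitempty"` // e.g. Pending, Running, Succeeded
	WorkerIPs []string `json:"workerIPs,omitempty"`
}

// MPIJob is the top-level custom resource managed by the controller.
type MPIJob struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec   MPIJobSpec   `json:"spec"`
	Status MPIJobStatus `json:"status,omitempty"`
}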

The basic steps are:

   - Create a pair of one-time-use ssh keys and store them as a Kubernetes
   secret.
   - Launch a set of worker pods (in this context, each pod is effectively a
   single container) and start sshd in each of them.
   - Find the IP addresses of the worker pods (OpenMPI doesn't seem to like
   Kubernetes' DNS names such as "foo.bar.my-namespace.svc.cluster.local").
   - Pass the list of worker IPs to a launcher job that invokes `mpirun` (a
   rough client-go sketch of these last two steps follows the list).
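
Here's what those last two steps might look like: the launcher collects the
worker pod IPs with client-go and turns them into an OpenMPI hostfile. The
"app=mpi-worker" label, namespace, and slot count are assumptions rather
than the actual demo code, and the exact List() signature varies a bit
between client-go versions:

package main

import (
	"context"
	"fmt"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// The launcher job runs inside the cluster, so use in-cluster config.
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// List the worker pods by (assumed) label and collect their pod IPs.
	pods, err := clientset.CoreV1().Pods("default").List(context.TODO(),
		metav1.ListOptions{LabelSelector: "app=mpi-worker"})
	if err != nil {
		panic(err)
	}

	// One hostfile line per worker; slots=8 assumes 8 GPUs per node.
	var hosts []string
	for _, pod := range pods.Items {
		if pod.Status.PodIP != "" {
			hosts = append(hosts, fmt.Sprintf("%s slots=8", pod.Status.PodIP))
		}
	}

	// The launcher writes this out and then runs something like
	//   mpirun --hostfile hostfile -np 32 <benchmark command>
	fmt.Println(strings.Join(hosts, "\n"))
}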

This seems to work pretty well. I was able to run the TensorFlow benchmark
on 32 GPUs across 4 nodes, on both AWS and an on-prem cluster.

Although I don't think it's an issue, some people are nervous about running
sshd in a container in a shared cluster environment. So I have two
questions:

   - From your operational experience, how do you deal with the security
   concerns around allowing ssh traffic?
   - If we want to do this without using ssh, what's the best option?
   Kubernetes has a remote execution API similar to `docker exec`; would it
   be possible to launch all the worker pods first and then exec some
   command on each of them, similar to what SLURM does? (A rough sketch of
   that exec API follows below.)
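
Here's roughly what that exec API looks like from client-go's remotecommand
package, with placeholder pod, container, and command names:

package main

import (
	"os"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/remotecommand"
)

// execInPod runs a command in a running pod via the "exec" subresource,
// the same mechanism `kubectl exec` uses, streamed over SPDY rather than ssh.
func execInPod(config *rest.Config, clientset *kubernetes.Clientset,
	namespace, pod, container string, command []string) error {
	req := clientset.CoreV1().RESTClient().Post().
		Resource("pods").
		Namespace(namespace).
		Name(pod).
		SubResource("exec").
		VersionedParams(&corev1.PodExecOptions{
			Container: container,
			Command:   command,
			Stdout:    true,
			Stderr:    true,
		}, scheme.ParameterCodec)

	executor, err := remotecommand.NewSPDYExecutor(config, "POST", req.URL())
	if err != nil {
		return err
	}
	return executor.Stream(remotecommand.StreamOptions{
		Stdout: os.Stdout,
		Stderr: os.Stderr,
	})
}

func main() {
	// Placeholder pod/container names; a real launcher would loop over all
	// worker pods.
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	if err := execInPod(config, clientset, "default", "mpi-worker-0",
		"worker", []string{"hostname"}); err != nil {
		panic(err)
	}
}

Since this is the same path `kubectl exec` goes through, the worker pods
wouldn't need to run sshd at all.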

I'm pretty new to OpenMPI and haven't dug around in the codebase much yet,
so I'm at a bit of a loss on how to proceed. Any pointers would be greatly
appreciated!

Thanks,

Rong