Justin,

IOF stands for Input/Output (aka I/O) Forwarding

Here is a very high-level overview of a fairly simple case.
On host A, without any batch manager and with the TCP interconnect, you run
mpirun -host B,C -np 2 a.out

First, mpirun will fork&exec
ssh B orted ...
ssh C orted ...

The orted daemons will first connect back to mpirun over TCP, using the IP address/port passed on the orted command line.
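If you want to see this for yourself, a quick check (assuming ps and grep are available on the hosts) is
on host A
ps -ef | grep -e mpirun -e ssh
on host B (or C)
ps -ef | grep orted
you should see mpirun and its ssh children on host A, and one orted (with a.out as its child) on each remote host.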

Then the orted daemons will fork&exec a.out.
Each a.out contacts its parent orted (IIRC over TCP in v1.10, and over a Unix socket from v2.x) using the IP address/port taken from its environment. When the a.out processes want to communicate with each other, they connect to the remote a.out via TCP, using the IP address/port obtained from orted.
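If you want to watch these connections being established, you can raise the verbosity of the frameworks involved (oob for the mpirun/orted wire-up, btl for the MPI traffic); the exact output depends on the Open MPI version:
mpirun --mca oob_base_verbose 10 --mca btl_base_verbose 10 -host B,C -np 2 a.out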

From a.out's point of view:
- stdin is either a pipe to orted or /dev/null
- stdout is a pty with orted on the other side
- stderr is a pipe to orted
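You can see this forwarding in action with a trivial non-MPI command; each line of output is printed by mpirun on host A even though hostname runs on B and C:
mpirun -host B,C -np 2 hostname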

This is basically what happens in a fairly simple case.
Back to your question: mpi_hello.so does not contact mpirun.
orted.so contacts mpirun, mpi_hello.so contacts orted.so,
and then mpi_hello.so contacts the other mpi_hello.so processes.


Note it is also possible to use direct launch (SLURM or Cray ALPS can do that).
Instead of running
mpirun a.out
you simply do
srun a.out (or aprun a.out)
In the case of SLURM (I am not sure about ALPS), there are no orted daemons involved. Instead of contacting its orted, a.out contacts the SLURM daemon (slurmd) so it can exchange information with the remote a.out processes and figure out how to contact them. Direct launch does not support dynamic process creation (MPI_Comm_spawn and friends).
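For example, with SLURM that could look like this (assuming a.out was built against an Open MPI with SLURM/PMI support):
salloc -N 2
srun -n 2 ./a.out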


You can run
ompi_info --all
to list all the MCA parameters,
and then you can do
mpirun --mca <name> <value> ...
to override a parameter (such as a timeout).
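For example, to see which parameters mention a timeout and then override one of them (use the exact name reported by ompi_info for your version):
ompi_info --all | grep -i timeout
mpirun --mca <timeout parameter> <value> -host B,C -np 2 a.out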

That being said, I do not think that should be needed ... just make sure there is no firewall running on your systems, and you should be fine. If some hosts have several interfaces, you can restrict Open MPI to the one that is known to work (e.g. eth0) with
mpirun --mca oob_tcp_if_include eth0 --mca btl_tcp_if_include eth0 ...
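To find the interface names to put in those lists, you can run (assuming Linux with iproute2; ifconfig -a works too on older systems)
ip addr show
on each host and pick an interface that is up and reachable from the other hosts.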


I hope this helps.

Gilles


On 10/16/2015 2:59 AM, Justin Cinkelj wrote:
I'm trying to run OpenMPI in an OSv container
(https://github.com/cloudius-systems/osv). It's a single-process, single-
address-space VM, without the fork, exec, or openpty functions. With some
butchering of OSv and OpenMPI I was able to compile orted.so and run it
inside OSv via mpirun (mpirun is on a remote machine). The orted.so loads
mpi_hello.so and executes its main() in a new pthread.

Which then aborts due to a communication failure/timeout, as reported by
mpirun. I assume that mpi_hello.so should connect back to mpirun
and report 'something' about itself. What could that be?
Plus, where could I extend that timeout period? Once mpirun closes,
output from opal_output is not shown any more.

Is there some high-level overview of OpenMPI: how the modules are
connected, what the 'startup' sequence is, etc.?
ompi_info lists the compiled modules, but I still don't know how they are
connected.

So basically, I lack knowledge of OpenMPI internals and would highly
appreciate links for "rookie" developers.
For example, https://github.com/open-mpi/ompi/wiki/IOFDesign tells me what IOF
is and a bit about how it works. So, if someone has a list of such
links, could it be shared?
