Thank you. At least it's clear now that for the immediate problem I have
to look at the IOF code.


On 16. 10. 2015 03:32, Gilles Gouaillardet wrote:
> Justin,
>
> IOF stands for Input/Output (aka I/O) Forwarding
>
> here is a very high level overview of a quite simple case.
> on host A, you run
> mpirun -host B,C -np 2 a.out
> without any batch manager, using the TCP interconnect.
>
> first, mpirun will fork&exec
> ssh B orted ...
> ssh C orted ...
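>
> (to illustrate what fork&exec means here, a minimal sketch in plain C -
> this is not the actual mpirun code, which goes through its plm
> framework, and the orted arguments stay elided as above)
>
>   #include <sys/wait.h>
>   #include <unistd.h>
>
>   int main(void) {
>       pid_t pid = fork();
>       if (pid == 0) {                /* child: becomes the ssh process */
>           char *argv[] = { "ssh", "B", "orted", NULL };  /* ip/port args elided */
>           execvp(argv[0], argv);     /* replace the child image with ssh */
>           _exit(127);                /* only reached if execvp fails */
>       }
>       waitpid(pid, NULL, 0);         /* mpirun of course keeps running instead */
>       return 0;
>   }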
>
> the orted daemons will first connect back to mpirun, using TCP and
> ip/port passed on the orted command line.
>
> then the orted daemons will fork&exec a.out
> a.out will contact its parent orted (iirc, TCP on v1.10 and a Unix
> socket from v2.x) via the ip/port obtained from the environment.
> when the a.out processes want to communicate, they connect to the
> remote a.out via TCP, using the ip/port obtained from orted.
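>
> (again just a sketch of the "read an endpoint and connect over TCP"
> part, the same pattern orted uses to call back mpirun; the variable
> name "OMPI_PARENT_URI" below is made up, the real names are OMPI_*
> MCA parameters with a different wire format)
>
>   #include <arpa/inet.h>
>   #include <netinet/in.h>
>   #include <stdio.h>
>   #include <stdlib.h>
>   #include <string.h>
>   #include <sys/socket.h>
>   #include <unistd.h>
>
>   int connect_back(void) {
>       const char *uri = getenv("OMPI_PARENT_URI");   /* e.g. "10.0.0.1:12345" */
>       char host[64];
>       int port;
>       if (uri == NULL || sscanf(uri, "%63[^:]:%d", host, &port) != 2)
>           return -1;
>
>       struct sockaddr_in sa;
>       memset(&sa, 0, sizeof(sa));
>       sa.sin_family = AF_INET;
>       sa.sin_port = htons(port);
>       inet_pton(AF_INET, host, &sa.sin_addr);
>
>       int fd = socket(AF_INET, SOCK_STREAM, 0);
>       if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) != 0) {
>           close(fd);
>           return -1;
>       }
>       return fd;        /* the caller now speaks the OOB protocol on fd */
>   }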
>
> from a.out point of view :
> - stdin is either a pipe to orted or /dev/null
> - stdout is a pty with orted on the other side
> - stderr is a pipe to orted
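>
> (a rough sketch of that wiring, assuming a POSIX host with openpty() -
> which is exactly one of the calls OSv lacks; error handling omitted)
>
>   #include <fcntl.h>     /* open, O_RDONLY */
>   #include <pty.h>       /* openpty(), link with -lutil on Linux */
>   #include <unistd.h>
>
>   pid_t launch_child(char *const argv[]) {
>       int out_master, out_slave;   /* pty pair for the child's stdout */
>       int err_pipe[2];             /* plain pipe for the child's stderr */
>
>       openpty(&out_master, &out_slave, NULL, NULL, NULL);
>       pipe(err_pipe);
>
>       pid_t pid = fork();
>       if (pid == 0) {              /* child: will become a.out */
>           dup2(open("/dev/null", O_RDONLY), STDIN_FILENO);  /* or a pipe from orted */
>           dup2(out_slave, STDOUT_FILENO);     /* stdout: slave end of the pty */
>           dup2(err_pipe[1], STDERR_FILENO);   /* stderr: write end of the pipe */
>           close(out_master);
>           close(err_pipe[0]);
>           execvp(argv[0], argv);
>           _exit(127);
>       }
>
>       /* orted (the parent) reads out_master and err_pipe[0] and forwards
>          the data to mpirun over its OOB connection - that is the IOF job */
>       close(out_slave);
>       close(err_pipe[1]);
>       return pid;
>   }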
>
> this is basically what happens in a quite simple case.
> back to your question: mpi_hello.so does not contact mpirun.
> orted.so contacts mpirun, mpi_hello.so contacts orted.so,
> and then mpi_hello.so contacts the other mpi_hello.so processes
>
>
> note it is also possible to use direct launch (SLURM or Cray ALPS can
> do that)
> instead of running
> mpirun a.out
> you simply do
> srun a.out (or aprun a.out)
> in the case of slurm (i am not sure about alps) there are no orted
> daemons involved.
> instead of contacting its orted, a.out contacts the slurm daemons
> (slurmd) so it can exchange information with the remote a.out
> processes and figure out how to contact them.
> direct launch does not support dynamic process creation
> (MPI_Comm_spawn and friends)
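>
> (with direct launch the information exchange typically goes through
> the PMI interface slurmd exposes; very roughly, ignoring errors and
> the length-max queries, the idea looks like this - a sketch of the
> concept, not Open MPI code)
>
>   #include <stdio.h>
>   #include <pmi.h>      /* PMI-1 header shipped with SLURM's libpmi */
>
>   void exchange_endpoints(const char *my_tcp_endpoint) {
>       int spawned, rank, size;
>       char kvs[256], key[64], peer[256];  /* real code queries the max lengths */
>
>       PMI_Init(&spawned);
>       PMI_Get_rank(&rank);
>       PMI_Get_size(&size);
>       PMI_KVS_Get_my_name(kvs, sizeof(kvs));
>
>       /* publish my contact info under a per-rank key */
>       snprintf(key, sizeof(key), "endpoint-%d", rank);
>       PMI_KVS_Put(kvs, key, my_tcp_endpoint);
>       PMI_KVS_Commit(kvs);
>       PMI_Barrier();                      /* everyone has published by now */
>
>       /* look up a peer's contact info, then connect to it over TCP */
>       snprintf(key, sizeof(key), "endpoint-%d", (rank + 1) % size);
>       PMI_KVS_Get(kvs, key, peer, sizeof(peer));
>
>       PMI_Finalize();
>   }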
>
>
> you can run
> ompi_info --all
> to list all the parameters.
> and then you can do
> mpirun --mca <name> <value> ...
> to modify a parameter (such as timeout)
>
> that being said, i do not think that should be needed ... just make
> sure there is no firewall running on your system, and you should be fine.
> if some hosts have several interfaces, you can restrict to the one
> that should work (e.g. eth0) with
> mpirun --mca oob_tcp_if_include eth0 --mca btl_tcp_if_include eth0 ...
>
>
> i hope this helps
>
> Gilles
>
>
> On 10/16/2015 2:59 AM, Justin Cinkelj wrote:
>> I'm trying to run Open MPI in an OSv container
>> (https://github.com/cloudius-systems/osv). It's a single-process, single
>> address space VM, without the fork, exec and openpty functions. With some
>> butchering of OSv and Open MPI I was able to compile orted.so and run it
>> inside OSv via mpirun (mpirun is on a remote machine). The orted.so loads
>> mpi_hello.so and executes its main() in a new pthread.
>>
>> Which then aborts due to a communication failure/timeout - as reported by
>> mpirun. I assume that mpi_hello.so should connect back to mpirun
>> and report 'something' about itself. What could that be?
>> Plus, where could I extend that timeout period - once mpirun closes,
>> the output from opal_output is not shown any more.
>>
>> Is there some high-level overview of Open MPI - how the modules are
>> connected, what the 'startup' sequence is, etc.?
>> ompi_info lists the compiled modules, but I still don't know how they
>> are connected.
>>
>> So basically - I lack knowledge of Open MPI internals, and would highly
>> appreciate links for "rookie" developers.
>> Say, https://github.com/open-mpi/ompi/wiki/IOFDesign tells me what IOF
>> is, and a bit about how it works. So, if someone has a list of such
>> links, could it be shared?
>>
