So "remote spawn" and children refer to orted daemons only, and I was looking 
into the wrong modules.

Which module(s) are then responsible for sending the command to orted to start 
the MPI application?
Which event names should I search for?

Thank you,
Justin

----- Original Message -----
> From: r...@open-mpi.org
> To: "OpenMPI Devel" <devel@lists.open-mpi.org>
> Sent: Wednesday, May 3, 2017 3:29:16 PM
> Subject: Re: [OMPI devel] remote spawn - have no children
> 
> I should have looked more closely as you already have the routed verbose
> output there. Everything in fact looks correct. The node with mpirun has 1
> child, which is the daemon on the other node. The vpid=1 daemon on node 250
> doesn’t have any children as there aren’t any more daemons in the system.
> 
> Note that the output has nothing to do with spawning your mpi_hello - it is
> solely describing the startup of the daemons.
> 
> 
> > On May 3, 2017, at 6:26 AM, r...@open-mpi.org wrote:
> > 
> > The orte routed framework does that for you - there is an API for that
> > purpose.
> > 
> > 
> >> On May 3, 2017, at 12:17 AM, Justin Cinkelj <justin.cink...@xlab.si>
> >> wrote:
> >> 
> >> Important detail first: I get this message from significantly modified
> >> Open MPI code, so the problem exists solely due to my mistake.
> >> 
> >> Orterun on 192.168.122.90 starts orted on remote node 192.168.122.91, then
> >> orted figures out it has nothing to do.
> >> If I request to start workers on the same 192.168.122.90 IP, the mpi_hello
> >> is started.
> >> 
> >> Partial log:
> >> /usr/bin/mpirun -np 1 ... mpi_hello
> >> #
> >> [osv:00252] [[50738,0],0] plm:base:setup_job
> >> [osv:00252] [[50738,0],0] plm:base:setup_vm
> >> [osv:00252] [[50738,0],0] plm:base:setup_vm creating map
> >> [osv:00252] [[50738,0],0] setup:vm: working unmanaged allocation
> >> [osv:00252] [[50738,0],0] using dash_host
> >> [osv:00252] [[50738,0],0] checking node 192.168.122.91
> >> [osv:00252] [[50738,0],0] plm:base:setup_vm add new daemon [[50738,0],1]
> >> [osv:00252] [[50738,0],0] plm:base:setup_vm assigning new daemon
> >> [[50738,0],1] to node 192.168.122.91
> >> [osv:00252] [[50738,0],0] routed:binomial rank 0 parent 0 me 0 num_procs 2
> >> [osv:00252] [[50738,0],0] routed:binomial 0 found child 1
> >> [osv:00252] [[50738,0],0] routed:binomial rank 0 parent 0 me 1 num_procs 2
> >> [osv:00252] [[50738,0],0] routed:binomial find children of rank 0
> >> [osv:00252] [[50738,0],0] routed:binomial find children checking peer 1
> >> [osv:00252] [[50738,0],0] routed:binomial find children computing tree
> >> [osv:00252] [[50738,0],0] routed:binomial rank 1 parent 0 me 1 num_procs 2
> >> [osv:00252] [[50738,0],0] routed:binomial find children returning found
> >> value 0
> >> [osv:00252] [[50738,0],0]: parent 0 num_children 1
> >> [osv:00252] [[50738,0],0]:      child 1
> >> [osv:00252] [[50738,0],0] plm:osvrest: launching vm
> >> #
> >> [osv:00250] [[50738,0],1] plm:osvrest: remote spawn called
> >> [osv:00250] [[50738,0],1] routed:binomial rank 0 parent 0 me 1 num_procs 2
> >> [osv:00250] [[50738,0],1] routed:binomial find children of rank 0
> >> [osv:00250] [[50738,0],1] routed:binomial find children checking peer 1
> >> [osv:00250] [[50738,0],1] routed:binomial find children computing tree
> >> [osv:00250] [[50738,0],1] routed:binomial rank 1 parent 0 me 1 num_procs 2
> >> [osv:00250] [[50738,0],1] routed:binomial find children returning found
> >> value 0
> >> [osv:00250] [[50738,0],1]: parent 0 num_children 0
> >> [osv:00250] [[50738,0],1] plm:osvrest: remote spawn - have no children!
> >> 
> >> In the plm mca module's remote_spawn() function (my plm is based on
> >> orte/mca/plm/rsh/), the &coll.targets list has zero length. My question
> >> is: which module(s) are responsible for filling in coll.targets? Then
> >> I will turn on the correct mca xyz_base_verbose level and hopefully
> >> narrow down my problem. I have quite a problem guessing/finding out what
> >> the various xyz strings mean :)
> >> 
> >> Thank you, Justin
> >> _______________________________________________
> >> devel mailing list
> >> devel@lists.open-mpi.org
> >> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
> > 
> 