So "remote spawn" and "children" refer to the orted daemons only, and I was looking at the wrong modules.
Which module(s), then, are responsible for sending the command to orted to start the MPI application? Which event names should I search for?

Thank you,
Justin

----- Original Message -----
> From: r...@open-mpi.org
> To: "OpenMPI Devel" <devel@lists.open-mpi.org>
> Sent: Wednesday, May 3, 2017 3:29:16 PM
> Subject: Re: [OMPI devel] remote spawn - have no children
>
> I should have looked more closely, as you already have the routed verbose
> output there. Everything in fact looks correct. The node with mpirun has 1
> child, which is the daemon on the other node. The vpid=1 daemon on node 250
> doesn’t have any children as there aren’t any more daemons in the system.
>
> Note that the output has nothing to do with spawning your mpi_hello - it is
> solely describing the startup of the daemons.
>
>
> > On May 3, 2017, at 6:26 AM, r...@open-mpi.org wrote:
> >
> > The orte routed framework does that for you - there is an API for that
> > purpose.
> >
> >
> >> On May 3, 2017, at 12:17 AM, Justin Cinkelj <justin.cink...@xlab.si>
> >> wrote:
> >>
> >> Important detail first: I get this message from significantly modified
> >> Open MPI code, so the problem exists solely due to my mistake.
> >>
> >> Orterun on 192.168.122.90 starts orted on remote node 192.168.122.91, then
> >> orted figures out it has nothing to do.
> >> If I request to start workers on the same 192.168.122.90 IP, the mpi_hello
> >> is started.
> >>
> >> Partial log:
> >> /usr/bin/mpirun -np 1 ... mpi_hello
> >> #
> >> [osv:00252] [[50738,0],0] plm:base:setup_job
> >> [osv:00252] [[50738,0],0] plm:base:setup_vm
> >> [osv:00252] [[50738,0],0] plm:base:setup_vm creating map
> >> [osv:00252] [[50738,0],0] setup:vm: working unmanaged allocation
> >> [osv:00252] [[50738,0],0] using dash_host
> >> [osv:00252] [[50738,0],0] checking node 192.168.122.91
> >> [osv:00252] [[50738,0],0] plm:base:setup_vm add new daemon [[50738,0],1]
> >> [osv:00252] [[50738,0],0] plm:base:setup_vm assigning new daemon [[50738,0],1] to node 192.168.122.91
> >> [osv:00252] [[50738,0],0] routed:binomial rank 0 parent 0 me 0 num_procs 2
> >> [osv:00252] [[50738,0],0] routed:binomial 0 found child 1
> >> [osv:00252] [[50738,0],0] routed:binomial rank 0 parent 0 me 1 num_procs 2
> >> [osv:00252] [[50738,0],0] routed:binomial find children of rank 0
> >> [osv:00252] [[50738,0],0] routed:binomial find children checking peer 1
> >> [osv:00252] [[50738,0],0] routed:binomial find children computing tree
> >> [osv:00252] [[50738,0],0] routed:binomial rank 1 parent 0 me 1 num_procs 2
> >> [osv:00252] [[50738,0],0] routed:binomial find children returning found value 0
> >> [osv:00252] [[50738,0],0]: parent 0 num_children 1
> >> [osv:00252] [[50738,0],0]: child 1
> >> [osv:00252] [[50738,0],0] plm:osvrest: launching vm
> >> #
> >> [osv:00250] [[50738,0],1] plm:osvrest: remote spawn called
> >> [osv:00250] [[50738,0],1] routed:binomial rank 0 parent 0 me 1 num_procs 2
> >> [osv:00250] [[50738,0],1] routed:binomial find children of rank 0
> >> [osv:00250] [[50738,0],1] routed:binomial find children checking peer 1
> >> [osv:00250] [[50738,0],1] routed:binomial find children computing tree
> >> [osv:00250] [[50738,0],1] routed:binomial rank 1 parent 0 me 1 num_procs 2
> >> [osv:00250] [[50738,0],1] routed:binomial find children returning found value 0
> >> [osv:00250] [[50738,0],1]: parent 0 num_children 0
> >> [osv:00250] [[50738,0],1] plm:osvrest: remote spawn - have no children!
> >>
> >> In the plm mca module's remote_spawn() function (my plm is based on
> >> orte/mca/plm/rsh/), the &coll.targets list has zero length. My question
> >> is: which module(s) are responsible for filling in coll.targets? Then
> >> I will turn on the correct mca xyz_base_verbose level and hopefully
> >> narrow down my problem. I have quite a problem guessing/finding out what
> >> the various xyz strings mean :)
> >>
> >> Thank you, Justin
> >> _______________________________________________
> >> devel mailing list
> >> devel@lists.open-mpi.org
> >> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel