I should have looked more closely, as you already have the routed verbose output there. Everything in fact looks correct. The node with mpirun (daemon vpid 0) has 1 child, which is the daemon on the other node. The vpid=1 daemon (the [osv:00250] lines, i.e. pid 250) doesn’t have any children, as there aren’t any more daemons in the system.
Note that the output has nothing to do with spawning your mpi_hello - it is solely describing the startup of the daemons.

> On May 3, 2017, at 6:26 AM, r...@open-mpi.org wrote:
>
> The orte routed framework does that for you - there is an API for that
> purpose.
>
>
>> On May 3, 2017, at 12:17 AM, Justin Cinkelj <justin.cink...@xlab.si> wrote:
>>
>> Important detail first: I get this message from significantly modified Open
>> MPI code, so the problem exists solely due to my mistake.
>>
>> Orterun on 192.168.122.90 starts orted on remote node 192.168.122.91, then
>> orted figures out it has nothing to do.
>> If I request to start workers on the same 192.168.122.90 IP, the mpi_hello
>> is started.
>>
>> Partial log:
>> /usr/bin/mpirun -np 1 ... mpi_hello
>> #
>> [osv:00252] [[50738,0],0] plm:base:setup_job
>> [osv:00252] [[50738,0],0] plm:base:setup_vm
>> [osv:00252] [[50738,0],0] plm:base:setup_vm creating map
>> [osv:00252] [[50738,0],0] setup:vm: working unmanaged allocation
>> [osv:00252] [[50738,0],0] using dash_host
>> [osv:00252] [[50738,0],0] checking node 192.168.122.91
>> [osv:00252] [[50738,0],0] plm:base:setup_vm add new daemon [[50738,0],1]
>> [osv:00252] [[50738,0],0] plm:base:setup_vm assigning new daemon [[50738,0],1] to node 192.168.122.91
>> [osv:00252] [[50738,0],0] routed:binomial rank 0 parent 0 me 0 num_procs 2
>> [osv:00252] [[50738,0],0] routed:binomial 0 found child 1
>> [osv:00252] [[50738,0],0] routed:binomial rank 0 parent 0 me 1 num_procs 2
>> [osv:00252] [[50738,0],0] routed:binomial find children of rank 0
>> [osv:00252] [[50738,0],0] routed:binomial find children checking peer 1
>> [osv:00252] [[50738,0],0] routed:binomial find children computing tree
>> [osv:00252] [[50738,0],0] routed:binomial rank 1 parent 0 me 1 num_procs 2
>> [osv:00252] [[50738,0],0] routed:binomial find children returning found value 0
>> [osv:00252] [[50738,0],0]: parent 0 num_children 1
>> [osv:00252] [[50738,0],0]: child 1
>> [osv:00252] [[50738,0],0] plm:osvrest: launching vm
>> #
>> [osv:00250] [[50738,0],1] plm:osvrest: remote spawn called
>> [osv:00250] [[50738,0],1] routed:binomial rank 0 parent 0 me 1 num_procs 2
>> [osv:00250] [[50738,0],1] routed:binomial find children of rank 0
>> [osv:00250] [[50738,0],1] routed:binomial find children checking peer 1
>> [osv:00250] [[50738,0],1] routed:binomial find children computing tree
>> [osv:00250] [[50738,0],1] routed:binomial rank 1 parent 0 me 1 num_procs 2
>> [osv:00250] [[50738,0],1] routed:binomial find children returning found value 0
>> [osv:00250] [[50738,0],1]: parent 0 num_children 0
>> [osv:00250] [[50738,0],1] plm:osvrest: remote spawn - have no children!
>>
>> In the plm mca module remote_spawn() function (my plm is based on
>> orte/mca/plm/rsh/), the &coll.targets list has zero length. My question is,
>> which module(s) are responsible for filling in the coll.targets? Then I will
>> turn on the correct mca xyz_base_verbose level, and hopefully narrow down my
>> problem. I have quite a problem guessing/finding out what various xyz
>> strings mean :)
>>
>> Thank you,
>> Justin
>> _______________________________________________
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel