I should have looked more closely, as you already have the routed verbose output 
there. Everything in fact looks correct: the node with mpirun has one child, 
which is the daemon on the other node, and the vpid=1 daemon on node 250 has no 
children because there are no more daemons in the system.
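
If it helps to see where those numbers come from, here is a tiny standalone 
sketch - not the actual orte/mca/routed/binomial source, just the textbook 
binomial-tree computation it is modeled on - that reproduces the parent/child 
assignment for any num_procs:

#include <stdio.h>

/* Compute and print the binomial-tree parent and children of one rank,
 * with rank 0 (i.e. mpirun/the HNP) as the root of the tree. */
static void show(int rank, int num_procs)
{
    int parent = -1;   /* -1: this rank is the root */
    int mask = 1;

    /* the parent is this rank with its lowest set bit cleared */
    while (mask < num_procs) {
        if (rank & mask) {
            parent = rank & ~mask;
            break;
        }
        mask <<= 1;
    }

    printf("rank %d  parent %d  children:", rank, parent);

    /* children sit at rank + 2^k for every bit position below 'mask' */
    for (mask >>= 1; mask > 0; mask >>= 1) {
        if (rank + mask < num_procs) {
            printf(" %d", rank + mask);
        }
    }
    printf("\n");
}

int main(void)
{
    int num_procs = 2;   /* mpirun (vpid 0) plus the one remote daemon (vpid 1) */
    int rank;

    for (rank = 0; rank < num_procs; rank++) {
        show(rank, num_procs);
    }
    return 0;
}

With num_procs=2 it prints "rank 0  parent -1  children: 1" and 
"rank 1  parent 0  children:" - i.e. exactly the one-child/no-children split 
your verbose output shows.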

Note that the output has nothing to do with spawning your mpi_hello - it is 
solely describing the startup of the daemons.
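
As for the "which xyz_base_verbose" question further down: every ORTE framework 
registers an mca parameter named <framework>_base_verbose, and the output you 
quoted comes from the plm and routed ones. Something along these lines (the 
"..." and mpi_hello are just the placeholders from your own command line) turns 
the relevant ones up:

/usr/bin/mpirun --mca plm_base_verbose 10 --mca routed_base_verbose 10 \
    --mca grpcomm_base_verbose 10 -np 1 ... mpi_hello

Since it is the routed framework that computes the children list, 
routed_base_verbose (which you already have enabled) is the one that matters 
for coll.targets.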


> On May 3, 2017, at 6:26 AM, r...@open-mpi.org wrote:
> 
> The orte routed framework does that for you - there is an API for that 
> purpose.
> 
> 
>> On May 3, 2017, at 12:17 AM, Justin Cinkelj <justin.cink...@xlab.si> wrote:
>> 
>> Important detail first: I get this message from significantly modified Open 
>> MPI code, so the problem exists solely because of my mistake.
>> 
>> Orterun on 192.168.122.90 starts orted on the remote node 192.168.122.91; the 
>> orted then figures out it has nothing to do.
>> If I request to start the workers on the same IP, 192.168.122.90, then 
>> mpi_hello is started.
>> 
>> Partial log:
>> /usr/bin/mpirun -np 1 ... mpi_hello
>> #
>> [osv:00252] [[50738,0],0] plm:base:setup_job
>> [osv:00252] [[50738,0],0] plm:base:setup_vm
>> [osv:00252] [[50738,0],0] plm:base:setup_vm creating map
>> [osv:00252] [[50738,0],0] setup:vm: working unmanaged allocation
>> [osv:00252] [[50738,0],0] using dash_host
>> [osv:00252] [[50738,0],0] checking node 192.168.122.91
>> [osv:00252] [[50738,0],0] plm:base:setup_vm add new daemon [[50738,0],1]
>> [osv:00252] [[50738,0],0] plm:base:setup_vm assigning new daemon 
>> [[50738,0],1] to node 192.168.122.91
>> [osv:00252] [[50738,0],0] routed:binomial rank 0 parent 0 me 0 num_procs 2
>> [osv:00252] [[50738,0],0] routed:binomial 0 found child 1
>> [osv:00252] [[50738,0],0] routed:binomial rank 0 parent 0 me 1 num_procs 2
>> [osv:00252] [[50738,0],0] routed:binomial find children of rank 0
>> [osv:00252] [[50738,0],0] routed:binomial find children checking peer 1
>> [osv:00252] [[50738,0],0] routed:binomial find children computing tree
>> [osv:00252] [[50738,0],0] routed:binomial rank 1 parent 0 me 1 num_procs 2
>> [osv:00252] [[50738,0],0] routed:binomial find children returning found 
>> value 0
>> [osv:00252] [[50738,0],0]: parent 0 num_children 1
>> [osv:00252] [[50738,0],0]:      child 1
>> [osv:00252] [[50738,0],0] plm:osvrest: launching vm
>> #
>> [osv:00250] [[50738,0],1] plm:osvrest: remote spawn called
>> [osv:00250] [[50738,0],1] routed:binomial rank 0 parent 0 me 1 num_procs 2
>> [osv:00250] [[50738,0],1] routed:binomial find children of rank 0
>> [osv:00250] [[50738,0],1] routed:binomial find children checking peer 1
>> [osv:00250] [[50738,0],1] routed:binomial find children computing tree
>> [osv:00250] [[50738,0],1] routed:binomial rank 1 parent 0 me 1 num_procs 2
>> [osv:00250] [[50738,0],1] routed:binomial find children returning found 
>> value 0
>> [osv:00250] [[50738,0],1]: parent 0 num_children 0
>> [osv:00250] [[50738,0],1] plm:osvrest: remote spawn - have no children!
>> 
>> In the plm mca module's remote_spawn() function (my plm is based on 
>> orte/mca/plm/rsh/), the &coll.targets list has zero length. My question is: 
>> which module(s) are responsible for filling in coll.targets? Then I can turn 
>> on the correct mca xyz_base_verbose level and hopefully narrow down my 
>> problem. I have a hard time guessing/finding out what the various xyz strings 
>> mean :)
>> 
>> Thank you, Justin
