On May 28, 2014, at 6:41 AM, Gilles Gouaillardet 
<gilles.gouaillar...@gmail.com> wrote:

> Ralph,
> 
> 
> On Wed, May 28, 2014 at 9:33 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> This is definitely what happens: only some tasks call MPI_Comm_free()
> 
> Really? I don't see how that can happen in loop_spawn - every process is 
> clearly calling comm_free. Or are you referring to the intercomm_create test?
> 
> yes, i am referring to the intercomm_create test

kewl - thanks

> 
> about loop_spawn, i could not get any error on my single host single socket 
> VM.
> (i tried --mca btl tcp,sm,self and --mca btl tcp,self)
> 
> MPI_Finalize will end up calling ompi_dpm_dyn_finalize which causes the error 
> message on the parent of intercomm_create.
> a necessary condition is ompi_comm_num_dyncomm > 1
> /* which by the way sounds odd to me, should it be 0 ? */

That does sound odd

> which imho cannot happen if all communicators have been freed
> 
> can you detail your full mpirun command line, the number of servers you are 
> using, the btl involved and the ompi release that can be used to reproduce 
> the issue ?

Running on only one server, using the current head of the svn repo. My cluster 
only has Ethernet, and I let it freely choose the BTLs (so I imagine the 
candidates are sm,self,tcp,vader). The cmd line is really trivial:

mpirun -n 1 ./loop_spawn

I modified loop_spawn to only run 100 iterations as I am not patient enough to 
wait for 1000, and the number of iters isn't a factor so long as it is greater 
than 1. When the parent calls finalize, the following message is emitted once 
for every iteration that was run:

dpm_base_disconnect_init: error -12 in isend to process 0

So in other words, the parent is attempting to isend to every child that was 
spawned during the test - it thinks that every comm_spawn'd process remains 
connected to it.

I'm wondering if the issue is that the parent and child are calling comm_free, 
but neither side called comm_disconnect. So unless comm_free is calling 
disconnect under-the-covers, it might explain why the parent thinks all the 
children are still present.
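
To make that hypothesis concrete, here is a toy bookkeeping model in C. It is 
not Open MPI code: parent_state, model_spawn, model_comm_free, 
model_comm_disconnect and model_finalize are hypothetical names, and the idea 
that comm_free leaves the parent's connection record in place is exactly the 
unconfirmed guess raised above.

```c
#include <assert.h>

/* Toy model only; this does NOT reflect Open MPI internals. */
typedef struct {
    int connected_children; /* children the parent still believes are connected */
} parent_state;

/* Each spawn adds one child that the parent considers connected. */
void model_spawn(parent_state *p) {
    p->connected_children++;
}

/* ASSUMPTION under test: freeing the communicator handle does not
 * remove the connection record. */
void model_comm_free(parent_state *p) {
    (void)p; /* nothing changes in the bookkeeping */
}

/* An explicit disconnect removes the record for one child. */
void model_comm_disconnect(parent_state *p) {
    assert(p->connected_children > 0);
    p->connected_children--;
}

/* At finalize the parent would try to contact every child it still
 * tracks; return how many such attempts would be made. */
int model_finalize(const parent_state *p) {
    return p->connected_children;
}
```

Under this assumption, 100 spawn/free iterations leave 100 records for finalize 
to visit, which would match one error message per iteration. Pairing each spawn 
with a disconnect instead leaves none.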


> 
> i will try to reproduce this myself
> 
> Cheers,
> 
> Gilles
> 
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/05/14890.php
