Jeff,

On Wed, May 28, 2014 at 8:31 PM, Jeff Squyres (jsquyres)
> To be totally clear: MPI says it is erroneous for only some (not all)
processes in a communicator to call MPI_COMM_FREE.  So if that's the real
problem, then the discussion about why the parent(s) is(are) trying to
contact the children is moot -- the test is erroneous, and erroneous
application behavior is undefined.

This is definetly what happens : only some tasks call MPI_Comm_free()
i will commit my changes and the initially reported issue is solved :-)



about the "bonus points" :

v1.8 does not have this issue

i digged it and bottom line, the parent (who did not call MPI_Comm_free
unlike the children) calls ompi_dpm_base_dyn_finalize, which tries to isend
the already exited tasks.


bottom line, in pml_ob1_sendreq.h line 450

with v1,8
mca_bml_base_btl_array_get_size(&endpoint->btl_eager) = 0
nothing is sent but isend is reported successful

with trunk
mca_bml_base_btl_array_get_size(&endpoint->btl_eager) = 1
and then try to send the message => BOUM

i found various things that seem counter intuitive to me and will summarize
all this tomorrow.

Cheers,

Gilles

Reply via email to