On May 27, 2014, at 9:11 PM, Gilles Gouaillardet 
<gilles.gouaillar...@gmail.com> wrote:

> in the case of intercomm_create, the children free all the communicators, 
> then call MPI_Comm_disconnect() and MPI_Finalize(), and exit.
> the parent only calls MPI_Comm_disconnect(), without freeing all the 
> communicators. its MPI_Finalize() then tries to disconnect from and 
> communicate with processes that have already exited.
> 
> my understanding is that there are two ways of seeing things:
> a) the "R-way": the problem is that the parent should not try to communicate 
> with already exited processes
> b) the "J-way": the problem is that the children should have waited in either 
> MPI_Comm_free() or MPI_Finalize()

I didn't ignore Ralph's email; I was pointing out what the MPI semantics are 
supposed to be.

I had only a short time this morning to look at the intercomm_create test 
program, and it looks like Gilles might be correct -- the children are freeing 
all relevant communicators *but the parent is not*.  If that is indeed the 
case, then a) OMPI's implementation might be fine because the test program 
is erroneous (i.e., the children *think* that they are disconnected and 
therefore allow themselves to exit, but the parents *think* that they are still 
connected and therefore try to contact the children during the parents' 
MPI_FINALIZE), and b) his original patch to the test program could well be 
correct.

I won't have time to investigate this today; if someone else could look at the 
test code and confirm whether this is correct or not, that would be appreciated.

> as far as i am concerned, i have no opinion on which of a) or b) is the 
> correct/most appropriate approach.

To be totally clear: MPI_COMM_FREE is collective, so MPI says it is erroneous 
for only some (not all) processes in a communicator to call it.  If that's the 
real problem, then the discussion about why the parents are trying to contact 
the children is moot -- the test is erroneous, and the behavior of an erroneous 
application is undefined.

All that being said, if we want to make this error case a bit friendlier to the 
user, that would be great (i.e., a show_help something like "It looks like an 
MPI process is trying to contact another MPI process that has already 
exited/called MPI_FINALIZE.  This is almost certainly an error in the 
application...").  But that's definitely extra bonus points, and not required.
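
The mismatch described above can be sketched as a toy model in plain C.  This 
is not MPI or Open MPI code: link_model, parent_finalize() and describe() are 
hypothetical names invented for illustration, and the model assumes a single 
parent/child connection.  It only captures the reasoning: a parent that has 
released its side has nothing left to contact, while a parent that still 
thinks it is connected will try to reach a child that may already be gone.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in MPI or Open MPI. */
typedef struct {
    bool parent_released;  /* parent freed/disconnected its side of the link */
    bool child_exited;     /* child believed it was disconnected and exited */
} link_model;

typedef enum {
    FINALIZE_CLEAN,      /* parent holds no connection; nothing to contact */
    FINALIZE_HANDSHAKE,  /* parent still connected, child still alive */
    FINALIZE_PEER_GONE   /* parent still connected, child already exited */
} finalize_outcome;

/* What the parent's finalize step runs into in this model. */
finalize_outcome parent_finalize(const link_model *m)
{
    if (m->parent_released) {
        return FINALIZE_CLEAN;
    }
    return m->child_exited ? FINALIZE_PEER_GONE : FINALIZE_HANDSHAKE;
}

/* Human-readable text for each outcome; the last one paraphrases the
 * friendlier message suggested in the paragraph above. */
const char *describe(finalize_outcome o)
{
    switch (o) {
    case FINALIZE_CLEAN:
        return "no connection left; finalize has nobody to contact";
    case FINALIZE_HANDSHAKE:
        return "still connected; teardown can proceed with the live peer";
    case FINALIZE_PEER_GONE:
        return "trying to contact a process that has already exited/called "
               "MPI_FINALIZE; this is almost certainly an error in the "
               "application";
    }
    return "unknown outcome";
}
```

In this model, a parent that never released its side and whose child has 
already exited ends up in FINALIZE_PEER_GONE, which is the case where a 
friendlier message would help.  Once the parent releases its side as well, 
the outcome is FINALIZE_CLEAN regardless of whether the child has exited.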

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
