Ralph,

In the case of intercomm_create, the children free all their communicators, then call MPI_Comm_disconnect() and MPI_Finalize(), and exit. The parent only calls MPI_Comm_disconnect(), without freeing all its communicators, so its MPI_Finalize() tries to disconnect from, and communicate with, processes that have already exited.
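
To make the mismatch above concrete, here is a toy Python model (plain Python, not MPI and not Open MPI code). It only tracks which communicator handles each process still holds when it reaches finalize. The Proc class, the release() helper and the communicator names spawn_intercomm and derived_comm are all made up for illustration, and treating "still holds a handle" as "still connected" is a simplifying assumption.

```python
class Proc:
    """Toy stand-in for one MPI process (hypothetical, not real MPI)."""

    def __init__(self, name):
        self.name = name
        self.handles = {}  # communicator name -> set of peer process names

    def add_comm(self, comm, members):
        # Record a communicator handle and the peers it links us to.
        self.handles[comm] = set(members) - {self.name}

    def release(self, comm):
        # Stands in for MPI_Comm_free / MPI_Comm_disconnect on ONE communicator.
        self.handles.pop(comm)

    def peers_at_finalize(self):
        # Peers this process still believes it is connected to.
        peers = set()
        for members in self.handles.values():
            peers |= members
        return peers


def run_scenario():
    parent, child = Proc("parent"), Proc("child")
    # Two communicators span both processes (names are assumptions).
    for comm in ("spawn_intercomm", "derived_comm"):
        parent.add_comm(comm, {"parent", "child"})
        child.add_comm(comm, {"parent", "child"})

    # The child releases everything before it finalizes and exits.
    child.release("derived_comm")
    child.release("spawn_intercomm")

    # The parent only disconnects the spawn intercommunicator.
    parent.release("spawn_intercomm")

    return parent.peers_at_finalize(), child.peers_at_finalize()


parent_view, child_view = run_scenario()
print("parent still sees:", parent_view)
print("child still sees:", child_view)
```

In this sketch the parent still lists the child as a peer while the child lists nobody, which is the situation where the parent's finalize ends up talking to a process that may already be gone.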

My understanding is that there are two ways of seeing things:
a) the "R-way": the problem is that the parent should not try to communicate with already exited processes
b) the "J-way": the problem is that the children should have waited in either MPI_Comm_free() or MPI_Finalize()

I did not investigate the loop_spawn test yet, and will do so today.

As far as I am concerned, I have no opinion on which of a) or b) is the correct/most appropriate approach.

Cheers,

Gilles

On Wed, May 28, 2014 at 9:46 AM, Ralph Castain <r...@open-mpi.org> wrote:

> Since you ignored my response, I'll reiterate and clarify it here. The
> problem in the case of loop_spawn is that the parent process remains
> "connected" to children after the child has finalized and died. Hence, when
> the parent attempts to finalize, it tries to "disconnect" itself from
> processes that no longer exist - and that is what generates the error
> message.
>
> So the issue in that case appears to be that "finalize" is not marking the
> child process as "disconnected", thus leaving the parent thinking that it
> needs to disconnect when it finally ends.
>
>
> On May 27, 2014, at 5:33 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com>
> wrote:
>
> > Note that MPI says that COMM_DISCONNECT simply disconnects that
> > individual communicator. It does *not* guarantee that the processes
> > involved will be fully disconnected.
> >
> > So I think that the freeing of communicators is good app behavior, but
> > it is not required by the MPI spec.
> >
> > If OMPI is requiring this for correct termination, then something is
> > wrong. MPI_FINALIZE is supposed to be collective across all connected MPI
> > procs -- and if the parent and spawned procs in this test are still
> > connected (because they have not disconnected all communicators between
> > them), the FINALIZE is supposed to be collective across all of them.
> >
> > This means that FINALIZE is allowed to block if it needs to, such that
> > OMPI sending control messages to procs that are still "connected" (in the
> > MPI sense) should never cause a race condition.
> >
> > As such, this sounds like an OMPI bug.
> >
> >
> > On May 27, 2014, at 2:27 AM, Gilles Gouaillardet
> > <gilles.gouaillar...@gmail.com> wrote:
> >
> >> Folks,
> >>
> >> currently, the dynamic/intercomm_create test from the ibm test suite
> >> outputs the following messages:
> >>
> >> dpm_base_disconnect_init: error -12 in isend to process 1
> >>
> >> the root cause is that task 0 tries to send messages to already exited tasks.
> >>
> >> one way of seeing things is that this is an application issue:
> >> task 0 should have MPI_Comm_free'd all its communicators before calling
> >> MPI_Comm_disconnect.
> >> This can be achieved via the attached patch
> >>
> >> another way of seeing things is that this is a bug in OpenMPI.
> >> In this case, what would be the right approach?
> >> - automatically free communicators (if needed) when MPI_Comm_disconnect
> >>   is invoked?
> >> - simply remove communicators (if needed) from ompi_mpi_communicators
> >>   when MPI_Comm_disconnect is invoked?
> >>   /* this causes a memory leak, but the application can be seen as
> >>   responsible for it */
> >> - other?
> >>
> >> Thanks in advance for your feedback,
> >>
> >> Gilles
> >> <intercomm_create.patch>
> >> _______________________________________________
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post:
> >> http://www.open-mpi.org/community/lists/devel/2014/05/14847.php
> >
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/05/14875.php
>
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/05/14876.php