I filed a PR against v1.10.7, though v1.10.7 may not be released. https://github.com/open-mpi/ompi/pull/3276
I'm not aware of the v2.1.x issue, sorry. Another developer may be able to answer.

Takahiro Kawashima,
MPI development team,
Fujitsu

> Bullseye!
>
> Thank you, Takahiro, for your quick answer. Brief tests with 1.10.6 show
> that this did indeed solve the problem! I will look at this in more
> detail, but it looks really good now.
>
> About MPI_Comm_accept in 2.1.x: I've seen a thread here by Adam
> Sylvester which essentially says that it is not working now, nor in
> 2.0.x. I've checked master, and it does not work there either. Is
> there any timeline for this?
>
> Thanks a lot!
>
> Marcin
>
> On 04/04/2017 11:03 AM, Kawashima, Takahiro wrote:
> > Hi,
> >
> > I encountered a similar problem using MPI_COMM_SPAWN last month.
> > Your problem may be the same.
> >
> > The problem was fixed by commit 0951a34 in Open MPI master and
> > backported to v2.1.x and v2.0.x, but not to v1.8.x and v1.10.x.
> >
> > https://github.com/open-mpi/ompi/commit/0951a34
> >
> > Please try the attached patch. It was backported for the v1.10 branch.
> >
> > The problem is in the memory registration limit calculation in the
> > openib BTL: processes loop forever in OMPI_FREE_LIST_WAIT_MT when
> > connecting to other ORTE jobs because openib_reg_mr returns
> > OMPI_ERR_OUT_OF_RESOURCE. It probably affects MPI_COMM_SPAWN,
> > MPI_COMM_SPAWN_MULTIPLE, MPI_COMM_ACCEPT, and MPI_COMM_CONNECT.
> >
> > Takahiro Kawashima,
> > MPI development team,
> > Fujitsu
> >
> >> Dear Developers,
> >>
> >> This is an old problem, which I described in an email to the users
> >> list in 2015, but I continue to struggle with it. In short, the
> >> MPI_Comm_accept / MPI_Comm_disconnect combination causes any
> >> communication over the openib BTL (e.g., even a barrier) to hang
> >> after a few clients connect to and disconnect from the server. I've
> >> noticed that the number of successful connects depends on the number
> >> of server ranks; e.g., if my server has 32 ranks, the communication
> >> already hangs for the second connecting client.
> >>
> >> I have now checked that the problem also exists in 1.10.6. As far as
> >> I could tell, MPI_Comm_accept is not working in 2.0 and 2.1 at all,
> >> so I could not test those versions. My previous investigations have
> >> shown that the problem was introduced in 1.8.4.
> >>
> >> I wonder, will this be addressed in Open MPI, or is this part of the
> >> MPI functionality considered less important than the core? Should I
> >> file a bug report?
> >>
> >> Thanks!
> >>
> >> Marcin Krotkiewski
> >>
> >> On 09/16/2015 04:06 PM, marcin.krotkiewski wrote:
> >>> I have run into a freeze / potential bug when using MPI_Comm_accept
> >>> in a simple client / server implementation. I have attached the two
> >>> simplest programs I could produce:
> >>>
> >>> 1. mpi-receiver.c opens a port using MPI_Open_port and saves the
> >>> port name to a file.
> >>>
> >>> 2. mpi-receiver enters an infinite loop and waits for connections
> >>> using MPI_Comm_accept.
> >>>
> >>> 3. mpi-sender.c connects to that port using MPI_Comm_connect, sends
> >>> one MPI_UNSIGNED_LONG, calls a barrier, and disconnects using
> >>> MPI_Comm_disconnect.
> >>>
> >>> 4. mpi-receiver reads the MPI_UNSIGNED_LONG, prints it, calls a
> >>> barrier, disconnects using MPI_Comm_disconnect, and returns to
> >>> step 2 (infinite loop).
> >>>
> >>> All works fine, but only exactly 5 times. After that the receiver
> >>> hangs in MPI_Recv, after returning from MPI_Comm_accept. That is
> >>> 100% repeatable. I have tried with Intel MPI - no such problem.
> >>>
> >>> I execute the programs using Open MPI 1.10 as follows:
> >>>
> >>> mpirun -np 1 --mca mpi_leave_pinned 0 ./mpi-receiver
> >>>
> >>> Do you have any clues what could be the reason? Am I doing something
> >>> wrong, or is it some problem with the internal state of Open MPI?
> >>>
> >>> Thanks a lot!
> >>>
> >>> Marcin
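
For reference, here is a minimal sketch of the receiver/sender pair described in the quoted 2015 message. The actual mpi-receiver.c and mpi-sender.c were attached to that message and may differ; the port file name ("port.txt"), the value sent, and the absence of error handling are assumptions of this sketch.

/* mpi-receiver.c (sketch): open a port, publish it to a file, then
 * accept clients in an infinite loop. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char port_name[MPI_MAX_PORT_NAME];

    MPI_Init(&argc, &argv);

    /* step 1: open a port and save the port name to a file */
    MPI_Open_port(MPI_INFO_NULL, port_name);
    FILE *f = fopen("port.txt", "w");
    fprintf(f, "%s\n", port_name);
    fclose(f);

    /* step 2: wait for connections in an infinite loop */
    for (;;) {
        MPI_Comm client;
        unsigned long value;

        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);

        /* step 4: receive one MPI_UNSIGNED_LONG, print it, call a
         * barrier, disconnect, and go back to step 2 */
        MPI_Recv(&value, 1, MPI_UNSIGNED_LONG, 0, 0, client,
                 MPI_STATUS_IGNORE);
        printf("received %lu\n", value);
        MPI_Barrier(client);
        MPI_Comm_disconnect(&client);
    }

    /* never reached in this sketch */
    return 0;
}

/* mpi-sender.c (sketch): read the port name from the file, connect,
 * send one value, synchronize, and disconnect (step 3). */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char port_name[MPI_MAX_PORT_NAME];
    unsigned long value = 42;   /* arbitrary payload */
    MPI_Comm server;

    MPI_Init(&argc, &argv);

    /* read the port name published by the receiver */
    FILE *f = fopen("port.txt", "r");
    fgets(port_name, sizeof(port_name), f);
    fclose(f);
    port_name[strcspn(port_name, "\n")] = '\0';

    /* connect, send one MPI_UNSIGNED_LONG, barrier, disconnect */
    MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);
    MPI_Send(&value, 1, MPI_UNSIGNED_LONG, 0, 0, server);
    MPI_Barrier(server);
    MPI_Comm_disconnect(&server);

    MPI_Finalize();
    return 0;
}

The receiver would be launched as shown in the quoted message (a single rank); the sender would presumably be started analogously from the same working directory so it can read port.txt.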