Hi,

I encountered a similar problem using MPI_COMM_SPAWN last month.
Your problem may be the same.

The problem was fixed by commit 0951a34 on Open MPI master and
backported to the v2.1.x and v2.0.x branches, but it was not
backported to v1.8.x or v1.10.x.

  https://github.com/open-mpi/ompi/commit/0951a34

Please try the attached patch. It is a backport of that commit to the v1.10 branch.

The problem is in the memory registration limit calculation in the
openib BTL: when connecting to other ORTE jobs, openib_reg_mr returns
OMPI_ERR_OUT_OF_RESOURCE and processes loop forever in
OMPI_FREE_LIST_WAIT_MT. It probably affects MPI_COMM_SPAWN,
MPI_COMM_SPAWN_MULTIPLE, MPI_COMM_ACCEPT, and MPI_COMM_CONNECT.
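
To make the failure mode concrete, here is a small standalone sketch
(not the BTL code itself; the 16 GiB limit and the 32 local server
ranks are invented numbers). mca_btl_openib_add_procs() runs again for
every job that connects, and before the fix it divided the
already-divided mem_reg_max once more on each call, so the per-process
limit quickly collapses and openib_reg_mr starts failing. The fix keeps
the original limit in a new mem_reg_max_total field and recomputes
mem_reg_max from it:

  /* Standalone sketch of the registration-limit arithmetic (not Open MPI
   * code).  The 16 GiB device limit and the 32 local server ranks are
   * invented numbers; clients are assumed to run on other nodes, so the
   * local process count stays at 32. */
  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
      uint64_t total = 16ULL << 30;      /* device registration limit    */
      uint64_t buggy = total;            /* pre-fix: divided in place    */
      uint64_t fixed = total;            /* post-fix: derived from total */
      int local_procs = 0;

      /* add_procs() runs at MPI_Init and again for every job that
       * connects via MPI_Comm_accept/MPI_Comm_connect. */
      for (int call = 1; call <= 4; call++) {
          local_procs += (call == 1) ? 32 : 0;
          buggy /= local_procs;          /* 512 MiB, 16 MiB, 512 KiB, 16 KiB */
          fixed  = total / local_procs;  /* always 512 MiB */
          printf("call %d: buggy = %llu, fixed = %llu\n", call,
                 (unsigned long long)buggy, (unsigned long long)fixed);
      }
      return 0;
  }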

Takahiro Kawashima,
MPI development team,
Fujitsu

> Dear Developers,
> 
> This is an old problem, which I described in an email to the users list
> in 2015, but I continue to struggle with it. In short, the MPI_Comm_accept /
> MPI_Comm_disconnect combination causes any communication over the openib
> BTL (even a barrier) to hang after a few clients connect to and disconnect
> from the server. I've noticed that the number of successful connects
> depends on the number of server ranks; e.g., if my server has 32 ranks,
> the communication hangs already for the second connecting client.
> 
> I have now checked that the problem exists also in 1.10.6. As far as I 
> could tell, MPI_Comm_accept is not working in 2.0 and 2.1 at all, so I 
> could not test those versions. My previous investigations have shown 
> that the problem was introduced in 1.8.4.
> 
> I wonder, will this be addressed in OpenMPI, or is this part of the MPI 
> functionality considered less important than the core? Should I file a 
> bug report?
> 
> Thanks!
> 
> Marcin Krotkiewski
> 
> 
> On 09/16/2015 04:06 PM, marcin.krotkiewski wrote:
> > I have run into a freeze / potential bug when using MPI_Comm_accept in
> > a simple client / server implementation. I have attached the two simplest
> > programs I could produce:
> >
> >  1. mpi-receiver.c opens a port using MPI_Open_port, saves the port 
> > name to a file
> >
> >  2. mpi-receiver enters an infinite loop and waits for connections using
> > MPI_Comm_accept
> >
> >  3. mpi-sender.c connects to that port using MPI_Comm_connect, sends 
> > one MPI_UNSIGNED_LONG, calls barrier and disconnects using 
> > MPI_Comm_disconnect
> >
> >  4. mpi-receiver reads the MPI_UNSIGNED_LONG, prints it, calls the barrier,
> > disconnects using MPI_Comm_disconnect, and goes back to step 2, the
> > infinite loop (a minimal sketch of both programs follows below)
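> >
> > A minimal sketch of the two programs, just to show the call sequence
> > (illustrative only, not the actual attached code; error handling and
> > the start-up synchronization are omitted):
> >
> > /* mpi-receiver.c (sketch) */
> > #include <mpi.h>
> > #include <stdio.h>
> >
> > int main(int argc, char **argv)
> > {
> >     char port[MPI_MAX_PORT_NAME];
> >     MPI_Init(&argc, &argv);
> >     MPI_Open_port(MPI_INFO_NULL, port);
> >
> >     FILE *f = fopen("port.txt", "w");   /* 1. save the port name */
> >     fprintf(f, "%s\n", port);
> >     fclose(f);
> >
> >     for (;;) {                          /* 2. infinite accept loop */
> >         MPI_Comm client;
> >         unsigned long val;
> >         MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);
> >         MPI_Recv(&val, 1, MPI_UNSIGNED_LONG, 0, 0, client,
> >                  MPI_STATUS_IGNORE);
> >         printf("received %lu\n", val);  /* 4. print, barrier, disconnect */
> >         MPI_Barrier(client);
> >         MPI_Comm_disconnect(&client);
> >     }
> > }
> >
> > /* mpi-sender.c (sketch) */
> > #include <mpi.h>
> > #include <stdio.h>
> > #include <string.h>
> >
> > int main(int argc, char **argv)
> > {
> >     char port[MPI_MAX_PORT_NAME];
> >     MPI_Comm server;
> >     unsigned long val = 42;
> >
> >     MPI_Init(&argc, &argv);
> >     FILE *f = fopen("port.txt", "r");   /* read the published port name */
> >     fgets(port, MPI_MAX_PORT_NAME, f);
> >     port[strcspn(port, "\n")] = '\0';
> >     fclose(f);
> >
> >     /* 3. connect, send one value, barrier, disconnect */
> >     MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);
> >     MPI_Send(&val, 1, MPI_UNSIGNED_LONG, 0, 0, server);
> >     MPI_Barrier(server);
> >     MPI_Comm_disconnect(&server);
> >     MPI_Finalize();
> >     return 0;
> > }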
> >
> > Everything works fine, but only exactly 5 times. After that the receiver
> > hangs in MPI_Recv after returning from MPI_Comm_accept. That is 100%
> > reproducible. I have tried with Intel MPI - no such problem there.
> >
> > I execute the programs using OpenMPI 1.10 as follows
> >
> > mpirun -np 1 --mca mpi_leave_pinned 0 ./mpi-receiver
> >
> >
> > Do you have any clues what could be the reason? Am I doing something
> > wrong, or is it some problem with the internal state of OpenMPI?
> >
> > Thanks a lot!
> >
> > Marcin
diff --git a/ompi/mca/btl/openib/btl_openib.c b/ompi/mca/btl/openib/btl_openib.c
index 7030aa1..38790af 100644
--- a/ompi/mca/btl/openib/btl_openib.c
+++ b/ompi/mca/btl/openib/btl_openib.c
@@ -1074,7 +1074,7 @@ int mca_btl_openib_add_procs(
     }
 
     openib_btl->local_procs += local_procs;
-    openib_btl->device->mem_reg_max /= openib_btl->local_procs;
+    openib_btl->device->mem_reg_max = openib_btl->device->mem_reg_max_total / openib_btl->local_procs;
 
     return mca_btl_openib_size_queues(openib_btl, nprocs);
 }
diff --git a/ompi/mca/btl/openib/btl_openib.h b/ompi/mca/btl/openib/btl_openib.h
index a3b1f87..38d9d6f 100644
--- a/ompi/mca/btl/openib/btl_openib.h
+++ b/ompi/mca/btl/openib/btl_openib.h
@@ -417,7 +417,7 @@ typedef struct mca_btl_openib_device_t {
     /* Maximum value supported by this device for max_inline_data */
     uint32_t max_inline_data;
     /* Registration limit and current count */
-    uint64_t mem_reg_max, mem_reg_active;
+    uint64_t mem_reg_max, mem_reg_max_total, mem_reg_active;
     /* Device is ready for use */
     bool ready_for_use;
 } mca_btl_openib_device_t;
diff --git a/ompi/mca/btl/openib/btl_openib_component.c b/ompi/mca/btl/openib/btl_openib_component.c
index 40831f2..06ff9d4 100644
--- a/ompi/mca/btl/openib/btl_openib_component.c
+++ b/ompi/mca/btl/openib/btl_openib_component.c
@@ -1549,7 +1549,8 @@ static int init_one_device(opal_list_t *btl_list, struct ibv_device* ib_dev)
     }
 
     device->mem_reg_active = 0;
-    device->mem_reg_max    = calculate_max_reg(ibv_get_device_name(ib_dev));
+    device->mem_reg_max_total = calculate_max_reg(ibv_get_device_name(ib_dev));
+    device->mem_reg_max = device->mem_reg_max_total;
 
     device->ib_dev = ib_dev;
     device->ib_dev_context = dev_context;
