On Tue, Aug 26, 2014 at 07:03:24PM +0300, Lisandro Dalcin wrote:
> I finally managed to track down some issues in mpi4py's test suite
> using Open MPI 1.8+. The code below should be enough to reproduce the
> problem. Run it under valgrind to make sense of my following
> diagnostics.
>
> In this code I'm creating a 2D, periodic Cartesian topology out of
> COMM_SELF. In this case, the process in COMM_SELF has 4 logical in/out
> links to itself. So we have size=1 but indegree=outdegree=4. However,
> in ompi/mca/coll/basic/coll_basic_module.c, "size * 2" requests are
> being allocated to manage communication:
>
>     if (OMPI_COMM_IS_INTER(comm)) {
>         size = ompi_comm_remote_size(comm);
>     } else {
>         size = ompi_comm_size(comm);
>     }
>     basic_module->mccb_num_reqs = size * 2;
>     basic_module->mccb_reqs = (ompi_request_t**)
>         malloc(sizeof(ompi_request_t *) * basic_module->mccb_num_reqs);
>
> I guess you have to also special-case for topologies and allocate
> indegree+outdegree requests (not sure about this number, just
> guessing).
>

I wish this was possible, but the topology information is not available
at that point. We may be able to change that, but I don't see the work
completing anytime soon. I committed an alternative fix as r32796 and
CMR'd it to 1.8.3. I can confirm that the attached reproducer no longer
produces a SEGV. Let me know if you run into any more issues.

-Nathan
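
For anyone following the thread, the size mismatch described in the quoted
report can be sketched in a few lines of plain C. This is only an
illustration of the arithmetic, not the code from r32796. The function
names are hypothetical, and the "one request per incoming and per outgoing
link" count is the same guess made in the report, not a confirmed
requirement of the basic module.

```c
#include <assert.h>
#include <stddef.h>

/* Requests reserved by the quoted allocation: two per rank
 * (mirrors "mccb_num_reqs = size * 2"). Hypothetical helper name. */
size_t reqs_allocated(size_t comm_size)
{
    return comm_size * 2;
}

/* Requests a neighbor collective might need on an ndims-dimensional
 * Cartesian topology, ASSUMING one request per incoming link and one
 * per outgoing link. Each dimension contributes a "-1" and a "+1"
 * neighbor, so indegree = outdegree = 2 * ndims. */
size_t reqs_needed_cart(size_t ndims)
{
    size_t indegree = 2 * ndims;
    size_t outdegree = 2 * ndims;
    return indegree + outdegree;
}

/* The case from the report: a 2D periodic topology built on COMM_SELF. */
void check_comm_self_example(void)
{
    assert(reqs_allocated(1) == 2);    /* size = 1, so 2 requests reserved */
    assert(reqs_needed_cart(2) == 8);  /* 4 in + 4 out links to itself */
}
```

Under that assumption, the 2D COMM_SELF case would want 8 requests while
only 2 are reserved, which is consistent with the out-of-bounds accesses
valgrind reports.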
