On 25 September 2014 20:50, Nathan Hjelm <hje...@lanl.gov> wrote: > On Tue, Aug 26, 2014 at 07:03:24PM +0300, Lisandro Dalcin wrote: >> I finally managed to track down some issues in mpi4py's test suite >> using Open MPI 1.8+. The code below should be enough to reproduce the >> problem. Run it under valgrind to make sense of my following >> diagnostics. >> >> In this code I'm creating a 2D, periodic Cartesian topology out of >> COMM_SELF. In this case, the process in COMM_SELF has 4 logical in/out >> links to itself. So we have size=1 but indegree=outdegree=4. However, >> in ompi/mca/coll/basic/coll_basic_module.c, "size * 2" request are >> being allocated to manage communication: >> >> if (OMPI_COMM_IS_INTER(comm)) { >> size = ompi_comm_remote_size(comm); >> } else { >> size = ompi_comm_size(comm); >> } >> basic_module->mccb_num_reqs = size * 2; >> basic_module->mccb_reqs = (ompi_request_t**) >> malloc(sizeof(ompi_request_t *) * basic_module->mccb_num_reqs); >> >> I guess you have to also special-case for topologies and allocate >> indegree+outdegree requests (not sure about this number, just >> guessing). >> > > I wish this was possible but the topology information is not available > at that point. We may be able to change that but I don't see the work > completing anytime soon. I committed an alternative fix as r32796 and > CMR'd it to 1.8.3. I can confirm that the attached reproducer no longer > produces a SEGV. Let me know if you run into any more issues. >
Did your fix get in for 1.8.3? I'm still getting the segfault. -- Lisandro Dalcin ============ Research Scientist Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) Numerical Porous Media Center (NumPor) King Abdullah University of Science and Technology (KAUST) http://numpor.kaust.edu.sa/ 4700 King Abdullah University of Science and Technology al-Khawarizmi Bldg (Bldg 1), Office # 4332 Thuwal 23955-6900, Kingdom of Saudi Arabia http://www.kaust.edu.sa Office Phone: +966 12 808-0459