Nathan? On Oct 1, 2014, at 10:27 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
> I see no change in the topo interface in any of the patches attached to this > thread. Is there any other patch related to this discussion? > > George. > > > >> On Oct 1, 2014, at 14:52, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote: >> >>> On Oct 1, 2014, at 6:48 AM, Gilles Gouaillardet >>> <gilles.gouaillar...@iferc.org> wrote: >>> >>> oops, i missed graph and dist graph :-( >>> >>> the attached patch fixes this >>> /* it was straightforward for graph and a bit more challenging for dist >>> graph */ >>> >>> i am fine for Nathan's patch for v1.8 >>> /* v1.8 and trunk have already a different way to handle a memory leak */ >>> /* the attached patch is vs v1.8 but it is for review only, i will port >>> it to the trunk */ >>> >>> imho, there is no ABI break with this patch. >>> out of curiosity, what is the ABI break that is/will be introduced with >>> the "real" fix ? >> >> I haven't looked at the code at all, but Nathan said it changed the topo >> framework interface. That's the ABI I was referring to -- not the MPI ABI. >> Sorry for the confusions. >> >>> FYI, i also enhanced the collective/neighbor_allgather_self from the ibm >>> test suite in order to test >>> graph and dist_graph >>> >>> Cheers, >>> >>> Gilles >>> >>> >>>> On 2014/10/01 0:58, Jeff Squyres (jsquyres) wrote: >>>> On the call today, we decided that Nathan's v1.8 patch is sufficient for >>>> the v1.8 series, and the "real" fix will be applied to the trunk and >>>> carried forward to the v1.9 series (since it introduces an ABI break for >>>> the topo framework, which we try not to do in the middle of a release >>>> series). >>>> >>>> On Sep 30, 2014, at 10:40 AM, Nathan Hjelm <hje...@lanl.gov> wrote: >>>> >>>>> Not quite right. There still is no topology information at collective >>>>> selection time for either graph or dist graph. >>>>> >>>>> -Nathan >>>>> >>>>>> On Tue, Sep 30, 2014 at 02:03:27PM +0900, Gilles Gouaillardet wrote: >>>>>> Nathan, >>>>>> >>>>>> here is a revision of the previously attached patch, and that supports >>>>>> graph and dist graph. >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Gilles >>>>>> >>>>>> On 2014/09/30 0:05, Nathan Hjelm wrote: >>>>>> >>>>>> An equivalent change would need to be made for graph and dist graph as >>>>>> well. That will take a little more work. Also, I was avoiding changing >>>>>> anything in topo for 1.8. >>>>>> >>>>>> -Nathan >>>>>> >>>>>> On Mon, Sep 29, 2014 at 08:02:41PM +0900, Gilles Gouaillardet wrote: >>>>>> >>>>>> Nathan, >>>>>> >>>>>> why not just make the topology information available at that point as you >>>>>> described it ? >>>>>> >>>>>> the attached patch does this, could you please review it ? >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Gilles >>>>>> >>>>>> On 2014/09/26 2:50, Nathan Hjelm wrote: >>>>>> >>>>>> On Tue, Aug 26, 2014 at 07:03:24PM +0300, Lisandro Dalcin wrote: >>>>>> >>>>>> I finally managed to track down some issues in mpi4py's test suite >>>>>> using Open MPI 1.8+. The code below should be enough to reproduce the >>>>>> problem. Run it under valgrind to make sense of my following >>>>>> diagnostics. >>>>>> >>>>>> In this code I'm creating a 2D, periodic Cartesian topology out of >>>>>> COMM_SELF. In this case, the process in COMM_SELF has 4 logical in/out >>>>>> links to itself. So we have size=1 but indegree=outdegree=4. However, >>>>>> in ompi/mca/coll/basic/coll_basic_module.c, "size * 2" request are >>>>>> being allocated to manage communication: >>>>>> >>>>>> if (OMPI_COMM_IS_INTER(comm)) { >>>>>> size = ompi_comm_remote_size(comm); >>>>>> } else { >>>>>> size = ompi_comm_size(comm); >>>>>> } >>>>>> basic_module->mccb_num_reqs = size * 2; >>>>>> basic_module->mccb_reqs = (ompi_request_t**) >>>>>> malloc(sizeof(ompi_request_t *) * basic_module->mccb_num_reqs); >>>>>> >>>>>> I guess you have to also special-case for topologies and allocate >>>>>> indegree+outdegree requests (not sure about this number, just >>>>>> guessing). >>>>>> >>>>>> >>>>>> I wish this was possible but the topology information is not available >>>>>> at that point. We may be able to change that but I don't see the work >>>>>> completing anytime soon. I committed an alternative fix as r32796 and >>>>>> CMR'd it to 1.8.3. I can confirm that the attached reproducer no longer >>>>>> produces a SEGV. Let me know if you run into any more issues. >>>>>> >>>>>> >>>>>> -Nathan >>>>>> >>>>>> _______________________________________________ >>>>>> devel mailing list >>>>>> de...@open-mpi.org >>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >>>>>> Link to this post: >>>>>> http://www.open-mpi.org/community/lists/devel/2014/09/15915.php >>>>>> >>>>>> Index: ompi/mca/topo/base/topo_base_cart_create.c >>>>>> =================================================================== >>>>>> --- ompi/mca/topo/base/topo_base_cart_create.c (revision 32807) >>>>>> +++ ompi/mca/topo/base/topo_base_cart_create.c (working copy) >>>>>> @@ -163,10 +163,18 @@ >>>>>> return MPI_ERR_INTERN; >>>>>> } >>>>>> >>>>>> + assert(NULL == new_comm->c_topo); >>>>>> + assert(!(new_comm->c_flags & OMPI_COMM_CART)); >>>>>> + new_comm->c_topo = topo; >>>>>> + new_comm->c_topo->mtc.cart = cart; >>>>>> + new_comm->c_topo->reorder = reorder; >>>>>> + new_comm->c_flags |= OMPI_COMM_CART; >>>>>> ret = ompi_comm_enable(old_comm, new_comm, >>>>>> new_rank, num_procs, topo_procs); >>>>>> if (OMPI_SUCCESS != ret) { >>>>>> /* something wrong happened during setting the communicator */ >>>>>> + new_comm->c_topo = NULL; >>>>>> + new_comm->c_flags &= ~OMPI_COMM_CART; >>>>>> ompi_comm_free (&new_comm); >>>>>> free(topo_procs); >>>>>> if(NULL != cart->periods) free(cart->periods); >>>>>> @@ -176,10 +184,6 @@ >>>>>> return ret; >>>>>> } >>>>>> >>>>>> - new_comm->c_topo = topo; >>>>>> - new_comm->c_topo->mtc.cart = cart; >>>>>> - new_comm->c_topo->reorder = reorder; >>>>>> - new_comm->c_flags |= OMPI_COMM_CART; >>>>>> *comm_topo = new_comm; >>>>>> >>>>>> if( MPI_UNDEFINED == new_rank ) { >>>>>> Index: ompi/mca/coll/basic/coll_basic_module.c >>>>>> =================================================================== >>>>>> --- ompi/mca/coll/basic/coll_basic_module.c (revision 32807) >>>>>> +++ ompi/mca/coll/basic/coll_basic_module.c (working copy) >>>>>> @@ -13,6 +13,8 @@ >>>>>> * Copyright (c) 2012 Sandia National Laboratories. All rights >>>>>> reserved. >>>>>> * Copyright (c) 2013 Los Alamos National Security, LLC. All rights >>>>>> * reserved. >>>>>> + * Copyright (c) 2014 Research Organization for Information Science >>>>>> + * and Technology (RIST). All rights reserved. >>>>>> * $COPYRIGHT$ >>>>>> * >>>>>> * Additional copyrights may follow >>>>>> @@ -28,6 +30,7 @@ >>>>>> #include "mpi.h" >>>>>> #include "ompi/mca/coll/coll.h" >>>>>> #include "ompi/mca/coll/base/base.h" >>>>>> +#include "ompi/mca/topo/topo.h" >>>>>> #include "coll_basic.h" >>>>>> >>>>>> >>>>>> @@ -70,6 +73,15 @@ >>>>>> } else { >>>>>> size = ompi_comm_size(comm); >>>>>> } >>>>>> + if (comm->c_flags & OMPI_COMM_CART) { >>>>>> + int cart_size; >>>>>> + assert (NULL != comm->c_topo); >>>>>> + comm->c_topo->topo.cart.cartdim_get(comm, &cart_size); >>>>>> + cart_size *= 2; >>>>>> + if (cart_size > size) { >>>>>> + size = cart_size; >>>>>> + } >>>>>> + } >>>>>> basic_module->mccb_num_reqs = size * 2; >>>>>> basic_module->mccb_reqs = (ompi_request_t**) >>>>>> malloc(sizeof(ompi_request_t *) * basic_module->mccb_num_reqs); >>>>>> >>>>>> _______________________________________________ >>>>>> devel mailing list >>>>>> de...@open-mpi.org >>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >>>>>> Link to this post: >>>>>> http://www.open-mpi.org/community/lists/devel/2014/09/15929.php >>>>>> >>>>>> _______________________________________________ >>>>>> devel mailing list >>>>>> de...@open-mpi.org >>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >>>>>> Link to this post: >>>>>> http://www.open-mpi.org/community/lists/devel/2014/09/15930.php >>>>>> Index: ompi/mca/topo/base/topo_base_cart_create.c >>>>>> =================================================================== >>>>>> --- ompi/mca/topo/base/topo_base_cart_create.c (revision 32815) >>>>>> +++ ompi/mca/topo/base/topo_base_cart_create.c (working copy) >>>>>> @@ -163,10 +163,18 @@ >>>>>> return MPI_ERR_INTERN; >>>>>> } >>>>>> >>>>>> + assert(NULL == new_comm->c_topo); >>>>>> + assert(!(new_comm->c_flags & OMPI_COMM_CART)); >>>>>> + new_comm->c_topo = topo; >>>>>> + new_comm->c_topo->mtc.cart = cart; >>>>>> + new_comm->c_topo->reorder = reorder; >>>>>> + new_comm->c_flags |= OMPI_COMM_CART; >>>>>> ret = ompi_comm_enable(old_comm, new_comm, >>>>>> new_rank, num_procs, topo_procs); >>>>>> if (OMPI_SUCCESS != ret) { >>>>>> /* something wrong happened during setting the communicator */ >>>>>> + new_comm->c_topo = NULL; >>>>>> + new_comm->c_flags &= ~OMPI_COMM_CART; >>>>>> ompi_comm_free (&new_comm); >>>>>> free(topo_procs); >>>>>> if(NULL != cart->periods) free(cart->periods); >>>>>> @@ -176,10 +184,6 @@ >>>>>> return ret; >>>>>> } >>>>>> >>>>>> - new_comm->c_topo = topo; >>>>>> - new_comm->c_topo->mtc.cart = cart; >>>>>> - new_comm->c_topo->reorder = reorder; >>>>>> - new_comm->c_flags |= OMPI_COMM_CART; >>>>>> *comm_topo = new_comm; >>>>>> >>>>>> if( MPI_UNDEFINED == new_rank ) { >>>>>> Index: ompi/mca/coll/basic/coll_basic_module.c >>>>>> =================================================================== >>>>>> --- ompi/mca/coll/basic/coll_basic_module.c (revision 32815) >>>>>> +++ ompi/mca/coll/basic/coll_basic_module.c (working copy) >>>>>> @@ -13,6 +13,8 @@ >>>>>> * Copyright (c) 2012 Sandia National Laboratories. All rights >>>>>> reserved. >>>>>> * Copyright (c) 2013 Los Alamos National Security, LLC. All rights >>>>>> * reserved. >>>>>> + * Copyright (c) 2014 Research Organization for Information Science >>>>>> + * and Technology (RIST). All rights reserved. >>>>>> * $COPYRIGHT$ >>>>>> * >>>>>> * Additional copyrights may follow >>>>>> @@ -28,6 +30,8 @@ >>>>>> #include "mpi.h" >>>>>> #include "ompi/mca/coll/coll.h" >>>>>> #include "ompi/mca/coll/base/base.h" >>>>>> +#include "ompi/mca/topo/topo.h" >>>>>> +#include "ompi/mca/topo/base/base.h" >>>>>> #include "coll_basic.h" >>>>>> >>>>>> >>>>>> @@ -70,7 +74,36 @@ >>>>>> } else { >>>>>> size = ompi_comm_size(comm); >>>>>> } >>>>>> - basic_module->mccb_num_reqs = size * 2; >>>>>> + size *= 2; >>>>>> + if (OMPI_COMM_IS_CART(comm)) { >>>>>> + int cart_size; >>>>>> + mca_topo_base_comm_cart_2_1_0_t *cart; >>>>>> + assert (NULL != comm->c_topo); >>>>>> + cart = comm->c_topo->mtc.cart; >>>>>> + cart_size = cart->ndims * 4; >>>>>> + if (cart_size > size) { >>>>>> + size = cart_size; >>>>>> + } >>>>>> + } else if (OMPI_COMM_IS_GRAPH(comm)) { >>>>>> + int rank, degree; >>>>>> + assert (NULL != comm->c_topo); >>>>>> + rank = ompi_comm_rank (comm); >>>>>> + mca_topo_base_graph_neighbors_count (comm, rank, °ree); >>>>>> + degree *= 2; >>>>>> + if (degree > size) { >>>>>> + size = degree; >>>>>> + } >>>>>> + } else if (OMPI_COMM_IS_DIST_GRAPH(comm)) { >>>>>> + int dist_graph_size; >>>>>> + mca_topo_base_comm_dist_graph_2_1_0_t *dist_graph; >>>>>> + assert (NULL != comm->c_topo); >>>>>> + dist_graph = comm->c_topo->mtc.dist_graph; >>>>>> + dist_graph_size = dist_graph->indegree + dist_graph->outdegree; >>>>>> + if (dist_graph_size > size) { >>>>>> + size = dist_graph_size; >>>>>> + } >>>>>> + } >>>>>> + basic_module->mccb_num_reqs = size; >>>>>> basic_module->mccb_reqs = (ompi_request_t**) >>>>>> malloc(sizeof(ompi_request_t *) * basic_module->mccb_num_reqs); >>>>>> >>>>>> _______________________________________________ >>>>>> devel mailing list >>>>>> de...@open-mpi.org >>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >>>>>> Link to this post: >>>>>> http://www.open-mpi.org/community/lists/devel/2014/09/15941.php >>>>> _______________________________________________ >>>>> devel mailing list >>>>> de...@open-mpi.org >>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >>>>> Link to this post: >>>>> http://www.open-mpi.org/community/lists/devel/2014/09/15945.php >>> >>> <mmcb.v3.patch>_______________________________________________ >>> devel mailing list >>> de...@open-mpi.org >>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >>> Link to this post: >>> http://www.open-mpi.org/community/lists/devel/2014/10/15957.php >> >> >> -- >> Jeff Squyres >> jsquy...@cisco.com >> For corporate legal information go to: >> http://www.cisco.com/web/about/doing_business/legal/cri/ >> >> _______________________________________________ >> devel mailing list >> de...@open-mpi.org >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel >> Link to this post: >> http://www.open-mpi.org/community/lists/devel/2014/10/15959.php > _______________________________________________ > devel mailing list > de...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel > Link to this post: > http://www.open-mpi.org/community/lists/devel/2014/10/15960.php -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/