Great! I'll welcome the patch - feel free to back mine out when you do. Thanks!
On Sep 17, 2013, at 2:43 PM, George Bosilca <bosi...@icl.utk.edu> wrote:

> On Sep 17, 2013, at 23:19 , Ralph Castain <r...@open-mpi.org> wrote:
>
>> I very much doubt that it would work, though I can give it a try, as the
>> patch addresses Intercomm_merge and not Intercomm_create. I debated about
>> putting the patch into "create" instead, but nobody was citing that as
>> being a problem. In my opinion, it makes more sense for it to be in
>> "create", and I can certainly shift it to that location easily enough.
>
> So we converge here. If the problem were correctly addressed at
> Intercomm_create time, there would be no need to address it at
> Intercomm_merge, as the only way to get an intercomm whose peers don't
> know each other's modex info is via Intercomm_create. Every other function
> that creates an inter-communicator does so starting from a common group,
> so the peers already know each other.
>
>> My concern with your approach is that I'm not convinced it will work. The
>> problem is that not all the MPI procs can communicate via MPI at this
>> point because they lack the required info and haven't added the procs
>> into the BTLs yet. So packing modex info into a buffer and attempting to
>> send it via MPI could just cause the lockup to occur sooner.
>
> You will have to believe me on this one, but MPI_Intercomm_create is a
> one-of-a-kind call, and not a very straightforward concept (this is why I
> suggested reading section 6.6.2). One of the arguments to this function is
> a bridge communicator, to which the two leaders both belong. So the two
> sides are not totally unknown to each other: their leaders know each
> other, as they already belong to this "bridge" communicator (and obviously
> each group knows how to communicate with its own leader). My solution was
> to reduce the modex info of each group onto its leader, let the leaders
> exchange this "local group modex information", and then broadcast the
> remote modex info locally.
>
>> Hence the approach of ensuring all procs have the required info. Not
>> optimal, I agree, but performance isn't an issue with this function, and
>> the trivial amount of RTE effort didn't seem worth worrying about.
>
> My concern is that it forces every other RTE supported by Open MPI to
> provide functionality that is so MPI-specific that even the MPI libraries
> have a hard time supporting it.
>
> I have a half-working patch. Don't push the CMR yet, I'll ping you back
> soon.
>
> George.
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
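[Editor's note: for readers following along, George's proposed three-phase exchange (gather each group's modex on its leader, have the two leaders swap over the bridge communicator, then broadcast the remote info locally) can be sketched without an MPI runtime. This is a plain-Python illustration of the data flow only, under the assumption that "modex" is an opaque per-process blob of endpoint info; the function and variable names are hypothetical and are not part of Open MPI or the patch under discussion.]

```python
# Hypothetical sketch of the leader-based modex exchange described above.
# In a real implementation each phase would be an MPI operation (MPI_Gather
# onto the leader, MPI_Sendrecv between the two leaders over the bridge
# communicator, MPI_Bcast on each local communicator); here we only model
# the resulting data flow with plain dicts.

def exchange_modex(group_a, group_b):
    """group_a / group_b: dict mapping local rank -> that process's modex blob.

    Returns, for each group, a per-rank view of the *remote* group's modex,
    i.e. what every process would hold after the exchange completes.
    """
    # Phase 1: gather each group's modex at its local leader (rank 0).
    leader_a_buf = dict(group_a)
    leader_b_buf = dict(group_b)

    # Phase 2: the two leaders exchange the gathered buffers over the
    # bridge communicator (only the leaders talk across groups here).
    remote_for_a = leader_b_buf
    remote_for_b = leader_a_buf

    # Phase 3: each leader broadcasts the remote modex within its own group,
    # so every local process ends up with the full remote table.
    a_view = {rank: dict(remote_for_a) for rank in group_a}
    b_view = {rank: dict(remote_for_b) for rank in group_b}
    return a_view, b_view
```

Note that only the two leaders ever communicate across the group boundary, which is why the scheme sidesteps the lockup concern: no non-leader process needs cross-group BTL endpoints before the exchange completes.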