I got your point; I was just pointing to another option "in the meantime".

FWIW, the bits of code I posted earlier can be put in a shared library that is
LD_PRELOAD'ed, so the application is kept unmodified.
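For concreteness, a minimal sketch of such a wrapper library, assuming an
mpicc/GCC-style toolchain and the same MPI_Bcast override quoted below (the
file name, build line, and launcher options are only examples, nothing that
ships with Open MPI):

/* bcast_sync.c - hypothetical interposer: synchronize before every
 * broadcast without touching the application.  When this .so is
 * LD_PRELOAD'ed, its MPI_Bcast shadows the library's definition and
 * delegates the real work to the PMPI entry points. */
#include <mpi.h>

int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root,
              MPI_Comm comm)
{
    PMPI_Barrier(comm);   /* drain unexpected messages before the bcast */
    return PMPI_Bcast(buffer, count, datatype, root, comm);
}

Something along the lines of "mpicc -shared -fPIC -o libbcast_sync.so
bcast_sync.c" should build it, and with Open MPI's mpirun the variable can be
forwarded to the ranks with "-x LD_PRELOAD=/path/to/libbcast_sync.so" (exact
flags depend on the compiler and launcher).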
Cheers,

Gilles

On Sunday, August 21, 2016, r...@open-mpi.org <r...@open-mpi.org> wrote:

> As I said earlier, modifying these legacy apps is not a desirable
> solution. The coll/sync component was developed specifically to alleviate
> these problems in an acceptable manner, albeit not optimal. Performance,
> in this case, is secondary to just getting the app to run.
>
>
> On Aug 20, 2016, at 7:38 PM, Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
>
> Ralph,
>
> In the meantime, and if not done already, your user can simply redefine
> MPI_Bcast in the app:
>
> int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root,
>               MPI_Comm comm) {
>     PMPI_Barrier(comm);
>     return PMPI_Bcast(buffer, count, datatype, root, comm);
> }
>
> The root causes are:
> - no flow control in Open MPI for eager messages (as explained by
>   George), and
> - some processes are much slower than others.
>
> So even if Open MPI provides a fix or workaround, the end user will still
> be left with a significant load imbalance, which is far from optimal from
> a performance point of view.
>
>
> Cheers,
>
> Gilles
>
> On Sunday, August 21, 2016, r...@open-mpi.org <r...@open-mpi.org> wrote:
>
>> I don’t disagree with anything you said - however, this problem has been
>> reported in our library for more than a decade (it goes way back into
>> the old Trac days) and has yet to be resolved. Meantime, we have a user
>> who is “down” and needs a solution. Whether it is a “cheap shot” or not
>> is irrelevant to them.
>>
>> I’ll leave it to you deeper MPI wonks to solve the problem correctly :-)
>> When you have done so, I will happily remove the coll/sync component and
>> tell the user “all has been resolved”.
>>
>>
>> On Aug 20, 2016, at 11:44 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>
>> Ralph,
>>
>> Bringing back coll/sync is a cheap shot at hiding a real issue behind a
>> smoke curtain. As Nathan described in his email, Open MPI's lack of flow
>> control on eager messages is the real culprit here, and the loop around
>> any one-to-many collective (bcast and scatter*) was only helping to
>> exacerbate the issue. However, doing a loop around a small MPI_Send will
>> also end in a memory exhaustion issue, one that would not be easily
>> circumvented by adding synchronizations deep inside the library.
>>
>> George.
>>
>>
>> On Sat, Aug 20, 2016 at 12:30 AM, r...@open-mpi.org <r...@open-mpi.org>
>> wrote:
>>
>>> I cannot provide the user report as it is a proprietary problem.
>>> However, it consists of a large loop of calls to MPI_Bcast that crashes
>>> due to unexpected messages. We have been looking at instituting flow
>>> control, but that has way too widespread an impact. The coll/sync
>>> component would be a simple solution.
>>>
>>> I honestly don’t believe the issue I was resolving was due to a bug -
>>> it was a simple problem of one proc running slow and creating an
>>> overload of unexpected messages that eventually consumed too much
>>> memory.
>>> Rather, I think you solved a different problem - by the time you
>>> arrived at LANL, the app I was working with had already been modified
>>> so it no longer created the problem (essentially refactoring the
>>> algorithm to avoid the massive loop over allreduce).
>>>
>>> I have no issue supporting it as it takes near-zero effort to maintain,
>>> and this is a fairly common problem with legacy codes that don’t want
>>> to refactor their algorithms.
>>>
>>>
>>> > On Aug 19, 2016, at 8:48 PM, Nathan Hjelm <hje...@me.com> wrote:
>>> >
>>> >> On Aug 19, 2016, at 4:24 PM, r...@open-mpi.org wrote:
>>> >>
>>> >> Hi folks,
>>> >>
>>> >> I had a question arise regarding a problem being seen by an OMPI
>>> >> user - it has to do with the old bugaboo I originally dealt with
>>> >> back in my LANL days. The problem is with an app that repeatedly
>>> >> hammers on a collective and gets overwhelmed by unexpected messages
>>> >> when one of the procs falls behind.
>>> >
>>> > I did some investigation on Roadrunner several years ago and
>>> > determined that the user code issue coll/sync was attempting to fix
>>> > was due to a bug in ob1/cksum (really can’t remember). coll/sync was
>>> > simply masking a live-lock problem. I committed a workaround for the
>>> > bug in r26575
>>> > (https://github.com/open-mpi/ompi/commit/59e529cf1dfe986e40d14ec4d2a2e5ef0cea5e35)
>>> > and tested it with the user code. After this change the user code ran
>>> > fine without coll/sync. Since LANL no longer had any users of
>>> > coll/sync, we stopped supporting it.
>>> >
>>> >> I solved this back then by introducing the “sync” component in
>>> >> ompi/mca/coll, which injected a barrier operation every N
>>> >> collectives. You could even “tune” it by doing the injection for
>>> >> only specific collectives.
>>> >>
>>> >> However, I can no longer find that component in the code base - I
>>> >> find it in the 1.6 series, but someone removed it during the 1.7
>>> >> series.
>>> >>
>>> >> Can someone tell me why this was done??? Is there any reason not to
>>> >> bring it back? It solves a very real, not uncommon, problem.
>>> >> Ralph
>>> >
>>> > This was discussed during one (or several) telecons years ago. We
>>> > agreed to kill it and bring it back if there is 1) a use case, and
>>> > 2) someone willing to support it. See
>>> > https://github.com/open-mpi/ompi/commit/5451ee46bd6fcdec002b333474dec919475d2d62 .
>>> >
>>> > Can you link the user email?
>>> >
>>> > -Nathan
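P.S. for anyone who never saw the component being discussed: coll/sync, as
Ralph describes above, injected a barrier every N collectives rather than
before each one. A purely illustrative, user-level equivalent of that idea
(not the component itself; the interval of 100 and the static counter are
arbitrary, and the counter is not thread-safe):

#include <mpi.h>

#define SYNC_EVERY 100   /* arbitrary interval, would need tuning */

int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root,
              MPI_Comm comm)
{
    static int ncalls = 0;            /* per-process broadcast counter */

    if (++ncalls % SYNC_EVERY == 0)   /* periodic synchronization point */
        PMPI_Barrier(comm);

    return PMPI_Bcast(buffer, count, datatype, root, comm);
}

The trade-off is the one already discussed in this thread: the barrier caps
the backlog of unexpected messages at the cost of making every rank wait for
the slowest one.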
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel