I got your point; I was just pointing to another option "in the meantime".

fwiw, the bits of code I posted earlier can be put in a shared library that
is LD_PRELOAD'ed,
so the application is kept unmodified.
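
For example, a minimal sketch of that approach (the file name, library name,
and mpirun options below are only illustrative):

/* bcast_sync.c - interpose on MPI_Bcast through the PMPI profiling layer.
 *
 * Build as a shared library and preload it (hypothetical names; -x is
 * Open MPI's mpirun option for exporting environment variables):
 *   mpicc -shared -fPIC -o libbcast_sync.so bcast_sync.c
 *   mpirun -x LD_PRELOAD=$PWD/libbcast_sync.so -np 64 ./legacy_app
 */
#include <mpi.h>

int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root,
              MPI_Comm comm)
{
    /* Synchronize first so a rank that falls behind cannot be flooded
     * with unexpected eager messages from later broadcasts. */
    PMPI_Barrier(comm);
    return PMPI_Bcast(buffer, count, datatype, root, comm);
}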

Cheers,

Gilles

On Sunday, August 21, 2016, r...@open-mpi.org <r...@open-mpi.org> wrote:

> As I said earlier, modifying these legacy apps is not a desirable
> solution. The coll/sync component was developed specifically to alleviate
> these problems in an acceptable manner, albeit not optimal. Performance, in
> this case, is secondary to just getting the app to run.
>
>
> On Aug 20, 2016, at 7:38 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>
> Ralph,
>
> in the meantime, and if not done already, your user can simply redefine
> MPI_Bcast in the app.
>
>
> int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root,
>               MPI_Comm comm) {
>     PMPI_Barrier(comm);
>     return PMPI_Bcast(buffer, count, datatype, root, comm);
> }
>
> The root causes are
> - no flow control in Open MPI for eager messages (as explained by George),
> and
> - some processes being much slower than others.
>
> So even if Open MPI provides a fix or workaround, the end user will still
> be left with a significant load imbalance, which is far from optimal from
> a performance point of view.
>
>
> Cheers,
>
> Gilles
>
> On Sunday, August 21, 2016, r...@open-mpi.org <r...@open-mpi.org> wrote:
>
>> I don’t disagree with anything you said; however, this problem has been
>> reported against our library for more than a decade (it goes way back to
>> the old Trac days) and has yet to be resolved. In the meantime, we have a
>> user who is “down” and needs a solution. Whether it is a “cheap shot” or
>> not is irrelevant to them.
>>
>> I’ll leave it to you deeper MPI wonks to solve the problem correctly :-)
>> When you have done so, I will happily remove the coll/sync component and
>> tell the user “all has been resolved”.
>>
>>
>> On Aug 20, 2016, at 11:44 AM, George Bosilca <bosi...@icl.utk.edu>
>> wrote:
>>
>> Ralph,
>>
>> Bringing back the coll/sync component is a cheap shot at hiding a real
>> issue behind a smoke screen. As Nathan described in his email, Open MPI's
>> lack of flow control on eager messages is the real culprit here, and the
>> loop around any one-to-many collective (bcast and scatter*) was only
>> helping to exacerbate the issue. However, a loop around a small MPI_Send
>> will also end in memory exhaustion, an issue that would not be easily
>> circumvented by adding synchronizations deep inside the library.
>>
>>   George.
>>
>>
>> On Sat, Aug 20, 2016 at 12:30 AM, r...@open-mpi.org <r...@open-mpi.org>
>> wrote:
>>
>>> I cannot provide the user report as it is a proprietary problem.
>>> However, it consists of a large loop of calls to MPI_Bcast that crashes due
>>> to unexpected messages. We have been looking at instituting flow control,
>>> but that would have far too widespread an impact. The coll/sync component
>>> would be a simple solution.
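>>>
>>> Roughly this pattern (a minimal sketch, not the user's actual code; the
>>> buffer size and iteration count are made up):
>>>
>>> #include <mpi.h>
>>>
>>> int main(int argc, char **argv) {
>>>     char buf[64];
>>>     MPI_Init(&argc, &argv);
>>>     /* The root keeps broadcasting small, eagerly-sent messages; if one
>>>      * receiver falls behind, its unexpected-message queue keeps growing
>>>      * until memory is exhausted. */
>>>     for (int i = 0; i < 10000000; i++) {
>>>         MPI_Bcast(buf, 64, MPI_BYTE, 0, MPI_COMM_WORLD);
>>>     }
>>>     MPI_Finalize();
>>>     return 0;
>>> }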
>>>
>>> I honestly don’t believe the issue I was resolving was due to a bug - it
>>> was simply a problem of one proc running slow and creating an overload of
>>> unexpected messages that eventually consumed too much memory. Rather, I
>>> think you solved a different problem - by the time you arrived at LANL, the
>>> developers of the app I was working with had already modified their code to
>>> no longer trigger the problem (essentially refactoring the algorithm to
>>> avoid the massive loop over allreduce).
>>>
>>> I have no issue supporting it as it takes near-zero effort to maintain,
>>> and this is a fairly common problem with legacy codes that don’t want to
>>> refactor their algorithms.
>>>
>>>
>>> > On Aug 19, 2016, at 8:48 PM, Nathan Hjelm <hje...@me.com> wrote:
>>> >
>>> >> On Aug 19, 2016, at 4:24 PM, r...@open-mpi.org wrote:
>>> >>
>>> >> Hi folks
>>> >>
>>> >> I had a question arise regarding a problem being seen by an OMPI user
>>> - it has to do with the old bugaboo I originally dealt with back in my LANL
>>> days. The problem is with an app that repeatedly hammers on a collective
>>> and gets overwhelmed by unexpected messages when one of the procs falls
>>> behind.
>>> >
>>> > I did some investigation on Roadrunner several years ago and
>>> determined that the user-code issue coll/sync was attempting to fix was due
>>> to a bug in ob1/cksum (I really can’t remember exactly). coll/sync was simply
>>> masking a live-lock problem. I committed a workaround for the bug in r26575
>>> (https://github.com/open-mpi/ompi/commit/59e529cf1dfe986e40d14ec4d2a2e5ef0cea5e35)
>>> and tested it with the user code. After this change the user code ran fine
>>> without coll/sync. Since LANL no longer had any users of coll/sync, we
>>> stopped supporting it.
>>> >
>>> >> I solved this back then by introducing the “sync” component in
>>> ompi/mca/coll, which injected a barrier operation every N collectives. You
>>> could even “tune” it by doing the injection for only specific collectives.
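>>> >>
>>> >> Conceptually something like this (a minimal PMPI-style illustration of
>>> >> the idea, not the component’s actual code; the counter and the value of
>>> >> N are made up):
>>> >>
>>> >> #include <mpi.h>
>>> >>
>>> >> static int nbcasts = 0;
>>> >>
>>> >> int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root,
>>> >>               MPI_Comm comm)
>>> >> {
>>> >>     if (++nbcasts % 1000 == 0) {    /* every N = 1000 broadcasts... */
>>> >>         PMPI_Barrier(comm);         /* ...inject a synchronization  */
>>> >>     }
>>> >>     return PMPI_Bcast(buffer, count, datatype, root, comm);
>>> >> }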
>>> >>
>>> >> However, I can no longer find that component in the code base - I
>>> find it in the 1.6 series, but someone removed it during the 1.7 series.
>>> >>
>>> >> Can someone tell me why this was done??? Is there any reason not to
>>> bring it back? It solves a very real, not uncommon, problem.
>>> >> Ralph
>>> >
>>> > This was discussed during one (or several) telecons years ago. We
>>> agreed to kill it and bring it back only if there was 1) a use case and
>>> 2) someone willing to support it. See
>>> https://github.com/open-mpi/ompi/commit/5451ee46bd6fcdec002b333474dec919475d2d62
>>> >
>>> > Can you link the user email?
>>> >
>>> > -Nathan
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
