Gilles,

You should also remove the reduce_scatter from the sync (it has an implicit
synchronization).

  George.


On Mon, Aug 22, 2016 at 10:02 AM, r...@open-mpi.org <r...@open-mpi.org> wrote:

> If you see ways to improve it, you are welcome to do so.
>
> On Aug 22, 2016, at 12:30 AM, Gilles Gouaillardet <gil...@rist.or.jp>
> wrote:
>
> Folks,
>
>
> I was reviewing the sources of the coll/sync module, and
>
>
> 1) I noticed the same pattern is used in *every* source file:
>
>     if (s->in_operation) {
>         return s->c_coll.coll_xxx(...);
>     } else {
>         COLL_SYNC(s, s->c_coll.coll_xxx(...));
>     }
>
> Is there any rationale for not moving the if (s->in_operation) test into
> the COLL_SYNC macro?
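> For illustration only, a minimal sketch of what folding the test into the
> macro could look like (this is not the actual coll_sync.h definition; the
> barrier-every-N bookkeeping is elided and _err is just a placeholder name):
>
>     #define COLL_SYNC(s, op)                                       \
>         do {                                                       \
>             if ((s)->in_operation) {                               \
>                 return (op);   /* pass straight through */         \
>             }                                                      \
>             (s)->in_operation = true;                              \
>             /* barrier-every-N bookkeeping elided here */          \
>             int _err = (op);                                       \
>             (s)->in_operation = false;                             \
>             return _err;                                           \
>         } while (0)
>
> Each coll_*.c wrapper would then reduce to a single unconditional
> COLL_SYNC(s, s->c_coll.coll_xxx(...)) call.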
>
>
> 2) I could not find a rationale for using s->in_operation:
> - if a barrier must be performed, the barrier of the underlying module
> (e.g. coll/tuned) is invoked directly, so coll/sync never re-enters itself
> - with MPI_THREAD_MULTIPLE, it is the end user's responsibility to ensure
> that two threads never simultaneously invoke a collective operation on the
> *same* communicator
>   (and s->in_operation is a per-communicator boolean), so I do not see how
> s->in_operation can be true in a valid MPI program.
>
>
> Though the first point can be seen as a "matter of style", I am pretty
> curious about the second one.
>
>
> Cheers,
>
>
> Gilles
>
> On 8/21/2016 3:44 AM, George Bosilca wrote:
>
> Ralph,
>
> Bringing back coll/sync is a cheap shot at hiding a real issue behind a
> smoke screen. As Nathan described in his email, Open MPI's lack of flow
> control on eager messages is the real culprit here, and the loop around a
> one-to-many collective (bcast and scatter*) only served to exacerbate the
> issue. However, a loop around small MPI_Send calls will also end in memory
> exhaustion, a problem that cannot easily be circumvented by adding
> synchronizations deep inside the library.
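> As an illustration of that failure mode, here is a made-up sketch (not the
> user's code; the iteration count and the commented-out barrier interval are
> arbitrary):
>
>     #include <mpi.h>
>
>     int main(int argc, char **argv)
>     {
>         int buf[16] = {0};
>         MPI_Init(&argc, &argv);
>
>         /* small payload => eager protocol, and nothing throttles the
>          * root: if one rank falls behind, broadcasts pile up in its
>          * unexpected-message queue until memory is exhausted */
>         for (long i = 0; i < 10000000; ++i) {
>             MPI_Bcast(buf, 16, MPI_INT, 0, MPI_COMM_WORLD);
>             /* a coll/sync-style band-aid bounds the backlog for
>              * collectives, but the same loop over small MPI_Send calls
>              * would still blow up:
>              * if (0 == i % 1000) MPI_Barrier(MPI_COMM_WORLD); */
>         }
>
>         MPI_Finalize();
>         return 0;
>     }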
>
>   George.
>
>
> On Sat, Aug 20, 2016 at 12:30 AM, r...@open-mpi.org <r...@open-mpi.org>
> wrote:
>
>> I cannot provide the user report, as it is proprietary. However, the
>> problem consists of a large loop of calls to MPI_Bcast that crashes due to
>> unexpected messages. We have been looking at instituting flow control, but
>> that has far too widespread an impact. The coll/sync component would be a
>> simple solution.
>>
>> I honestly don’t believe the issue I was resolving was due to a bug - it
>> was simply a matter of one proc running slowly and creating an overload of
>> unexpected messages that eventually consumed too much memory. Rather, I
>> think you solved a different problem - by the time you arrived at LANL, the
>> developers of the app I was working with had already modified their code to
>> no longer create the problem (essentially refactoring the algorithm to
>> avoid the massive loop over allreduce).
>>
>> I have no issue supporting it as it takes near-zero effort to maintain,
>> and this is a fairly common problem with legacy codes that don’t want to
>> refactor their algorithms.
>>
>>
>> > On Aug 19, 2016, at 8:48 PM, Nathan Hjelm <hje...@me.com> wrote:
>> >
>> >> On Aug 19, 2016, at 4:24 PM, r...@open-mpi.org wrote:
>> >>
>> >> Hi folks
>> >>
>> >> I had a question arise regarding a problem being seen by an OMPI user
>> - has to do with the old bugaboo I originally dealt with back in my LANL
>> days. The problem is with an app that repeatedly hammers on a collective,
>> and gets overwhelmed by unexpected messages when one of the procs falls
>> behind.
>> >
>> > I did some investigation on roadrunner several years ago and determined
>> that the user code issue coll/sync was attempting to fix was due to a bug
>> in ob1/cksum (I really can’t remember which). coll/sync was simply masking
>> a live-lock problem. I committed a workaround for the bug in r26575
>> (https://github.com/open-mpi/ompi/commit/59e529cf1dfe986e40d14ec4d2a2e5ef0cea5e35)
>> and tested it with the user code. After this change the user code ran fine
>> without coll/sync. Since LANL no longer had any users of coll/sync, we
>> stopped supporting it.
>> >
>> >> I solved this back then by introducing the “sync” component in
>> ompi/mca/coll, which injected a barrier operation every N collectives. You
>> could even “tune” it by doing the injection for only specific collectives.
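>> For context, a purely illustrative sketch of that idea (this is not the
>> component's actual code; the real module hooks in underneath the MPI API
>> via the coll framework, and "barrier_every_n" simply stands in for its
>> tunable parameter):
>>
>>     #include <mpi.h>
>>
>>     /* count collective calls and inject a barrier every N of them,
>>      * giving a slow rank a chance to drain its unexpected queue */
>>     static int ncalls = 0;
>>
>>     static int sync_bcast(void *buf, int count, MPI_Datatype dtype,
>>                           int root, MPI_Comm comm, int barrier_every_n)
>>     {
>>         if (++ncalls >= barrier_every_n) {
>>             ncalls = 0;
>>             MPI_Barrier(comm);
>>         }
>>         return MPI_Bcast(buf, count, dtype, root, comm);
>>     }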
>> >>
>> >> However, I can no longer find that component in the code base - I find
>> it in the 1.6 series, but someone removed it during the 1.7 series.
>> >>
>> >> Can someone tell me why this was done??? Is there any reason not to
>> bring it back? It solves a very real, not uncommon, problem.
>> >> Ralph
>> >
>> > This was discussed during one (or several) telecons years ago. We
>> agreed to kill it and bring it back if there is 1) a use case, and 2)
>> someone willing to support it. See
>> https://github.com/open-mpi/ompi/commit/5451ee46bd6fcdec002b333474dec919475d2d62 .
>> >
>> > Can you link the user email?
>> >
>> > -Nathan
>
>
>
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
