If you do that, it may run out of resources and deadlock or crash. I recommend either 1) adding a barrier every 100 iterations, 2) using allreduce, or 3) enabling coll/sync (which essentially does 1 for you). Honestly, 2 is probably the easiest option, and depending on how large you run it may not be any slower than 1 or 3.
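To make the difference concrete, here is a minimal sketch (mine, not from the original mail; the iteration count, payload, and barrier interval are just placeholders) contrasting options 1 and 2:

#include <mpi.h>

#define NITER 10000   /* placeholder iteration count */

int main(int argc, char **argv) {
    int rank;
    double local = 1.0, global;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < NITER; ++i) {
        /* Option 1: rooted reduce plus a periodic barrier so non-root
           ranks cannot run arbitrarily far ahead of rank 0. */
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (i % 100 == 99)
            MPI_Barrier(MPI_COMM_WORLD);

        /* Option 2: use MPI_Allreduce instead -- every rank blocks until
           it has the result, so the loop self-synchronizes and no extra
           barrier is needed:

           MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                         MPI_COMM_WORLD);
        */
    }

    MPI_Finalize();
    return 0;
}

Option 3 (the coll/sync component) injects the same kind of periodic barrier transparently, so the loop body can stay a plain MPI_Reduce.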
-Nathan

> On Apr 15, 2019, at 9:53 AM, Saliya Ekanayake <[email protected]> wrote:
>
> Hi Devs,
>
> When doing MPI_Reduce in a loop (collecting on rank 0), is it the correct
> understanding that ranks other than the root (0 in this case) will return
> from the collective as soon as their data is written to MPI buffers,
> without waiting for all contributions to be received at the root?
>
> If that's the case, then what would happen (semantically) if we execute
> MPI_Reduce in a loop without a barrier, allowing non-root ranks to hit the
> collective multiple times while the root is still processing an earlier
> reduce? For example, the root can be in the first reduce invocation while
> another rank is in the second reduce invocation.
>
> Thank you,
> Saliya
>
> --
> Saliya Ekanayake, Ph.D
> Postdoctoral Scholar
> Performance and Algorithms Research (PAR) Group
> Lawrence Berkeley National Laboratory
> Phone: 510-486-5772
