Hi Rolf,

Yes, same issue ...

I attached a patch to the GitHub issue (the issue might be in the test itself).

From the standard (11.5 Synchronization Calls):
"TheMPI_WIN_FENCE collective synchronization call supports a simple synchroniza- tion pattern that is often used in parallel computations: namely a loosely-synchronous model, where global computation phases alternate with global communication phases."

As far as I understand (disclaimer: I am *not* good at reading standards ...), this is not necessarily an MPI_Barrier, so there is a race condition in the test case. It can be avoided
by adding an MPI_Barrier after initializing RecvBuff.
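
For reference, here is a minimal, untested sketch of the fix I have in mind (the actual patch is attached to the GitHub issue; the names and layout follow the c_accumulate snippet quoted below):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int i, rank, size, SendBuff, RecvBuff;
    MPI_Win Win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Win_create(&RecvBuff, sizeof(int), 1, MPI_INFO_NULL,
                   MPI_COMM_WORLD, &Win);
    SendBuff = rank + 100;
    RecvBuff = 0;

    /* proposed extra synchronization: make sure every task has
     * initialized RecvBuff before any MPI_Accumulate can land in
     * its window */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Win_fence(MPI_MODE_NOPRECEDE, Win);
    for (i = 0; i < size; ++i)
        MPI_Accumulate(&SendBuff, 1, MPI_INT, i, 0, 1, MPI_INT, MPI_SUM, Win);
    MPI_Win_fence(MPI_MODE_NOPUT | MPI_MODE_NOSUCCEED, Win);

    if (RecvBuff != 100 * size + (size - 1) * size / 2)
        printf("rank %d: unexpected RecvBuff = %d\n", rank, RecvBuff);

    MPI_Win_free(&Win);
    MPI_Finalize();
    return 0;
}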

Could someone (Jeff? George?) please double-check this before I push a fix into the ompi-tests repo?

Cheers,

Gilles

On 4/20/2015 10:19 PM, Rolf vandeVaart wrote:

Hi Gilles:

Is your failure similar to this ticket?

https://github.com/open-mpi/ompi/issues/393

Rolf

*From:* devel [mailto:devel-boun...@open-mpi.org] *On Behalf Of* Gilles Gouaillardet
*Sent:* Monday, April 20, 2015 9:12 AM
*To:* Open MPI Developers
*Subject:* [OMPI devel] c_accumulate

Folks,

I (sometimes) get a failure with the c_accumulate test from the ibm test suite on one host with 4 MPI tasks.

So far, I have only been able to observe this on Linux/SPARC with the vader BTL.

Here is a snippet of the test:

MPI_Win_create(&RecvBuff, sizeOfInt, 1, MPI_INFO_NULL,
               MPI_COMM_WORLD, &Win);
SendBuff = rank + 100;
RecvBuff = 0;

/* Accumulate to everyone, just for the heck of it */
MPI_Win_fence(MPI_MODE_NOPRECEDE, Win);
for (i = 0; i < size; ++i)
    MPI_Accumulate(&SendBuff, 1, MPI_INT, i, 0, 1, MPI_INT, MPI_SUM, Win);
MPI_Win_fence((MPI_MODE_NOPUT | MPI_MODE_NOSUCCEED), Win);

When the test fails, RecvBuff is (rank + 100) instead of the accumulated value 100 * nprocs + (nprocs - 1) * nprocs / 2 (e.g. 406 with 4 tasks).

I am not familiar with one-sided operations nor with MPI_Win_fence.

That being said, I find it suspicious that RecvBuff is initialized *after* MPI_Win_create ...

Does MPI_Win_fence imply an MPI_Barrier?

If not, I guess RecvBuff should be initialized *before* MPI_Win_create.
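
E.g. (untested, just reordering the snippet above):

SendBuff = rank + 100;
RecvBuff = 0;                               /* initialize the window memory first ... */
MPI_Win_create(&RecvBuff, sizeOfInt, 1, MPI_INFO_NULL,
               MPI_COMM_WORLD, &Win);       /* ... and only then expose it */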

Does that make sense?

(And if it does, then this issue is not related to SPARC, and vader is not the root cause.)

Cheers,

Gilles


