Kawashima-san,

I am confused ...

As you wrote:

In the MPI_MODE_NOPRECEDE case, a barrier is not necessary
in the MPI implementation to end access/exposure epochs.


And the test case calls MPI_Win_fence with MPI_MODE_NOPRECEDE.

Are you saying the Open MPI implementation of MPI_Win_fence should perform
a barrier in this case (i.e. with MPI_MODE_NOPRECEDE)?

Cheers,

Gilles

On 4/21/2015 11:08 AM, Kawashima, Takahiro wrote:
Hi Gilles, Nathan,

No, my conclusion is that the MPI program does not need an MPI_Barrier,
but MPI implementations need some synchronization.

Thanks,
Takahiro Kawashima,

Kawashima-san,

Nathan reached the same conclusion (see the github issue) and I fixed
the test by manually adding an MPI_Barrier.

Cheers,

Gilles

On 4/21/2015 10:20 AM, Kawashima, Takahiro wrote:
Hi Gilles, Nathan,

I read the MPI standard but I think the standard doesn't
require a barrier in the test program.

From the standard (11.5.1 Fence):

    A fence call usually entails a barrier synchronization:
    a process completes a call to MPI_WIN_FENCE only after all
    other processes in the group entered their matching call.
    However, a call to MPI_WIN_FENCE that is known not to end
    any epoch (in particular, a call with assert equal to
    MPI_MODE_NOPRECEDE) does not necessarily act as a barrier.

This sentence is misleading.

In the non-MPI_MODE_NOPRECEDE case, a barrier is necessary
in the MPI implementation to end access/exposure epochs.

In the MPI_MODE_NOPRECEDE case, a barrier is not necessary
in the MPI implementation to end access/exposure epochs.
Also, a *global* barrier is not necessary in the MPI
implementation to start access/exposure epochs. But some
synchronization is still needed to start an exposure epoch.

For example, let's assume all ranks call MPI_WIN_FENCE(MPI_MODE_NOPRECEDE)
and then rank 0 calls MPI_PUT to rank 1. In this case, rank 0
can access the window on rank 1 before rank 2 or others
call MPI_WIN_FENCE. (But rank 0 must wait for rank 1's MPI_WIN_FENCE.)
I think this is the intent of the sentence in the MPI standard
cited above.
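
To make this concrete, here is a minimal illustrative sketch of that
scenario (this program is not from the test suite; the artificial sleep
just makes rank 2 late on purpose; run it with at least 3 ranks):

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank, value = 0, one = 1;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Win_create(&value, sizeof(int), sizeof(int), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &win);

    if (rank == 2)
        sleep(2);  /* rank 2 enters its fence late on purpose */

    /* Known not to end any epoch, so this fence need not act as a
     * global barrier. */
    MPI_Win_fence(MPI_MODE_NOPRECEDE, win);

    if (rank == 0)
        /* May be applied at rank 1 as soon as rank 1 has entered its
         * fence, even while rank 2 is still sleeping. */
        MPI_Put(&one, 1, MPI_INT, 1, 0, 1, MPI_INT, win);

    /* This fence ends the epoch and does synchronize. */
    MPI_Win_fence(MPI_MODE_NOSUCCEED, win);

    if (rank == 1)
        printf("rank 1 sees value = %d\n", value);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}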

Thanks,
Takahiro Kawashima

Hi Rolf,

Yes, same issue ...

I attached a patch to the github issue (the issue might be in the test).

From the standard (11.5 Synchronization Calls):

"The MPI_WIN_FENCE collective synchronization call supports a simple
synchronization pattern that is often used in parallel computations:
namely a loosely-synchronous model, where global computation phases
alternate with global communication phases."

As far as I understand (disclaimer: I am *not* good at reading standards ...),
this is not necessarily an MPI_Barrier, so there is a race condition in the
test case that can be avoided by adding an MPI_Barrier after initializing
RecvBuff.
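
For reference, here is a minimal, self-contained sketch of the test with
that barrier added (variable names follow the snippet quoted further down
in this thread; the surrounding scaffolding is only an illustration, not
the actual ibm test source):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int i, rank, size, SendBuff, RecvBuff;
    MPI_Win Win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Win_create(&RecvBuff, sizeof(int), 1, MPI_INFO_NULL,
                   MPI_COMM_WORLD, &Win);
    SendBuff = rank + 100;
    RecvBuff = 0;

    /* Proposed fix: make sure every rank has initialized RecvBuff
     * before any remote accumulate can reach its window. */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Win_fence(MPI_MODE_NOPRECEDE, Win);
    for (i = 0; i < size; ++i)
        MPI_Accumulate(&SendBuff, 1, MPI_INT, i, 0, 1, MPI_INT, MPI_SUM, Win);
    MPI_Win_fence(MPI_MODE_NOPUT | MPI_MODE_NOSUCCEED, Win);

    /* Every rank should now hold 100 * size + size * (size - 1) / 2. */
    printf("rank %d: RecvBuff = %d\n", rank, RecvBuff);

    MPI_Win_free(&Win);
    MPI_Finalize();
    return 0;
}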

Could someone (Jeff? George?) please double-check this before I push a
fix into the ompi-tests repo?

Cheers,

Gilles

On 4/20/2015 10:19 PM, Rolf vandeVaart wrote:
Hi Gilles:

Is your failure similar to this ticket?

https://github.com/open-mpi/ompi/issues/393

Rolf

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Monday, April 20, 2015 9:12 AM
To: Open MPI Developers
Subject: [OMPI devel] c_accumulate

Folks,

I (sometimes) get a failure with the c_accumulate test from the ibm
test suite on one host with 4 MPI tasks.

So far, I have only been able to observe this on linux/sparc with the vader btl.

Here is a snippet of the test:

MPI_Win_create(&RecvBuff, sizeOfInt, 1, MPI_INFO_NULL,
               MPI_COMM_WORLD, &Win);
SendBuff = rank + 100;
RecvBuff = 0;

/* Accumulate to everyone, just for the heck of it */
MPI_Win_fence(MPI_MODE_NOPRECEDE, Win);
for (i = 0; i < size; ++i)
    MPI_Accumulate(&SendBuff, 1, MPI_INT, i, 0, 1, MPI_INT, MPI_SUM, Win);
MPI_Win_fence((MPI_MODE_NOPUT | MPI_MODE_NOSUCCEED), Win);

When the test fails, RecvBuff is (rank + 100) instead of the accumulated
value (100 * nprocs + (nprocs - 1) * nprocs / 2).

I am not familiar with one-sided operations or MPI_Win_fence.

That being said, I find it suspicious that RecvBuff is initialized *after*
MPI_Win_create ...

Does MPI_Win_fence imply an MPI_Barrier?

If not, I guess RecvBuff should be initialized *before* MPI_Win_create,
roughly as in the sketch below.
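
For illustration, the snippet above would then be reordered like this
(just a sketch of that suggestion, not a tested patch):

SendBuff = rank + 100;
RecvBuff = 0;      /* initialize the window memory first ... */

MPI_Win_create(&RecvBuff, sizeOfInt, 1, MPI_INFO_NULL,
               MPI_COMM_WORLD, &Win);   /* ... then expose it */

/* Accumulate to everyone, just for the heck of it */
MPI_Win_fence(MPI_MODE_NOPRECEDE, Win);
for (i = 0; i < size; ++i)
    MPI_Accumulate(&SendBuff, 1, MPI_INT, i, 0, 1, MPI_INT, MPI_SUM, Win);
MPI_Win_fence((MPI_MODE_NOPUT | MPI_MODE_NOSUCCEED), Win);

(Whether this alone is enough depends on how much synchronization
MPI_Win_create provides; the fix discussed above in this thread was an
explicit MPI_Barrier after the initialization.)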

Does that make sense?

(And if it does make sense, then this issue is not related to sparc,
and vader is not the root cause.)