Re: [OMPI users] Memory mapped memory

2011-10-17 Thread James Dinan
Sure, this is possible and generally works, although it is not defined
by the MPI standard.  Regular shared-memory rules apply: depending on
your platform, you may need additional memory-consistency and/or
synchronization calls to ensure that MPI sees the intended data updates.
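
A minimal sketch of what this can look like (mine, not from the original
thread): all ranks on one node mmap() the same backing file with MAP_SHARED,
rank 0 stores into the mapping, and an MPI_Barrier serves as the
synchronization point mentioned above.  The path "/dev/shm/mpi_shared" and
the segment size are illustrative assumptions, and error handling is omitted.

/* Hypothetical sketch: sharing an mmap(MAP_SHARED) segment among MPI
 * ranks on the same node.  Not defined by the MPI standard; ordinary
 * POSIX shared-memory rules apply. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank;
    const size_t len = 4096;                 /* illustrative segment size */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Every rank on the node maps the same backing file with MAP_SHARED. */
    int fd = open("/dev/shm/mpi_shared", O_RDWR | O_CREAT, 0600);
    ftruncate(fd, len);
    char *seg = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (rank == 0)
        strcpy(seg, "hello from rank 0");    /* plain store into the page */

    MPI_Barrier(MPI_COMM_WORLD);             /* order the store before loads */

    if (rank != 0)
        printf("rank %d sees: %s\n", rank, seg);

    munmap(seg, len);
    close(fd);
    MPI_Finalize();
    return 0;
}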

Best,
 ~Jim.

On 10/17/2011 09:09 AM, Durga Choudhury wrote:
> If the mmap() pages are created with MAP_SHARED, then they should be
> sharable with other processes on the same node, shouldn't they? MPI
> processes are just like any other process, aren't they? Will one of
> the MPI gurus please comment?
> 
> Regards
> Durga
> 
> 
> On Mon, Oct 17, 2011 at 9:45 AM, Gabriele Fatigati wrote:
>> To be more specific:
>> is it possible to use the mmap() function from an MPI process and share
>> this memory with other processes?
>>
>> 2011/10/13 Gabriele Fatigati 
>>>
>>> Dear OpenMPI users and developers,
>>> are there any limitations or issues with using memory-mapped memory in MPI
>>> processes? I would like to share some memory within a node without using OpenMP.
>>> Thanks a lot.
>>>
>>> --
>>> Ing. Gabriele Fatigati
>>>
>>> HPC specialist
>>>
>>> SuperComputing Applications and Innovation Department
>>>
>>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>>
>>> www.cineca.it    Tel: +39 051 6171722
>>>
>>> g.fatigati [AT] cineca.it
>>
>>
>>
>> --
>> Ing. Gabriele Fatigati
>>
>> HPC specialist
>>
>> SuperComputing Applications and Innovation Department
>>
>> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>>
>> www.cineca.it    Tel: +39 051 6171722
>>
>> g.fatigati [AT] cineca.it
>>



Re: [OMPI users] MPI one-sided passive synchronization.

2011-04-13 Thread James Dinan

Sudheer,

Locks in MPI are not mutexes; they mark the beginning and end of a
passive-mode communication epoch.  All MPI operations within an epoch
logically occur concurrently and must be non-conflicting.  So what
you've written below is incorrect: the get is not guaranteed to complete
until the call to unlock.  Because of this, it conflicts with the
ensuing call to MPI_Accumulate, which is an error.
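
To make that concrete, here is a minimal sketch (mine, not from the thread)
of the same sequence split into two passive-target epochs, so that each RMA
call has completed before its result is used.  It assumes an existing window
'win' hosted on rank 0, as in the snippet below.  Note that this alone is
still not an atomic read-modify-write: another process can slip in between
the two epochs, which is why a real mutex is needed.

int out;

MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
MPI_Get(&out, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
MPI_Win_unlock(0, win);              /* the get is complete only here */

if (out % 2 == 0)
  out++;

MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
MPI_Accumulate(&out, 1, MPI_INT, 0, 0, 1, MPI_INT, MPI_REPLACE, win);
MPI_Win_unlock(0, win);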


I don't share your pessimism about MPI-2 RMA asynchronous progress.  As
Brian hinted, the standard says you should get progress without making
MPI calls.  I think you might be getting tripped up by the poorly named
MPI_Win_lock/MPI_Win_unlock calls.  These aren't like mutexes and can't
be used to ensure exclusive data access for read-modify-write operations
(like in your example).  In order to do that, you need an actual mutex,
which can be implemented on top of MPI-2 RMA (I can provide a reference
if you need it; I'm sure the code is available somewhere in MPI
tests/examples too).
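
Since that reference isn't included in this message, here is a rough,
hypothetical sketch (not the reference implementation) of one way such a
mutex can be layered on MPI-2 passive-target RMA: a window of nranks flag
bytes hosted on rank 'home' (assumed to be created elsewhere and initialized
to zero), with a test-and-back-off protocol.  The rma_mutex_* names are made
up for illustration.

/* Hypothetical MPI-2 RMA mutex sketch (test-and-back-off). */
#include <mpi.h>
#include <stdlib.h>
#include <unistd.h>

static void rma_mutex_lock(MPI_Win win, int home, int rank, int nranks)
{
    unsigned char one = 1, zero = 0;
    unsigned char *flags = malloc(nranks);
    int i, busy;

    for (;;) {
        /* In one exclusive epoch, announce interest in our own byte and
         * read everyone else's bytes.  The put and the gets touch disjoint
         * locations, so they are non-conflicting within the epoch. */
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, home, 0, win);
        MPI_Put(&one, 1, MPI_BYTE, home, rank, 1, MPI_BYTE, win);
        if (rank > 0)
            MPI_Get(flags, rank, MPI_BYTE, home, 0, rank, MPI_BYTE, win);
        if (rank < nranks-1)
            MPI_Get(flags+rank+1, nranks-rank-1, MPI_BYTE, home,
                    rank+1, nranks-rank-1, MPI_BYTE, win);
        MPI_Win_unlock(home, win);      /* all transfers complete here */

        busy = 0;
        for (i = 0; i < nranks; i++)
            if (i != rank && flags[i]) busy = 1;
        if (!busy) break;               /* nobody else is contending */

        /* Contention: withdraw our flag and retry after a short pause
         * (a rank-dependent backoff reduces the chance of livelock). */
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, home, 0, win);
        MPI_Put(&zero, 1, MPI_BYTE, home, rank, 1, MPI_BYTE, win);
        MPI_Win_unlock(home, win);
        usleep(1000);
    }
    free(flags);
}

static void rma_mutex_unlock(MPI_Win win, int home, int rank)
{
    unsigned char zero = 0;
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, home, 0, win);
    MPI_Put(&zero, 1, MPI_BYTE, home, rank, 1, MPI_BYTE, win);
    MPI_Win_unlock(home, win);
}

A notification-based handoff (e.g., a send/recv to the next waiter on
release) would avoid the polling, at the cost of extra bookkeeping.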


Best,
 ~Jim.

On 04/13/2011 03:11 PM, Abhishek Kulkarni wrote:

But given the existing behavior, all bets are off, and it renders passive
synchronization (MPI_Win_unlock) mostly similar to active synchronization
(MPI_Win_fence).  In trying to emulate a distributed shared-memory model,
I was hoping to do things like:

MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
MPI_Get(&out, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
if (out % 2 == 0)
  out++;
MPI_Accumulate(&out, 1, MPI_INT, 0, 0, 1, MPI_INT, MPI_REPLACE, win);
MPI_Win_unlock(0, win);

but it is impossible to implement such atomic sections given no semantic
guarantees on the ordering of the RMA calls.


Re: [OMPI users] nonblock alternative to MPI_Win_complete

2011-02-24 Thread James Dinan

Hi Toon,

Can you use non-blocking send/recv?  It sounds like this will give you 
the completion semantics you want.
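
For example (a hedged sketch, not from the thread), the master could post one
MPI_Isend per task buffer and use MPI_Test to find out, without blocking,
which buffer can be refilled for the next task.  NBUF, TASK_WORDS, the tag,
and fill_next_task() are illustrative assumptions.

/* Hypothetical master loop: NBUF task buffers, each reused as soon as
 * MPI_Test reports local completion of the previous send from it. */
#include <mpi.h>

#define NBUF 2
#define TASK_WORDS 1024

void fill_next_task(double *buf, int nwords);   /* assumed user routine */

void master_loop(int first_worker, int tasks_left)
{
    static double buf[NBUF][TASK_WORDS];
    MPI_Request req[NBUF];
    int b, done;

    for (b = 0; b < NBUF; b++)
        req[b] = MPI_REQUEST_NULL;              /* MPI_Test reports these done */

    while (tasks_left > 0) {
        for (b = 0; b < NBUF && tasks_left > 0; b++) {
            /* Non-blocking completion check: may this buffer be reused? */
            MPI_Test(&req[b], &done, MPI_STATUS_IGNORE);
            if (done) {
                fill_next_task(buf[b], TASK_WORDS);
                MPI_Isend(buf[b], TASK_WORDS, MPI_DOUBLE,
                          first_worker + b, 0, MPI_COMM_WORLD, &req[b]);
                tasks_left--;
            }
        }
        /* Other useful work can be done here instead of spinning. */
    }
    MPI_Waitall(NBUF, req, MPI_STATUSES_IGNORE);
}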


Best,
 ~Jim.

On 2/24/11 6:07 AM, Toon Knapen wrote:

In that case, I have a small question concerning design:
Suppose task-based parallelism where one node (master) distributes
work/tasks to 2 other nodes (slaves) by means of an MPI_Put. The master
allocates 2 buffers locally in which it will store all the data that is
needed by a slave to perform its task. So I do an MPI_Put on each of my
2 buffers to send each buffer to a specific slave. Now I need to know
when I can reuse one of my buffers to store the next task (that I will
MPI_Put later on). The only way to know this is to call
MPI_Win_complete. But since that call is blocking, if this buffer is not
ready to be reused yet, I cannot even check whether the other buffer is
already available to me again (in the same thread).
I would very much appreciate input on how to solve such an issue!
thanks in advance,
toon
On Tue, Feb 22, 2011 at 7:21 PM, Barrett, Brian W wrote:

On Feb 18, 2011, at 8:59 AM, Toon Knapen wrote:

 > (Probably this issue has been discussed at length before, but
 > unfortunately I did not find any threads (on this site or anywhere
 > else) on this topic. If you are able to provide me with links to
 > earlier discussions on this topic, please do not hesitate.)
 >
 > Is there an alternative to MPI_Win_complete that does not
 > 'enforce completion of preceding RMA calls at the origin' (as said
 > on page 353 of the MPI-2.2 standard)?
 >
 > I would like to know if I can reuse the buffer I gave to MPI_Put
 > without blocking on it; if the MPI lib is still using it, I want
 > to be able to continue (and use another buffer).


There is not.  MPI_Win_complete is the only way to finish an
MPI_Win_start epoch, and it always blocks until local completion
of all messages started during the epoch.

Brian

--
  Brian W. Barrett
  Dept. 1423: Scalable System Software
  Sandia National Laboratories







Re: [OMPI users] Using MPI_Put/Get correctly?

2010-12-16 Thread James Dinan
On 12/16/2010 08:34 AM, Jeff Squyres wrote:
> Additionally, since MPI-3 is updating the semantics of the one-sided
> stuff, it might be worth waiting for all those clarifications before
> venturing into the MPI one-sided realm.  One-sided semantics are much
> more subtle and complex than two-sided semantics.

Hi Jeff,

I don't think we should give users the hope that MPI-3 RMA will be out
tomorrow.  The RMA revisions are still in proposal form and need work.
Realistically speaking, we might be able to get this accepted into the
standard within a year and it will be another year before
implementations catch up.  If users need one-sided now, they should use
the MPI-2 one-sided API.

MPI-3 RMA extends MPI-2 RMA and will be backward compatible, so anything
you write now will still work.  It's still unclear to me whether MPI-3's
RMA semantics will be the leap forward in usability we have hoped for.
We are trying to make it more flexible, but there will likely still be
tricky parts due to portability and performance concerns.

So, my advice: don't be scared of MPI-2.  I agree, it's complicated, but
once you get acclimated it's not that bad.  Really.  :)

Best,
 ~Jim.


Re: [OMPI users] One-sided datatype errors

2010-12-14 Thread James Dinan

Hi Rolf,

Thanks for your help.  I also noticed trouble with subarray data types.
I attached the same test again, but with subarray rather than indexed
types.  It works correctly with MVAPICH on IB, but fails with OpenMPI
1.5 with the following message:


$ mpiexec -n 2 ./a.out
MPI RMA Strided Accumulate Test:
[f3:1747] *** An error occurred in MPI_Accumlate
[f3:1747] *** on win 3
[f3:1747] *** MPI_ERR_TYPE: invalid datatype
[f3:1747] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)

Thanks,
 ~Jim.

On 12/14/2010 09:05 AM, Rolf vandeVaart wrote:

Hi James:
I can reproduce the problem on a single node with Open MPI 1.5 and the
trunk. I have submitted a ticket with
the information.

https://svn.open-mpi.org/trac/ompi/ticket/2656

Rolf

On 12/13/10 18:44, James Dinan wrote:

Hi,

I'm getting strange behavior using datatypes in a one-sided
MPI_Accumulate operation.

The attached example performs an accumulate into a patch of a shared
2d matrix. It uses indexed datatypes and can be built with
displacement or absolute indices (hindexed) - both cases fail. I'm
seeing data validation errors, hanging, or other erroneous behavior
under OpenMPI 1.5 on Infiniband. The example works correctly under the
current release of MVAPICH on IB and under MPICH on shared memory.

Any help would be greatly appreciated.

Best,
~Jim.




/* One-Sided MPI 2-D Strided Accumulate Test
 *
 * Author: James Dinan <di...@mcs.anl.gov> 
 * Date  : December, 2010
 *
 * This code performs N accumulates into a 2d patch of a shared array.  The
 * array has dimensions [X, Y] and the subarray has dimensions [SUB_X, SUB_Y]
 * and begins at index [0, 0].  The input and output buffers are specified
 * using an MPI subarray type.
 */

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define XDIM 1024 
#define YDIM 1024
#define SUB_XDIM 512
#define SUB_YDIM 512
#define ITERATIONS 10

int main(int argc, char **argv) {
int i, j, rank, nranks, peer, bufsize, errors;
double *win_buf, *src_buf;
MPI_Win buf_win;

MPI_Init(&argc, &argv);

MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &nranks);

bufsize = XDIM * YDIM * sizeof(double);
MPI_Alloc_mem(bufsize, MPI_INFO_NULL, &win_buf);
MPI_Alloc_mem(bufsize, MPI_INFO_NULL, &src_buf);

if (rank == 0)
printf("MPI RMA Strided Accumulate Test:\n");

for (i = 0; i < XDIM*YDIM; i++) {
*(win_buf  + i) = 1.0 + rank;
*(src_buf + i) = 1.0 + rank;
}

MPI_Win_create(win_buf, bufsize, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &buf_win);

peer = (rank+1) % nranks;

// Perform ITERATIONS strided accumulate operations

for (i = 0; i < ITERATIONS; i++) {
  int ndims   = 2;
  int src_arr_sizes[2]= { XDIM, YDIM };
  int src_arr_subsizes[2] = { SUB_XDIM, SUB_YDIM };
  int src_arr_starts[2]   = {0,0 };
  int dst_arr_sizes[2]= { XDIM, YDIM };
  int dst_arr_subsizes[2] = { SUB_XDIM, SUB_YDIM };
  int dst_arr_starts[2]   = {0,0 };
  MPI_Datatype src_type, dst_type;

  MPI_Type_create_subarray(ndims, src_arr_sizes, src_arr_subsizes,
                           src_arr_starts, MPI_ORDER_C, MPI_DOUBLE, &src_type);

  MPI_Type_create_subarray(ndims, dst_arr_sizes, dst_arr_subsizes,
                           dst_arr_starts, MPI_ORDER_C, MPI_DOUBLE, &dst_type);

  MPI_Type_commit(&src_type);
  MPI_Type_commit(&dst_type);

  MPI_Win_lock(MPI_LOCK_EXCLUSIVE, peer, 0, buf_win);

  MPI_Accumulate(src_buf, 1, src_type, peer, 0, 1, dst_type, MPI_SUM, buf_win);

  MPI_Win_unlock(peer, buf_win);

  MPI_Type_free(&src_type);
  MPI_Type_free(&dst_type);
}

MPI_Barrier(MPI_COMM_WORLD);

// Verify that the results are correct

MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank, 0, buf_win);
errors = 0;
for (i = 0; i < SUB_XDIM; i++) {
  for (j = 0; j < SUB_YDIM; j++) {
const double actual   = *(win_buf + i + j*XDIM);
const double expected = (1.0 + rank) + (1.0 + ((rank+nranks-1)%nranks)) * (ITERATIONS);
if (actual - expected > 1e-10) {
  printf("%d: Data validation failed at [%d, %d] expected=%f actual=%f\n",
  rank, j, i, expected, actual);
  errors++;
  fflush(stdout);
}
  }
}
for (i = SUB_XDIM; i < XDIM; i++) {
  for (j = 0; j < SUB_YDIM; j++) {
const double actual   = *(win_buf + i + j*XDIM);
const double expected = 1.0 + rank;
if (actual - expected > 1e-10) {
  printf("%d: Data validation failed at [%d, %d] expected=%f actual=%f\n",
  rank, j, i, expected, actual);
  errors++;
  fflush(stdout);
}
  }
}
for (i = 0; i < XDIM; i++) {
  for (j = SUB_YDIM; j < YDIM; j++) {
const double actual   = *(win_buf + i + j*XDIM);
const double expected = 1.0 + rank;
if (actual - expected > 1e-10) {
  printf("%d: Data validation failed at [%d, %d] expected=%f actual=%f\n",
  rank, j, i, expected, actual);
  errors++;
  fflush(stdout);
}
  }
}
MPI_Win_unlock(rank, buf_win);

MPI_Win_free(&buf_win);
MPI_Free_mem(win_buf);
MPI_Free_mem(src_buf);

MPI_Finalize();

if (errors == 0) {
  printf("%d: Success\n", rank);
  return 0;
} else {
  printf("%d: Fail\n", rank);
  return 1;
}
}

[OMPI users] One-sided datatype errors

2010-12-13 Thread James Dinan

Hi,

I'm getting strange behavior using datatypes in a one-sided 
MPI_Accumulate operation.


The attached example performs an accumulate into a patch of a shared 2d 
matrix.  It uses indexed datatypes and can be built with displacement or 
absolute indices (hindexed) - both cases fail.  I'm seeing data 
validation errors, hanging, or other erroneous behavior under OpenMPI 
1.5 on Infiniband.  The example works correctly under the current 
release of MVAPICH on IB and under MPICH on shared memory.


Any help would be greatly appreciated.

Best,
 ~Jim.
/* One-Sided MPI 2-D Strided Accumulate Test
 *
 * Author: James Dinan <di...@mcs.anl.gov> 
 * Date  : December, 2010
 *
 * This code performs N accumulates into a 2d patch of a shared array.  The
 * array has dimensions [X, Y] and the subarray has dimensions [SUB_X, SUB_Y]
 * and begins at index [0, 0].  The input and output buffers are specified
 * using an MPI indexed type.
 */

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define XDIM 16
#define YDIM 16
#define SUB_XDIM 8
#define SUB_YDIM 8
#define ITERATIONS 1

int main(int argc, char **argv) {
int i, j, rank, nranks, peer, bufsize, errors;
double *win_buf, *src_buf;
MPI_Win buf_win;

MPI_Init(&argc, &argv);

MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &nranks);

bufsize = XDIM * YDIM * sizeof(double);
MPI_Alloc_mem(bufsize, MPI_INFO_NULL, &win_buf);
MPI_Alloc_mem(bufsize, MPI_INFO_NULL, &src_buf);

if (rank == 0)
printf("MPI RMA Strided Accumulate Test:\n");

for (i = 0; i < XDIM*YDIM; i++) {
*(win_buf  + i) = 1.0 + rank;
*(src_buf + i) = 1.0 + rank;
}

MPI_Win_create(win_buf, bufsize, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &buf_win);

peer = (rank+1) % nranks;

// Perform ITERATIONS strided accumulate operations

for (i = 0; i < ITERATIONS; i++) {
  MPI_Aint idx_loc[SUB_YDIM];
  int idx_rem[SUB_YDIM];
  int blk_len[SUB_YDIM];
  MPI_Datatype src_type, dst_type;

  /* Use j as the loop counter here so the outer iteration counter i is
   * not clobbered. */
  for (j = 0; j < SUB_YDIM; j++) {
MPI_Get_address(&src_buf[j*XDIM], &idx_loc[j]);
idx_rem[j] = j*XDIM;
blk_len[j] = SUB_XDIM;
  }

#ifdef ABSOLUTE
  MPI_Type_hindexed(SUB_YDIM, blk_len, idx_loc, MPI_DOUBLE, &src_type);
#else
  MPI_Type_indexed(SUB_YDIM, blk_len, idx_rem, MPI_DOUBLE, &src_type);
#endif
  MPI_Type_indexed(SUB_YDIM, blk_len, idx_rem, MPI_DOUBLE, &dst_type);

  MPI_Type_commit(&src_type);
  MPI_Type_commit(&dst_type);

  MPI_Win_lock(MPI_LOCK_EXCLUSIVE, peer, 0, buf_win);

#ifdef ABSOLUTE
  MPI_Accumulate(MPI_BOTTOM, 1, src_type, peer, 0, 1, dst_type, MPI_SUM, buf_win);
#else
  MPI_Accumulate(src_buf, 1, src_type, peer, 0, 1, dst_type, MPI_SUM, buf_win);
#endif

  MPI_Win_unlock(peer, buf_win);

  MPI_Type_free(&src_type);
  MPI_Type_free(&dst_type);
}

MPI_Barrier(MPI_COMM_WORLD);

// Verify that the results are correct

MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank, 0, buf_win);
errors = 0;
for (i = 0; i < SUB_XDIM; i++) {
  for (j = 0; j < SUB_YDIM; j++) {
const double actual   = *(win_buf + i + j*XDIM);
const double expected = (1.0 + rank) + (1.0 + ((rank+nranks-1)%nranks)) * (ITERATIONS);
if (actual - expected > 1e-10) {
  printf("%d: Data validation failed at [%d, %d] expected=%f actual=%f\n",
  rank, j, i, expected, actual);
  errors++;
  fflush(stdout);
}
  }
}
for (i = SUB_XDIM; i < XDIM; i++) {
  for (j = 0; j < SUB_YDIM; j++) {
const double actual   = *(win_buf + i + j*XDIM);
const double expected = 1.0 + rank;
if (actual - expected > 1e-10) {
  printf("%d: Data validation failed at [%d, %d] expected=%f actual=%f\n",
  rank, j, i, expected, actual);
  errors++;
  fflush(stdout);
}
  }
}
for (i = 0; i < XDIM; i++) {
  for (j = SUB_YDIM; j < YDIM; j++) {
const double actual   = *(win_buf + i + j*XDIM);
const double expected = 1.0 + rank;
if (actual - expected > 1e-10) {
  printf("%d: Data validation failed at [%d, %d] expected=%f actual=%f\n",
  rank, j, i, expected, actual);
  errors++;
  fflush(stdout);
}
  }
}
MPI_Win_unlock(rank, buf_win);

MPI_Win_free(&buf_win);
MPI_Free_mem(win_buf);
MPI_Free_mem(src_buf);

MPI_Finalize();

if (errors == 0) {
  printf("%d: Success\n", rank);
  return 0;
} else {
  printf("%d: Fail\n", rank);
  return 1;
}
}