[OMPI users] One-sided datatype errors

2010-12-13 Thread James Dinan

Hi,

I'm getting strange behavior using datatypes in a one-sided 
MPI_Accumulate operation.


The attached example performs an accumulate into a patch of a shared 2-D
matrix.  It uses indexed datatypes and can be built to use either relative
displacements (indexed) or absolute addresses (hindexed); both cases fail.
I'm seeing data validation errors, hangs, or other erroneous behavior under
Open MPI 1.5 on InfiniBand.  The example works correctly under the current
release of MVAPICH on IB and under MPICH on shared memory.
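
For reference, here is a minimal, self-contained sketch of the two datatype
constructions the test exercises; the helper name make_patch_type and its
arguments are illustrative only and do not appear in the attached code:

static void make_patch_type(double *buf, int absolute, MPI_Datatype *newtype) {
  /* Describe a SUB_XDIM x SUB_YDIM patch of a row-major XDIM x YDIM array of
   * doubles: one block of SUB_XDIM elements per row of the patch. */
  MPI_Aint idx_loc[SUB_YDIM];   /* absolute byte addresses (for hindexed) */
  int idx_rem[SUB_YDIM];        /* displacements in units of MPI_DOUBLE   */
  int blk_len[SUB_YDIM];
  int i;

  for (i = 0; i < SUB_YDIM; i++) {
    MPI_Get_address(&buf[i*XDIM], &idx_loc[i]);
    idx_rem[i] = i*XDIM;
    blk_len[i] = SUB_XDIM;
  }

  if (absolute)
    /* Absolute addresses: the origin buffer passed to MPI_Accumulate is MPI_BOTTOM */
    MPI_Type_hindexed(SUB_YDIM, blk_len, idx_loc, MPI_DOUBLE, newtype);
  else
    /* Relative displacements: the origin buffer passed to MPI_Accumulate is buf */
    MPI_Type_indexed(SUB_YDIM, blk_len, idx_rem, MPI_DOUBLE, newtype);

  MPI_Type_commit(newtype);
}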


Any help would be greatly appreciated.

Best,
 ~Jim.
/* One-Sided MPI 2-D Strided Accumulate Test
 *
 * Author: James Dinan  
 * Date  : December, 2010
 *
 * This code performs N accumulates into a 2d patch of a shared array.  The
 * array has dimensions [X, Y] and the subarray has dimensions [SUB_X, SUB_Y]
 * and begins at index [0, 0].  The input and output buffers are specified
 * using an MPI indexed type.
 */

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define XDIM 16
#define YDIM 16
#define SUB_XDIM 8
#define SUB_YDIM 8
#define ITERATIONS 1

int main(int argc, char **argv) {
  int i, j, rank, nranks, peer, bufsize, errors;
  double *win_buf, *src_buf;
  MPI_Win buf_win;

  MPI_Init(&argc, &argv);

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nranks);

  bufsize = XDIM * YDIM * sizeof(double);
  MPI_Alloc_mem(bufsize, MPI_INFO_NULL, &win_buf);
  MPI_Alloc_mem(bufsize, MPI_INFO_NULL, &src_buf);

  if (rank == 0)
    printf("MPI RMA Strided Accumulate Test:\n");

  for (i = 0; i < XDIM*YDIM; i++) {
    *(win_buf + i) = 1.0 + rank;
    *(src_buf + i) = 1.0 + rank;
  }

  MPI_Win_create(win_buf, bufsize, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &buf_win);

  peer = (rank+1) % nranks;

  // Perform ITERATIONS strided accumulate operations

  for (i = 0; i < ITERATIONS; i++) {
    MPI_Aint idx_loc[SUB_YDIM];
    int idx_rem[SUB_YDIM];
    int blk_len[SUB_YDIM];
    MPI_Datatype src_type, dst_type;

    /* Describe the patch: one block of SUB_XDIM doubles per row.
     * (Use j here so the outer iteration counter i is not clobbered.) */
    for (j = 0; j < SUB_YDIM; j++) {
      MPI_Get_address(&src_buf[j*XDIM], &idx_loc[j]);
      idx_rem[j] = j*XDIM;
      blk_len[j] = SUB_XDIM;
    }

#ifdef ABSOLUTE
    MPI_Type_hindexed(SUB_YDIM, blk_len, idx_loc, MPI_DOUBLE, &src_type);
#else
    MPI_Type_indexed(SUB_YDIM, blk_len, idx_rem, MPI_DOUBLE, &src_type);
#endif
    MPI_Type_indexed(SUB_YDIM, blk_len, idx_rem, MPI_DOUBLE, &dst_type);

    MPI_Type_commit(&src_type);
    MPI_Type_commit(&dst_type);

    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, peer, 0, buf_win);

#ifdef ABSOLUTE
    MPI_Accumulate(MPI_BOTTOM, 1, src_type, peer, 0, 1, dst_type, MPI_SUM, buf_win);
#else
    MPI_Accumulate(src_buf, 1, src_type, peer, 0, 1, dst_type, MPI_SUM, buf_win);
#endif

    MPI_Win_unlock(peer, buf_win);

    MPI_Type_free(&src_type);
    MPI_Type_free(&dst_type);
  }

  MPI_Barrier(MPI_COMM_WORLD);

  // Verify that the results are correct

  MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank, 0, buf_win);
  errors = 0;
  for (i = 0; i < SUB_XDIM; i++) {
    for (j = 0; j < SUB_YDIM; j++) {
      const double actual   = *(win_buf + i + j*XDIM);
      const double expected = (1.0 + rank) + (1.0 + ((rank+nranks-1)%nranks)) * (ITERATIONS);
      if (actual - expected > 1e-10) {
        printf("%d: Data validation failed at [%d, %d] expected=%f actual=%f\n",
            rank, j, i, expected, actual);
        errors++;
        fflush(stdout);
      }
    }
  }
  for (i = SUB_XDIM; i < XDIM; i++) {
    for (j = 0; j < SUB_YDIM; j++) {
      const double actual   = *(win_buf + i + j*XDIM);
      const double expected = 1.0 + rank;
      if (actual - expected > 1e-10) {
        printf("%d: Data validation failed at [%d, %d] expected=%f actual=%f\n",
            rank, j, i, expected, actual);
        errors++;
        fflush(stdout);
      }
    }
  }
  for (i = 0; i < XDIM; i++) {
    for (j = SUB_YDIM; j < YDIM; j++) {
      const double actual   = *(win_buf + i + j*XDIM);
      const double expected = 1.0 + rank;
      if (actual - expected > 1e-10) {
        printf("%d: Data validation failed at [%d, %d] expected=%f actual=%f\n",
            rank, j, i, expected, actual);
        errors++;
        fflush(stdout);
      }
    }
  }
  MPI_Win_unlock(rank, buf_win);

  MPI_Win_free(&buf_win);
  MPI_Free_mem(win_buf);
  MPI_Free_mem(src_buf);

  MPI_Finalize();

  if (errors == 0) {
    printf("%d: Success\n", rank);
    return 0;
  } else {
    printf("%d: Fail\n", rank);
    return 1;
  }
}


Re: [OMPI users] One-sided datatype errors

2010-12-14 Thread Rolf vandeVaart

Hi James:
I can reproduce the problem on a single node with Open MPI 1.5 and the
trunk.  I have submitted a ticket with the information.

https://svn.open-mpi.org/trac/ompi/ticket/2656

Rolf

On 12/13/10 18:44, James Dinan wrote:

Hi,

I'm getting strange behavior using datatypes in a one-sided 
MPI_Accumulate operation.


The attached example performs an accumulate into a patch of a shared 2-D
matrix.  It uses indexed datatypes and can be built to use either relative
displacements (indexed) or absolute addresses (hindexed); both cases fail.
I'm seeing data validation errors, hangs, or other erroneous behavior under
Open MPI 1.5 on InfiniBand.  The example works correctly under the current
release of MVAPICH on IB and under MPICH on shared memory.


Any help would be greatly appreciated.

Best,
 ~Jim.






Re: [OMPI users] One-sided datatype errors

2010-12-14 Thread James Dinan

Hi Rolf,

Thanks for your help.  I also noticed trouble with subarray datatypes.
I attached the same test again, but with subarray rather than indexed
types.  It works correctly with MVAPICH on IB, but fails with Open MPI
1.5 with the following message:


$ mpiexec -n 2 ./a.out
MPI RMA Strided Accumulate Test:
[f3:1747] *** An error occurred in MPI_Accumulate
[f3:1747] *** on win 3
[f3:1747] *** MPI_ERR_TYPE: invalid datatype
[f3:1747] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
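
For reference, a minimal sketch of the subarray type construction that
triggers this error (it mirrors the attached code; the names sizes, subsizes,
starts, and patch_type are illustrative):

  int sizes[2]    = { XDIM, YDIM };         /* full array dimensions   */
  int subsizes[2] = { SUB_XDIM, SUB_YDIM }; /* dimensions of the patch */
  int starts[2]   = { 0, 0 };               /* patch begins at [0, 0]  */
  MPI_Datatype patch_type;

  MPI_Type_create_subarray(2, sizes, subsizes, starts,
                           MPI_ORDER_C, MPI_DOUBLE, &patch_type);
  MPI_Type_commit(&patch_type);

  /* An equivalent type describes both the origin and the target in the
   * MPI_Accumulate call that produces the MPI_ERR_TYPE abort above. */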

Thanks,
 ~Jim.

On 12/14/2010 09:05 AM, Rolf vandeVaart wrote:

Hi James:
I can reproduce the problem on a single node with Open MPI 1.5 and the
trunk. I have submitted a ticket with
the information.

https://svn.open-mpi.org/trac/ompi/ticket/2656

Rolf



/* One-Sided MPI 2-D Strided Accumulate Test
 *
 * Author: James Dinan  
 * Date  : December, 2010
 *
 * This code performs N accumulates into a 2d patch of a shared array.  The
 * array has dimensions [X, Y] and the subarray has dimensions [SUB_X, SUB_Y]
 * and begins at index [0, 0].  The input and output buffers are specified
 * using an MPI subarray type.
 */

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define XDIM 1024 
#define YDIM 1024
#define SUB_XDIM 512
#define SUB_YDIM 512
#define ITERATIONS 10

int main(int argc, char **argv) {
  int i, j, rank, nranks, peer, bufsize, errors;
  double *win_buf, *src_buf;
  MPI_Win buf_win;

  MPI_Init(&argc, &argv);

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nranks);

  bufsize = XDIM * YDIM * sizeof(double);
  MPI_Alloc_mem(bufsize, MPI_INFO_NULL, &win_buf);
  MPI_Alloc_mem(bufsize, MPI_INFO_NULL, &src_buf);

  if (rank == 0)
    printf("MPI RMA Strided Accumulate Test:\n");

  for (i = 0; i < XDIM*YDIM; i++) {
    *(win_buf + i) = 1.0 + rank;
    *(src_buf + i) = 1.0 + rank;
  }

  MPI_Win_create(win_buf, bufsize, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &buf_win);

  peer = (rank+1) % nranks;

  // Perform ITERATIONS strided accumulate operations

  for (i = 0; i < ITERATIONS; i++) {
    int ndims               = 2;
    int src_arr_sizes[2]    = { XDIM, YDIM };
    int src_arr_subsizes[2] = { SUB_XDIM, SUB_YDIM };
    int src_arr_starts[2]   = { 0, 0 };
    int dst_arr_sizes[2]    = { XDIM, YDIM };
    int dst_arr_subsizes[2] = { SUB_XDIM, SUB_YDIM };
    int dst_arr_starts[2]   = { 0, 0 };
    MPI_Datatype src_type, dst_type;

    MPI_Type_create_subarray(ndims, src_arr_sizes, src_arr_subsizes, src_arr_starts,
                             MPI_ORDER_C, MPI_DOUBLE, &src_type);

    MPI_Type_create_subarray(ndims, dst_arr_sizes, dst_arr_subsizes, dst_arr_starts,
                             MPI_ORDER_C, MPI_DOUBLE, &dst_type);

    MPI_Type_commit(&src_type);
    MPI_Type_commit(&dst_type);

    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, peer, 0, buf_win);

    MPI_Accumulate(src_buf, 1, src_type, peer, 0, 1, dst_type, MPI_SUM, buf_win);

    MPI_Win_unlock(peer, buf_win);

    MPI_Type_free(&src_type);
    MPI_Type_free(&dst_type);
  }

  MPI_Barrier(MPI_COMM_WORLD);

  // Verify that the results are correct

  MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank, 0, buf_win);
  errors = 0;
  for (i = 0; i < SUB_XDIM; i++) {
    for (j = 0; j < SUB_YDIM; j++) {
      const double actual   = *(win_buf + i + j*XDIM);
      const double expected = (1.0 + rank) + (1.0 + ((rank+nranks-1)%nranks)) * (ITERATIONS);
      if (actual - expected > 1e-10) {
        printf("%d: Data validation failed at [%d, %d] expected=%f actual=%f\n",
            rank, j, i, expected, actual);
        errors++;
        fflush(stdout);
      }
    }
  }
  for (i = SUB_XDIM; i < XDIM; i++) {
    for (j = 0; j < SUB_YDIM; j++) {
      const double actual   = *(win_buf + i + j*XDIM);
      const double expected = 1.0 + rank;
      if (actual - expected > 1e-10) {
        printf("%d: Data validation failed at [%d, %d] expected=%f actual=%f\n",
            rank, j, i, expected, actual);
        errors++;
        fflush(stdout);