Hi,

My colleague and I found three OSC-related bugs in the OMPI datatype code.
One affects trunk and the v1.6/v1.7 branches; the other two affect only
the v1.6 branch.

(1) OMPI_DATATYPE_ALIGN_PTR should be placed after memcpy

  Last year I reported a bug in the OMPI datatype code, and a fix
  was committed in r25721. However, that fix is incorrect and the
  problem still exists.

  My original report and patch:
    http://www.open-mpi.org/community/lists/devel/2012/01/10207.php
  r25721:
    https://svn.open-mpi.org/trac/ompi/changeset/25721

  In the __ompi_datatype_pack_description function,
  OMPI_DATATYPE_ALIGN_PTR should be placed after the memcpy of the
  counts array, as in the patch attached to my previous mail.

  Sorry, I did not check r25721 carefully when it was committed.

  The attached file datatype-align.patch is the correct patch
  against the latest trunk. This fix should be applied to trunk
  and to the v1.7 and v1.6 branches.

(2) r28790 should be merged into v1.6

  The trunk changeset r28790 has been merged into v1.7 in r28790
  (ticket #3673), but it has not yet been merged into v1.6.

  I confirmed that the problem reported last month also occurs in
  v1.6 and that merging r28790 into v1.6 fixes it.

  The original problem report:
    http://www.open-mpi.org/community/lists/devel/2013/07/12595.php

(3) OMPI_DATATYPE_MAX_PREDEFINED should be 46 for v1.6

  In the v1.6 branch, ompi/datatype/ompi_datatype.h defines
  OMPI_DATATYPE_MAX_PREDEFINED as 45, but there are 46
  predefined datatypes and the last predefined datatype ID
  (OMPI_DATATYPE_MPI_UB) is 45.

  OMPI_DATATYPE_MAX_PREDEFINED is used as the number of
  predefined datatypes, or equivalently as the maximum predefined
  datatype ID + 1, not as the maximum predefined datatype ID
  itself, as in the following places:

    ompi/op/op.c:79:
      // the number of predefined datatypes
      int ompi_op_ddt_map[OMPI_DATATYPE_MAX_PREDEFINED];
    ompi/datatype/ompi_datatype_args.c:573:
      // maximum predefined datatype ID + 1
      assert( data_id < OMPI_DATATYPE_MAX_PREDEFINED );
    ompi/datatype/ompi_datatype_args.c:492:
      // first unused datatype ID
      // (= maximum predefined datatype ID + 1)
      int next_index = OMPI_DATATYPE_MAX_PREDEFINED;

  So its value should be 46 for v1.6.

  In fact, r28932 in trunk added one datatype (MPI_Count) but
  increased OMPI_DATATYPE_MAX_PREDEFINED by two, from 45 to 47,
  so the current trunk is correct.

  This bug causes random errors, such as SEGV, "Error recreating
  datatype", or "received packet for Window with unknown type",
  when MPI_UB is used in OSC, as in the attached program osc_ub.c.

Regards,
Takahiro Kawashima,
MPI development team,
Fujitsu
Index: ompi/datatype/ompi_datatype_args.c
===================================================================
--- ompi/datatype/ompi_datatype_args.c	(revision 29064)
+++ ompi/datatype/ompi_datatype_args.c	(working copy)
@@ -467,12 +467,13 @@
     position = (int*)next_packed;
     next_packed += sizeof(int) * args->cd;
 
-    /* description of next datatype should be 64 bits aligned */
-    OMPI_DATATYPE_ALIGN_PTR(next_packed, char*);
     /* copy the aray of counts (32 bits aligned) */
     memcpy( next_packed, args->i, sizeof(int) * args->ci );
     next_packed += args->ci * sizeof(int);
 
+    /* description of next datatype should be 64 bits aligned */
+    OMPI_DATATYPE_ALIGN_PTR(next_packed, char*);
+
     /* copy the rest of the data */
     for( i = 0; i < args->cd; i++ ) {
         ompi_datatype_t* temp_data = args->d[i];
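/* osc_ub.c: reproducer for (3); uses a struct datatype containing
 * MPI_UB as the origin and target datatype of MPI_Put. */
#include <stdio.h>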
#include <mpi.h>

int main(int argc, char *argv[])
{
    int size, rank;
    MPI_Win win;
    MPI_Datatype datatype;
    MPI_Datatype datatypes[] = {MPI_INT, MPI_UB};
    int blengths[] = {1, 1};
    MPI_Aint displs[] = {0, sizeof(int)};
    int buf[] = {0};

    MPI_Init(&argc, &argv);

    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (size < 2) {
        fprintf(stderr, "Needs at least 2 processes\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_Type_create_struct(2, blengths, displs, datatypes, &datatype);
    MPI_Type_commit(&datatype);
    MPI_Win_create(buf, sizeof(int), 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 0) {
        MPI_Put(buf, 1, datatype, 1, 0, 1, datatype, win);
    }
    MPI_Win_fence(0, win);

    MPI_Win_free(&win);
    MPI_Type_free(&datatype);

    MPI_Finalize();

    return 0;
}
