Hi,

My colleague and I found three OSC-related bugs in the OMPI datatype code: one affects trunk and the v1.6/v1.7 branches, and two affect only the v1.6 branch.

(1) OMPI_DATATYPE_ALIGN_PTR should be placed after memcpy

Last year I reported a bug in the OMPI datatype code, and it was fixed in r25721. However, that fix was not correct and the problem still exists.

My reported bug and the patch:
  http://www.open-mpi.org/community/lists/devel/2012/01/10207.php
r25721:
  https://svn.open-mpi.org/trac/ompi/changeset/25721

In the __ompi_datatype_pack_description function, OMPI_DATATYPE_ALIGN_PTR should be placed after the memcpy, as in the patch attached to my previous mail. I did not check r25721 carefully when it was committed, sorry.

The attached file datatype-align.patch is the correct patch for the latest trunk. This fix should be applied to trunk and to the v1.7/v1.6 branches.

(2) r28790 should be merged into v1.6

The trunk changeset r28790 had been merged into v1.7 in r28790 (ticket #3673), but it has not yet been merged into v1.6. I confirmed that the problem reported last month also occurs in v1.6 and can be fixed by merging r28790 into v1.6.

The original reported problem:
  http://www.open-mpi.org/community/lists/devel/2013/07/12595.php

(3) OMPI_DATATYPE_MAX_PREDEFINED should be 46 for v1.6

In the v1.6 branch, ompi/datatype/ompi_datatype.h defines OMPI_DATATYPE_MAX_PREDEFINED as 45, but the number of predefined datatypes is 46 and the last predefined datatype ID (OMPI_DATATYPE_MPI_UB) is 45.

OMPI_DATATYPE_MAX_PREDEFINED is used as the number of predefined datatypes, that is, as the maximum predefined datatype ID + 1, and not as the maximum predefined datatype ID itself, as the following uses show.

ompi/op/op.c:79:
    // the number of predefined datatypes
    int ompi_op_ddt_map[OMPI_DATATYPE_MAX_PREDEFINED];

ompi/datatype/ompi_datatype_args.c:573:
    // maximum predefined datatype ID + 1
    assert( data_id < OMPI_DATATYPE_MAX_PREDEFINED );

ompi/datatype/ompi_datatype_args.c:492:
    // first unused datatype ID
    // (= maximum predefined datatype ID + 1)
    int next_index = OMPI_DATATYPE_MAX_PREDEFINED;

So its value should be 46 for v1.6.

In fact, r28932 in trunk added one datatype (MPI_Count) but increased OMPI_DATATYPE_MAX_PREDEFINED from 45 to 47, so the current trunk is correct.

This bug causes random errors such as SEGV, "Error recreating datatype", or "received packet for Window with unknown type" if you use MPI_UB in OSC, as in the attached program osc_ub.c.

Regards,

Takahiro Kawashima,
MPI development team,
Fujitsu

Index: ompi/datatype/ompi_datatype_args.c
===================================================================
--- ompi/datatype/ompi_datatype_args.c	(revision 29064)
+++ ompi/datatype/ompi_datatype_args.c	(working copy)
@@ -467,12 +467,13 @@
     position = (int*)next_packed;
     next_packed += sizeof(int) * args->cd;
 
-    /* description of next datatype should be 64 bits aligned */
-    OMPI_DATATYPE_ALIGN_PTR(next_packed, char*);
     /* copy the aray of counts (32 bits aligned) */
     memcpy( next_packed, args->i, sizeof(int) * args->ci );
     next_packed += args->ci * sizeof(int);
 
+    /* description of next datatype should be 64 bits aligned */
+    OMPI_DATATYPE_ALIGN_PTR(next_packed, char*);
+
     /* copy the rest of the data */
     for( i = 0; i < args->cd; i++ ) {
         ompi_datatype_t* temp_data = args->d[i];

/* osc_ub.c: reproducer for item (3), using MPI_UB in a one-sided (OSC) transfer. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int size, rank;
    MPI_Win win;
    MPI_Datatype datatype;
    /* struct type whose second member is the MPI_UB marker */
    MPI_Datatype datatypes[] = {MPI_INT, MPI_UB};
    int blengths[] = {1, 1};
    MPI_Aint displs[] = {0, sizeof(int)};
    int buf[] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (size < 2) {
        fprintf(stderr, "Needs at least 2 processes\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_Type_create_struct(2, blengths, displs, datatypes, &datatype);
    MPI_Type_commit(&datatype);

    MPI_Win_create(buf, sizeof(int), 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    MPI_Win_fence(0, win);

    /* rank 0 puts one element to rank 1; the datatype description,
       including MPI_UB, is sent to the target */
    if (rank == 0) {
        MPI_Put(buf, 1, datatype, 1, 0, 1, datatype, win);
    }

    MPI_Win_fence(0, win);
    MPI_Win_free(&win);

    MPI_Type_free(&datatype);

    MPI_Finalize();

    return 0;
}