Nadia, Which version of Open MPI are you using? I tried with the nightly r29751, the current 1.6 and the current 1.7 and I __always__ got the expected output.
There is a simple way to show what the datatype engine is doing. You can set the MCA parameters mpi_ddt_unpack_debug and mpi_ddt_pack_debug to get more info. If you only want to see how the datatype looks after the MPI_Commit step you can call directly ompi_datatype_dump(ddt). This will show the internals of the datatype, converted in predefined types. As an example I took the application you provided and build the following picture of what is send and what is received (original buffer, send datatype, packed buffer, recv datatype, resulting buffer). Now using the ompi_datatype_dump, I see the recv and the send datatypes as: -cC---P-DB-[---][---] OPAL_UINT1 count 8 disp 0x0 (0) extent 1 (size 8) -cC---P-DB-[---][---] OPAL_UINT1 count 8 disp 0x10 (16) extent 1 (size 8) -cC—P-DB-[—][—] OPAL_INT4 count 4 disp 0x30 (48) extent 4 (size 16) ———————-G—[—][—] OPAL_END_LOOP pref 3 elements first elem displacement 0 size of data 32 -cC---P-DB-[---][---] OPAL_UINT1 count 24 disp 0x10 (16) extent 1 (size 24) -cC---P-DB-[---][---] OPAL_UINT1 count 8 disp 0x30 (48) extent 1 (size 8) ———G---[---][---] OPAL_END_LOOP prev 2 elements first elem displacement 16 size of data 32 This match perfectly to the datatype drawn by hand. George. On Nov 25, 2013, at 11:40 , Nadia Derbey <nadia.der...@bull.net> wrote: > Hi, > > I'm currently working on a bug occuring at the client site with openmpi when > calling MPI_Sendreceive() on datatypes built by the application. > I think I've found where the bug comes from (it is located in > opal_generic_simple_pack_function() - file > opal/datatype/opal_datatype_pack.c). But this code is so complicated that I'm > more than unsure of my fix. What I can say is that it fixes things for me, > but I need some advices from the datatypes specialists. > > --------------- > > You will find in attachment the reproducer provided by the client, as well as > the resulting output. > datatypes.c : reproducer > to run the binary: salloc --exclusive -p B510 -N 1 -n 1 mpirun ./datatypes > trc_ko: traces got without the patch applied > trc_ok: traces got with the patch applied. > > --------------- > > The proposed patch is the following: (Note that the very first change in this > patch was enough in my case, but I thought all the "source_base" settings > should follow this model.) > > ------------------------- > opal_generic_simple_pack_function: add the datatype lb when progressing in > the input buffer > > diff -r cb23c2f07e1f opal/datatype/opal_datatype_pack.c > --- a/opal/datatype/opal_datatype_pack.c Sun Nov 24 17:06:51 2013 +0000 > +++ b/opal/datatype/opal_datatype_pack.c Mon Nov 25 10:48:00 2013 +0100 > @@ -301,7 +301,7 @@ opal_generic_simple_pack_function( opal_ > PACK_PREDEFINED_DATATYPE( pConvertor, pElem, count_desc, > source_base, destination, > iov_len_local ); > if( 0 == count_desc ) { /* completed */ > - source_base = pConvertor->pBaseBuf + pStack->disp; > + source_base = pConvertor->pBaseBuf + pStack->disp + > pData->lb; > pos_desc++; /* advance to the next data */ > UPDATE_INTERNAL_COUNTERS( description, pos_desc, pElem, > count_desc ); > continue; > @@ -333,7 +333,7 @@ opal_generic_simple_pack_function( opal_ > pStack->disp += > description[pStack->index].loop.extent; > } > } > - source_base = pConvertor->pBaseBuf + pStack->disp; > + source_base = pConvertor->pBaseBuf + pStack->disp + > pData->lb; > UPDATE_INTERNAL_COUNTERS( description, pos_desc, pElem, > count_desc ); > DO_DEBUG( opal_output( 0, "pack new_loop count %d stack_pos > %d pos_desc %d disp %ld space %lu\n", > (int)pStack->count, > pConvertor->stack_pos, pos_desc, (long)pStack->disp, (unsigned > long)iov_len_local ); ); > @@ -354,7 +354,7 @@ opal_generic_simple_pack_function( opal_ > pStack->disp + local_disp); > pos_desc++; > update_loop_description: /* update the current state */ > - source_base = pConvertor->pBaseBuf + pStack->disp; > + source_base = pConvertor->pBaseBuf + pStack->disp + > pData->lb; > UPDATE_INTERNAL_COUNTERS( description, pos_desc, pElem, > count_desc ); > DDT_DUMP_STACK( pConvertor->pStack, pConvertor->stack_pos, > pElem, "advance loop" ); > continue; > @@ -374,7 +374,7 @@ opal_generic_simple_pack_function( opal_ > } > /* I complete an element, next step I should go to the next one */ > PUSH_STACK( pStack, pConvertor->stack_pos, pos_desc, OPAL_DATATYPE_INT8, > count_desc, > - source_base - pStack->disp - pConvertor->pBaseBuf ); > + source_base - pStack->disp - pConvertor->pBaseBuf - > pData->lb ); > DO_DEBUG( opal_output( 0, "pack save stack stack_pos %d pos_desc %d > count_desc %d disp %ld\n", > pConvertor->stack_pos, pStack->index, > (int)pStack->count, (long)pStack->disp ); ); > return 0; > > ------------------------------- > > Regards, > Nadia > -- > Nadia Derbey > Bull, Architect of an Open World > http://www.bull.com > <datatypes.c><trc_ko.txt><trc_ok.txt>_______________________________________________ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel