I sent this information to George off the mailing list since the attachment was somewhat large. It still seems strange that I am apparently the only one who sees this.
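For anyone wanting to reproduce the debug run George asked for: it is the same command line as the failing runs quoted below, plus the two datatype debug MCA parameters set to 1. A sketch (the parameter names are the ones George mentions; hosts, binary, and other options are copied from the runs below — composed as a string here so the command can be shown without a cluster):

```shell
# Same mpirun invocation as the failing runs, with the pack/unpack
# datatype debug parameters George requested enabled.
CMD="mpirun --mca btl self,openib -np 2 \
-host drossetti-ivy0,drossetti-ivy1 \
--mca btl_openib_warn_default_gid_prefix 0 \
--mca mpi_ddt_pack_debug 1 \
--mca mpi_ddt_unpack_debug 1 \
MPI_Isend_ator_c"
echo "$CMD"
```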
>-----Original Message-----
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
>Sent: Wednesday, April 16, 2014 4:24 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] Possible bug with derived datatypes and openib BTL in trunk
>
>Rolf,
>
>I didn't see these on my check run. Can you run the MPI_Isend_ator test with
>mpi_ddt_pack_debug and mpi_ddt_unpack_debug set to 1? I would be
>interested in the output you get on your machine.
>
>George.
>
>
>On Apr 16, 2014, at 14:34 , Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>
>> I have seen errors when running the Intel test suite using the openib BTL
>> when transferring derived datatypes. I do not see the error with the sm or
>> tcp BTLs. The errors begin after this checkin.
>>
>> https://svn.open-mpi.org/trac/ompi/changeset/31370
>> Timestamp: 04/11/14 16:06:56 (5 days ago)
>> Author: bosilca
>> Message: Reshape all the packing/unpacking functions to use the same
>> skeleton. Rewrite the generic_unpacking to take advantage of the same
>> capabilities.
>>
>> Does anyone else see errors?
>> Here is an example running with r31370:
>>
>> [rvandevaart@drossetti-ivy1 src]$ mpirun --mca btl self,openib -np 2 -host drossetti-ivy0,drossetti-ivy1 --mca btl_openib_warn_default_gid_prefix 0 MPI_Isend_ator_c
>> MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
>> MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
>> MPITEST error (1): 2 errors in buffer (17,0,12) len 273 commsize 2 commtype -10 data_type 13 root 1
>> MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
>> MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
>> MPITEST error (1): 2 errors in buffer (17,2,12) len 273 commsize 2 commtype -16 data_type 13 root 1
>> MPITEST info (0): Starting MPI_Isend_ator: All Isend TO Root test
>> MPITEST info (0): Node spec MPITEST_comm_sizes[6]=2 too large, using 1
>> MPITEST info (0): Node spec MPITEST_comm_sizes[22]=2 too large, using 1
>> MPITEST info (0): Node spec MPITEST_comm_sizes[32]=2 too large, using 1
>> MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 118
>> MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -60
>> MPITEST error (0): 2 errors in buffer (17,0,12) len 273 commsize 2 commtype -10 data_type 13 root 0
>> MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 118
>> MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -60
>> MPITEST error (0): 2 errors in buffer (17,2,12) len 273 commsize 2 commtype -16 data_type 13 root 0
>> MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
>> MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
>> MPITEST error (1): 2 errors in buffer (17,4,12) len 273 commsize 2 commtype -13 data_type 13 root 1
>> MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 118
>> MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -60
>> MPITEST error (0): 2 errors in buffer (17,4,12) len 273 commsize 2 commtype -13 data_type 13 root 0
>> MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
>> MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
>> MPITEST error (1): 2 errors in buffer (17,6,12) len 273 commsize 2 commtype -15 data_type 13 root 0
>> MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
>> MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -61
>> MPITEST error (0): 2 errors in buffer (17,6,12) len 273 commsize 2 commtype -15 data_type 13 root 0
>> MPITEST_results: MPI_Isend_ator: All Isend TO Root 8 tests FAILED (of 3744)
>> -------------------------------------------------------
>> Primary job terminated normally, but 1 process returned
>> a non-zero exit code.. Per user-direction, the job has been aborted.
>> -------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun detected that one or more processes exited with non-zero status,
>> thus causing the job to be terminated. The first process to do so was:
>>
>>   Process name: [[12363,1],0]
>>   Exit code:    4
>> --------------------------------------------------------------------------
>> [rvandevaart@drossetti-ivy1 src]$
>>
>>
>> Here is an error with the trunk which is slightly different.
>> [rvandevaart@drossetti-ivy1 src]$ mpirun --mca btl self,openib -np 2 -host drossetti-ivy0,drossetti-ivy1 --mca btl_openib_warn_default_gid_prefix 0 MPI_Isend_ator_c
>> [drossetti-ivy1.nvidia.com:22875] ../../../opal/datatype/opal_datatype_position.c:72
>>         Pointer 0x1ad414c size 4 is outside [0x1ac1d20,0x1ad1d08] for base ptr 0x1ac1d20 count 273 and data
>> [drossetti-ivy1.nvidia.com:22875] Datatype 0x1ac0220[] size 104 align 16 id 0 length 22 used 21
>> true_lb 0 true_ub 232 (true_extent 232) lb 0 ub 240 (extent 240) nbElems 21 loops 0 flags 1C4 (commited )-c--lu-GD--[---][---]
>>    contain lb ub OPAL_LB OPAL_UB OPAL_INT1 OPAL_INT2 OPAL_INT4 OPAL_INT8 OPAL_UINT1 OPAL_UINT2 OPAL_UINT4 OPAL_UINT8 OPAL_FLOAT4 OPAL_FLOAT8 OPAL_FLOAT16
>> --C---P-D--[---][---] OPAL_INT4 count 1 disp 0x0 (0) extent 4 (size 4)
>> --C---P-D--[---][---] OPAL_INT2 count 1 disp 0x8 (8) extent 2 (size 2)
>> --C---P-D--[---][---] OPAL_INT8 count 1 disp 0x10 (16) extent 8 (size 8)
>> --C---P-D--[---][---] OPAL_UINT2 count 1 disp 0x20 (32) extent 2 (size 2)
>> --C---P-D--[---][---] OPAL_UINT4 count 1 disp 0x24 (36) extent 4 (size 4)
>> --C---P-D--[---][---] OPAL_UINT8 count 1 disp 0x30 (48) extent 8 (size 8)
>> --C---P-D--[---][---] OPAL_FLOAT4 count 1 disp 0x40 (64) extent 4 (size 4)
>> --C---P-D--[---][---] OPAL_INT1 count 1 disp 0x48 (72) extent 1 (size 1)
>> --C---P-D--[---][---] OPAL_FLOAT8 count 1 disp 0x50 (80) extent 8 (size 8)
>> --C---P-D--[---][---] OPAL_UINT1 count 1 disp 0x60 (96) extent 1 (size 1)
>> --C---P-D--[---][---] OPAL_FLOAT16 count 1 disp 0x70 (112) extent 16 (size 16)
>> --C---P-D--[---][---] OPAL_INT1 count 1 disp 0x90 (144) extent 1 (size 1)
>> --C---P-D--[---][---] OPAL_UINT1 count 1 disp 0x92 (146) extent 1 (size 1)
>> --C---P-D--[---][---] OPAL_INT2 count 1 disp 0x94 (148) extent 2 (size 2)
>> --C---P-D--[---][---] OPAL_UINT2 count 1 disp 0x98 (152) extent 2 (size 2)
>> --C---P-D--[---][---] OPAL_INT4 count 1 disp 0x9c (156) extent 4 (size 4)
>> --C---P-D--[---][---] OPAL_UINT4 count 1 disp 0xa4 (164) extent 4 (size 4)
>> --C---P-D--[---][---] OPAL_INT8 count 1 disp 0xb0 (176) extent 8 (size 8)
>> --C---P-D--[---][---] OPAL_UINT8 count 1 disp 0xc0 (192) extent 8 (size 8)
>> --C---P-D--[---][---] OPAL_INT8 count 1 disp 0xd0 (208) extent 8 (size 8)
>> --C---P-D--[---][---] OPAL_UINT8 count 1 disp 0xe0 (224) extent 8 (size 8)
>> -------G---[---][---] OPAL_END_LOOP prev 21 elements first elem displacement 0 size of data 104
>> Optimized description
>> -cC---P-DB-[---][---] OPAL_INT4 count 1 disp 0x0 (0) extent 4 (size 4)
>> -cC---P-DB-[---][---] OPAL_INT2 count 1 disp 0x8 (8) extent 2 (size 2)
>> -cC---P-DB-[---][---] OPAL_INT8 count 1 disp 0x10 (16) extent 8 (size 8)
>> -cC---P-DB-[---][---] OPAL_UINT2 count 1 disp 0x20 (32) extent 2 (size 2)
>> -cC---P-DB-[---][---] OPAL_UINT4 count 1 disp 0x24 (36) extent 4 (size 4)
>> -cC---P-DB-[---][---] OPAL_UINT8 count 1 disp 0x30 (48) extent 8 (size 8)
>> -cC---P-DB-[---][---] OPAL_FLOAT4 count 1 disp 0x40 (64) extent 4 (size 4)
>> -cC---P-DB-[---][---] OPAL_INT1 count 1 disp 0x48 (72) extent 1 (size 1)
>> -cC---P-DB-[---][---] OPAL_FLOAT8 count 1 disp 0x50 (80) extent 8 (size 8)
>> -cC---P-DB-[---][---] OPAL_UINT1 count 1 disp 0x60 (96) extent 1 (size 1)
>> -cC---P-DB-[---][---] OPAL_FLOAT16 count 1 disp 0x70 (112) extent 16 (size 16)
>> -cC---P-DB-[---][---] OPAL_INT1 count 1 disp 0x90 (144) extent 1 (size 1)
>> -cC---P-DB-[---][---] OPAL_UINT1 count 1 disp 0x92 (146) extent 1 (size 1)
>> -cC---P-DB-[---][---] OPAL_INT2 count 1 disp 0x94 (148) extent 2 (size 2)
>> -cC---P-DB-[---][---] OPAL_UINT2 count 1 disp 0x98 (152) extent 2 (size 2)
>> -cC---P-DB-[---][---] OPAL_INT4 count 1 disp 0x9c (156) extent 4 (size 4)
>> -cC---P-DB-[---][---] OPAL_UINT4 count 1 disp 0xa4 (164) extent 4 (size 4)
>> -cC---P-DB-[---][---] OPAL_INT8 count 1 disp 0xb0 (176) extent 8 (size 8)
>> -cC---P-DB-[---][---] OPAL_UINT8 count 1 disp 0xc0 (192) extent 8 (size 8)
>> -cC---P-DB-[---][---] OPAL_INT8 count 1 disp 0xd0 (208) extent 8 (size 8)
>> -cC---P-DB-[---][---] OPAL_UINT8 count 1 disp 0xe0 (224) extent 8 (size 8)
>> -------G---[---][---] OPAL_END_LOOP prev 21 elements first elem displacement 0 size of data 104
>>
>> MPITEST error (1): libmpitest.c:1578 i=0, char value=-61, expected 0
>> MPITEST error (1): libmpitest.c:1608 i=0, int32_t value=117, expected 0
>> MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
>> MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
>> MPITEST error (1): 4 errors in buffer (17,0,12) len 273 commsize 2 commtype -10 data_type 13 root 1
>> MPITEST info (0): Starting MPI_Isend_ator: All Isend TO Root test
>> MPITEST info (0): Node spec MPITEST_comm_sizes[6]=2 too large, using 1
>> MPITEST info (0): Node spec MPITEST_comm_sizes[22]=2 too large, using 1
>> MPITEST info (0): Node spec MPITEST_comm_sizes[32]=2 too large, using 1
>> MPITEST_results: MPI_Isend_ator: All Isend TO Root 1 tests FAILED (of 3744)
>> -------------------------------------------------------
>> Primary job terminated normally, but 1 process returned
>> a non-zero exit code.. Per user-direction, the job has been aborted.
>> -------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun detected that one or more processes exited with non-zero status,
>> thus causing the job to be terminated. The first process to do so was:
>>
>>   Process name: [[12296,1],1]
>>   Exit code:    1
>> --------------------------------------------------------------------------
>> [rvandevaart@drossetti-ivy1 src]$
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/04/14553.php
>
>_______________________________________________
>devel mailing list
>de...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Link to this post: http://www.open-mpi.org/community/lists/devel/2014/04/14554.php