Thomas,
Thanks for the detailed bug report and the test case. I successfully identified
the culprit, and the issue is now fixed (commit r28319).
Regards,
George.
PS: During the debugging process I sketched the datatype representation to help
myself understand the issue. I attached the figure here for the delight of
whoever might be interested. It contains the 4 datatypes created in main, and
the two datatypes created on the second invocation of the do_test function.
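For readers without the figure, here is a rough, MPI-free sketch of the idea behind such a representation. It assumes a struct datatype can be modeled as a list of (byte displacement, element count) blocks of ints. The names int_block, pack_blocks, unpack_blocks, count_mismatches and demo_round_trip are hypothetical, the offsets are made up, and this is neither the Open MPI internal layout nor the datatypes from the test case. It only illustrates what a correct send/recv followed by a comparison against a reference should produce.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One block of a struct-like datatype: `count` ints located `disp` bytes
 * from the buffer base. Hypothetical model, not an Open MPI structure. */
typedef struct {
    size_t disp;
    size_t count;
} int_block;

/* Gather every block from `base` into the contiguous buffer `packed`
 * (conceptually what a send does). Returns the number of ints packed. */
size_t pack_blocks(const void *base, const int_block *blocks,
                   size_t nblocks, int *packed)
{
    size_t n = 0;
    for (size_t i = 0; i < nblocks; ++i) {
        memcpy(packed + n, (const char *)base + blocks[i].disp,
               blocks[i].count * sizeof(int));
        n += blocks[i].count;
    }
    return n;
}

/* Scatter the contiguous buffer `packed` back into `base`
 * (conceptually what a receive does). Returns the number of ints used. */
size_t unpack_blocks(void *base, const int_block *blocks,
                     size_t nblocks, const int *packed)
{
    size_t n = 0;
    for (size_t i = 0; i < nblocks; ++i) {
        memcpy((char *)base + blocks[i].disp, packed + n,
               blocks[i].count * sizeof(int));
        n += blocks[i].count;
    }
    return n;
}

/* Count the positions at which `result` differs from `reference`. */
size_t count_mismatches(const int *result, const int *reference, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i)
        if (result[i] != reference[i])
            ++bad;
    return bad;
}

/* Round trip over three made-up blocks; returns the mismatch count. */
size_t demo_round_trip(void)
{
    enum { N = 16 };
    int src[N], dst[N], ref[N], packed[N];
    const int_block blocks[] = {
        { 1 * sizeof(int), 2 },   /* elements 1..2   */
        { 6 * sizeof(int), 3 },   /* elements 6..8   */
        { 12 * sizeof(int), 2 },  /* elements 12..13 */
    };
    const size_t nblocks = sizeof blocks / sizeof blocks[0];

    for (int i = 0; i < N; ++i) {
        src[i] = i;
        dst[i] = -1;
        ref[i] = -1;
    }

    /* Build the reference by plain indexing, independent of pack/unpack. */
    for (size_t b = 0; b < nblocks; ++b) {
        size_t first = blocks[b].disp / sizeof(int);
        for (size_t j = 0; j < blocks[b].count; ++j)
            ref[first + j] = src[first + j];
    }

    size_t n = pack_blocks(src, blocks, nblocks, packed);
    assert(n == 7);  /* 2 + 3 + 2 ints selected by the blocks */
    unpack_blocks(dst, blocks, nblocks, packed);

    return count_mismatches(dst, ref, N);
}
```

Calling demo_round_trip() from a small main returns 0 when every selected element lands at its displacement. A non-zero count corresponds to the kind of index/value mismatch the test program reports.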
On Apr 8, 2013, at 16:08 , Thomas Jahns <[email protected]> wrote:
> Hello,
>
> a colleague of mine has investigated a difficult problem that we traced to
> OpenMPI delivering incorrect data for some struct datatypes which use specific
> offsets (on the stack in our case, but the problem can be reproduced using
> specifically chosen slices of an array). Our library is used to aggregate
> several MPI communications in a generic and transparent manner, and therefore
> we need to be able to handle any combination of properly aligned offsets and
> component types.
>
> The attached example program contains the necessary steps to reproduce the
> problem:
>
> 1. create the struct types in question
> 2. send/recv the data
> 3. compare to reference (said comparison works on several MPICH2 versions)
>
> The code then prints any array indices/values not matching the reference.
>
> Our platform is Linux x86_64 with Debian squeeze; the tested versions of
> OpenMPI are the 1.4.2 version supplied with squeeze and 1.6.4, which we
> compiled ourselves. For 1.4.2 I also did a quick test in an i386 chroot, and
> the code fails there too. gcc 4.6.1 was used for the x86_64 cases and gcc
> 4.3.5 for the i386 chroot.
>
> Sorry if the test is not of minimal size, but we were happy once he got this
> down from several tens of thousands of lines of Fortran+C, and even that took
> more than a day once we understood the problem was unrelated to the Fortran
> program it originally occurred in.
>
> When running the program with OpenMPI:
>
> $ mpicc -std=gnu99 ./mpi_test.c && ./a.out
> first tests:
> second tests:
> results_2[6] = 8
> ref_results_2[6] = 12
> results_2[7] = 9
> ref_results_2[7] = 13
>
> MPICH gives the expected result:
> $ /sw/squeeze-x64/mpi/mpich2-1.4.1p1-gccsys/bin/mpicc -std=gnu99 ./mpi_test.c && ./a.out
> first tests:
> second tests:
>
> Regards, Thomas
> --
> Thomas Jahns
> DKRZ GmbH, Department: Application software
>
> Deutsches Klimarechenzentrum
> Bundesstraße 45a
> D-20146 Hamburg
>
> Phone: +49-40-460094-151
> Fax: +49-40-460094-270
> Email: Thomas Jahns <[email protected]>
> <mpi_test.c>