I think the issue is related to using an older openmpi (or maybe just using openmpi). In hdf5-1.8.16, H5Dchunk.c, there is a comment about working around a bug for MPI_Type_create_hindexed_block(). The comment says that “should not have a special case for blocks == 0, but ompi (as of 1.8.1) has a bug in file_set_view when a zero size datatype is create with hindexed or hvector.”
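A minimal sketch of the kind of guard that comment describes, not the actual HDF5 source. The names file_type_kind, choose_file_type and needs_type_free are hypothetical, and the MPI calls appear only in comments so the sketch compiles without an MPI installation:

```c
#include <assert.h>

/* Hypothetical tag naming which file datatype a rank would hand to
 * MPI_File_set_view. Real code would hold an MPI_Datatype instead. */
typedef enum {
    FILE_TYPE_BYTE,           /* plain MPI_BYTE: nothing to build, commit, or free */
    FILE_TYPE_HINDEXED_BLOCK  /* derived type from MPI_Type_create_hindexed_block */
} file_type_kind;

/* Choose the file datatype for a rank that owns `blocks` chunks.
 * Some ranks may legitimately own zero chunks. */
static file_type_kind choose_file_type(int blocks)
{
    assert(blocks >= 0);

    if (blocks > 0) {
        /* With MPI available, this branch would do roughly:
         *   MPI_Type_create_hindexed_block(blocks, block_len, displacements,
         *                                  MPI_BYTE, &file_type);
         *   MPI_Type_commit(&file_type);
         * where block_len and displacements are assumed inputs. */
        return FILE_TYPE_HINDEXED_BLOCK;
    }

    /* blocks == 0: skip the derived type entirely so that no zero-size
     * datatype ever reaches MPI_File_set_view. */
    return FILE_TYPE_BYTE;
}

/* A committed derived type has to be released with MPI_Type_free after use;
 * the predefined MPI_BYTE must not be freed. */
static int needs_type_free(file_type_kind kind)
{
    return kind == FILE_TYPE_HINDEXED_BLOCK;
}
```

MPI_File_set_view is collective, so a rank with blocks == 0 would still have to make the call; the guard only changes which datatype it passes.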
This fix is not in hdf5-1.10.0-patch1. My cases are failing (with openmpi-1.6.4 and openmpi-1.8.1) on processors where blocks == 0, and they are failing with MPI_File_set_view in the backtrace. If I pull the workaround from 1.8.16 in H5Dchunk.c into 1.10.0-patch1, then the code makes it past this point (but then fails an assert at a later point in the test).

..Greg

--
"A supercomputer is a device for turning compute-bound problems into I/O-bound problems"

From: Hdf-forum <[email protected]> on behalf of "Sjaardema, Gregory D" <[email protected]>
Reply-To: HDF Users Discussion List <[email protected]>
Date: Tuesday, October 25, 2016 at 1:20 PM
To: "[email protected]" <[email protected]>
Subject: [EXTERNAL] [Hdf-forum] hdf5-1.10.0-patch1 -- parallel tests failing in PMPI_File_set_view

I am having failures running the hdf5-1.10.0-patch1 parallel tests testphdf5. The t_mpi test passes with no issues. Many of the failures occur in the call stack with PMPI_File_set_view being called by H5FDWrite.

I am using gcc-4.7.2 and openmpi-1.6.4 on a RHEL6 system. I am also getting failures on OSX El Capitan with gcc-4.9.4 and openmpi.

On RHEL6, eidsetw2 is one of the failing tests. The backtrace is:

[...] *** Process received signal ***
[...] Signal: Segmentation fault (11)
[...] Signal code: Address not mapped (1)
[...] Failing at address: (nil)
[...] [ 0] /lib64/libpthread.so.0() [0x3481a0f710]
[...] [ 1] ....openmpi/1.6.4-gcc-4.7.2-RHEL6/lib/openmpi/mca_io_romio.so(ADIOI_Flatten+0x450) [0x7f65d968e5e0]
[...] [ 2] ....openmpi/1.6.4-gcc-4.7.2-RHEL6/lib/openmpi/mca_io_romio.so(ADIOI_Flatten_datatype+0xc5) [0x7f65d9690495]
[...] [ 3] ....openmpi/1.6.4-gcc-4.7.2-RHEL6/lib/openmpi/mca_io_romio.so(ADIO_Set_view+0x1da) [0x7f65d96852ca]
[...] [ 4] ....openmpi/1.6.4-gcc-4.7.2-RHEL6/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_set_view+0x172) [0x7f65d9695db2]
[...] [ 5] ....openmpi/1.6.4-gcc-4.7.2-RHEL6/lib/libmpi.so.1(MPI_File_set_view+0x107) [0x7f65e1ae3e77]
[...] [ 6] hdf5-1.10.0-patch1/src/.libs/libhdf5.so.100(+0x5f899f) [0x7f65e242599f]
[...] [ 7] hdf5-1.10.0-patch1/src/.libs/libhdf5.so.100(H5FD_write+0x4e0) [0x7f65e203d232]
[...] [ 8] hdf5-1.10.0-patch1/src/.libs/libhdf5.so.100(H5F__accum_write+0x184a) [0x7f65e1ffbd84]
[...] [ 9] hdf5-1.10.0-patch1/src/.libs/libhdf5.so.100(H5F_block_write+0x40c) [0x7f65e20023dd]
[...] [10] hdf5-1.10.0-patch1/src/.libs/libhdf5.so.100(+0x118fcf) [0x7f65e1f45fcf]
[...] [11] hdf5-1.10.0-patch1/src/.libs/libhdf5.so.100(H5D__chunk_allocate+0x1af6) [0x7f65e1f443f5]
[...] [12] hdf5-1.10.0-patch1/src/.libs/libhdf5.so.100(+0x152fd3) [0x7f65e1f7ffd3]
[...] [13] hdf5-1.10.0-patch1/src/.libs/libhdf5.so.100(H5D__alloc_storage+0x665) [0x7f65e1f7f95f]
[...] [14] hdf5-1.10.0-patch1/src/.libs/libhdf5.so.100(H5D__layout_oh_create+0x57a) [0x7f65e1f8dee0]
[...] [15] hdf5-1.10.0-patch1/src/.libs/libhdf5.so.100(+0x14bc99) [0x7f65e1f78c99]
[...] [16] hdf5-1.10.0-patch1/src/.libs/libhdf5.so.100(H5D__create+0x1162) [0x7f65e1f7a5e8]
[...] [17] hdf5-1.10.0-patch1/src/.libs/libhdf5.so.100(+0x165662) [0x7f65e1f92662]
[...] [18] hdf5-1.10.0-patch1/src/.libs/libhdf5.so.100(H5O_obj_create+0x2ec) [0x7f65e21438e2]
[...] [19] hdf5-1.10.0-patch1/src/.libs/libhdf5.so.100(+0x2f1930) [0x7f65e211e930]
[...] [20] hdf5-1.10.0-patch1/src/.libs/libhdf5.so.100(+0x27eba0) [0x7f65e20abba0]
[...] [21] hdf5-1.10.0-patch1/src/.libs/libhdf5.so.100(H5G_traverse+0x4ff) [0x7f65e20acd6b]
[...] [22] hdf5-1.10.0-patch1/src/.libs/libhdf5.so.100(+0x2f259a) [0x7f65e211f59a]
[...] [23] hdf5-1.10.0-patch1/src/.libs/libhdf5.so.100(H5L_link_object+0x1d3) [0x7f65e211e6ae]
[...] [24] hdf5-1.10.0-patch1/src/.libs/libhdf5.so.100(H5D__create_named+0x3d1) [0x7f65e1f75f17]
[...] [25] hdf5-1.10.0-patch1/src/.libs/libhdf5.so.100(H5Dcreate2+0x68f) [0x7f65e1f1e703]
[...] [26] hdf5-1.10.0-patch1/testpar/.libs/lt-testphdf5(extend_writeInd2+0x598) [0x416512]
[...] [27] hdf5-1.10.0-patch1/testpar/.libs/lt-testphdf5(PerformTests+0x1ab) [0x45addc]
[...] [28] hdf5-1.10.0-patch1/testpar/.libs/lt-testphdf5(main+0x94c) [0x408949]
[...] [29] /lib64/libc.so.6(__libc_start_main+0xfd) [0x348161ed5d]
[...] *** End of error message ***

I’m not really asking for anyone to debug this for me, just wondering if anyone else is having issues running the parallel tests with hdf5-1.10.0-patch1.

Thanks,
..Greg

--
"A supercomputer is a device for turning compute-bound problems into I/O-bound problems"
