On Mon, 2017-08-07 at 09:14 -0500, Quincey Koziol wrote:
> Hi Frederic,
>     Could you give us some more details about your file and the call(s)
> you are making to HDF5?  I can’t think of any reason that it would crash
> when creating a file like this, but something interesting could be
> going on… :-)
Depending on how new his MPI implementation is, it might not have all the
64-bit cleanups in the NFS path.  The final error in the trace says "File
too large", but what it might really mean is "I/O request too big".  If you
write to something that is not NFS, I think you'll find this problem goes
away.

http://press3.mcs.anl.gov/romio/2013/07/03/large-transfers-in-romio/ and
http://press3.mcs.anl.gov/romio/2014/07/11/more-headaches-with-2-gib-io/
have a bit more information.  I neglected NFS back then and did not update
that driver until earlier this year.  (A rough sketch of an application-side
workaround is appended below the quoted message.)

==rob

> 	Quincey
> 
> > On Aug 7, 2017, at 5:28 AM, Frederic Perez <[email protected]> wrote:
> > 
> > Hi,
> > 
> > While writing a significant amount of data in parallel, I obtain the
> > following error stack:
> > 
> > HDF5-DIAG: Error detected in HDF5 (1.8.16) MPI-process 66:
> >   #000: H5D.c line 194 in H5Dcreate2(): unable to create dataset
> >     major: Dataset
> >     minor: Unable to initialize object
> >   #001: H5Dint.c line 453 in H5D__create_named(): unable to create and link to dataset
> >     major: Dataset
> >     minor: Unable to initialize object
> >   #002: H5L.c line 1638 in H5L_link_object(): unable to create new link to object
> >     major: Links
> >     minor: Unable to initialize object
> >   #003: H5L.c line 1882 in H5L_create_real(): can't insert link
> >     major: Symbol table
> >     minor: Unable to insert object
> >   #004: H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
> >     major: Symbol table
> >     minor: Object not found
> >   #005: H5Gtraverse.c line 641 in H5G_traverse_real(): traversal operator failed
> >     major: Symbol table
> >     minor: Callback failed
> >   #006: H5L.c line 1685 in H5L_link_cb(): unable to create object
> >     major: Object header
> >     minor: Unable to initialize object
> >   #007: H5O.c line 3016 in H5O_obj_create(): unable to open object
> >     major: Object header
> >     minor: Can't open object
> >   #008: H5Doh.c line 293 in H5O__dset_create(): unable to create dataset
> >     major: Dataset
> >     minor: Unable to initialize object
> >   #009: H5Dint.c line 1060 in H5D__create(): can't update the metadata cache
> >     major: Dataset
> >     minor: Unable to initialize object
> >   #010: H5Dint.c line 852 in H5D__update_oh_info(): unable to update layout/pline/efl header message
> >     major: Dataset
> >     minor: Unable to initialize object
> >   #011: H5Dlayout.c line 238 in H5D__layout_oh_create(): unable to initialize storage
> >     major: Dataset
> >     minor: Unable to initialize object
> >   #012: H5Dint.c line 1713 in H5D__alloc_storage(): unable to initialize dataset with fill value
> >     major: Dataset
> >     minor: Unable to initialize object
> >   #013: H5Dint.c line 1805 in H5D__init_storage(): unable to allocate all chunks of dataset
> >     major: Dataset
> >     minor: Unable to initialize object
> >   #014: H5Dchunk.c line 3575 in H5D__chunk_allocate(): unable to write raw data to file
> >     major: Low-level I/O
> >     minor: Write failed
> >   #015: H5Dchunk.c line 3745 in H5D__chunk_collective_fill(): unable to write raw data to file
> >     major: Low-level I/O
> >     minor: Write failed
> >   #016: H5Fio.c line 171 in H5F_block_write(): write through metadata accumulator failed
> >     major: Low-level I/O
> >     minor: Write failed
> >   #017: H5Faccum.c line 825 in H5F__accum_write(): file write failed
> >     major: Low-level I/O
> >     minor: Write failed
> >   #018: H5FDint.c line 260 in H5FD_write(): driver write request failed
> >     major: Virtual File Layer
> >     minor: Write failed
> >   #019: H5FDmpio.c line 1846 in H5FD_mpio_write(): MPI_File_write_at_all failed
> >     major: Internal error (too specific to document in detail)
> >     minor: Some MPI function failed
> >   #020: H5FDmpio.c line 1846 in H5FD_mpio_write(): Other I/O error , error stack:
> > ADIOI_NFS_WRITESTRIDED(672): Other I/O error File too large
> >     major: Internal error (too specific to document in detail)
> >     minor: MPI Error String
> > 
> > It basically claims that I am creating a file that is too large. But I
> > verified that the filesystem is capable of handling such a size. In my
> > case, the file is around 4 TB when it crashes. Where could this problem
> > come from? I thought HDF5 does not have a problem with very large files.
> > Plus, I am dividing the file into several datasets, and the write
> > operations work perfectly until, at some point, it crashes with the
> > errors above.
> > 
> > Could it be an issue with HDF5? Or could it be an MPI limitation? I am
> > skeptical about the latter option: at the beginning, the program writes
> > several datasets inside the file successfully (all the datasets being
> > the same size). If MPI were to blame, why wouldn't it crash at the first
> > write?
> > 
> > Thank you for your help.
> > Fred
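For completeness, here is a rough, untested sketch (not from the original
thread) of one application-side mitigation: keep each rank's contribution to
any single collective MPI-IO transfer well under the 2 GiB boundary the ROMIO
posts above describe, and skip the collective fill-value write that failed in
the reported stack by setting H5D_FILL_TIME_NEVER. The dataset shape, chunk
size, and the file name "big.h5" are made-up placeholders; Rob's actual
suggestion, writing to a non-NFS filesystem or using a newer MPI-IO, still
applies.

/* Sketch only: error checking omitted; sizes are illustrative assumptions. */
#include <hdf5.h>
#include <mpi.h>
#include <stdlib.h>

#define NROWS_PER_RANK (1L << 20)  /* 1M rows per rank (assumption): 1M x 512 doubles = 4 GiB */
#define NCOLS          512         /* columns per row (assumption) */
#define MAX_BYTES      (1L << 30)  /* keep each per-rank write well under 2 GiB */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Parallel (MPI-IO) file access. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("big.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Chunked dataset; H5D_FILL_TIME_NEVER avoids writing fill values to
     * every chunk at H5Dcreate() time (the H5D__chunk_collective_fill step). */
    hsize_t dims[2]  = { (hsize_t)NROWS_PER_RANK * nranks, NCOLS };
    hsize_t chunk[2] = { 4096, NCOLS };
    hid_t fspace = H5Screate_simple(2, dims, NULL);
    hid_t dcpl   = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk);
    H5Pset_fill_time(dcpl, H5D_FILL_TIME_NEVER);
    hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_DOUBLE, fspace,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* Collective data transfers. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    double *buf = malloc((size_t)NROWS_PER_RANK * NCOLS * sizeof(double));
    /* ... fill buf with this rank's data ... */

    /* Write this rank's block in slices of at most MAX_BYTES each.  Every
     * rank executes the same number of H5Dwrite() calls, which matters
     * because the transfers are collective. */
    hsize_t rows_per_slice = MAX_BYTES / (NCOLS * sizeof(double));
    for (hsize_t r0 = 0; r0 < NROWS_PER_RANK; r0 += rows_per_slice) {
        hsize_t nrows = (r0 + rows_per_slice > (hsize_t)NROWS_PER_RANK)
                            ? (hsize_t)NROWS_PER_RANK - r0 : rows_per_slice;
        hsize_t start[2] = { (hsize_t)rank * NROWS_PER_RANK + r0, 0 };
        hsize_t count[2] = { nrows, NCOLS };

        hid_t mspace = H5Screate_simple(2, count, NULL);
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, dxpl, buf + r0 * NCOLS);
        H5Sclose(mspace);
    }

    free(buf);
    H5Pclose(dxpl); H5Pclose(dcpl); H5Sclose(fspace);
    H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}

Whether the fill-time setting alone is enough depends on how the rest of the
application writes its data; the slicing loop is the part that keeps each
rank's individual MPI_File_write_at_all() request below the 32-bit limit.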
