Hi Guillaume,
As Pierre mentioned, a chunk size of 1 element is not reasonable and will
generate a lot of metadata overhead. Something closer to 1 MB of data elements
per chunk would be much better.
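For a 1-D dataset of 64-bit floats, that target works out to roughly 131072 elements per chunk. A minimal sketch of that sizing rule (illustrative Python, not HDF5 API code; the function name is mine):

```python
# Pick a 1-D chunk length that targets roughly 1 MiB of 64-bit floats,
# capped at the dataset size so tiny datasets get a single chunk.
TARGET_CHUNK_BYTES = 1024 * 1024   # ~1 MiB per chunk
ELEMENT_SIZE = 8                   # bytes per 64-bit float

def chunk_length(n_elements):
    """Chunk length for a 1-D float64 dataset of n_elements values."""
    return max(1, min(n_elements, TARGET_CHUNK_BYTES // ELEMENT_SIZE))

print(chunk_length(10))          # -> 10 (chunk no larger than the data)
print(chunk_length(10_000_000))  # -> 131072 (~1 MiB of float64)
```

The resulting length would then be what you pass to h5pset_chunk_f when creating the dataset.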
Quincey
> On May 24, 2017, at 12:23 AM, Guillaume Jacquenot
> <[email protected]> wrote:
>
> Hello HDF5 community, Quincey
>
> I have tested versions 1.8.16 and 1.10.1, also with the
> h5pset_libver_bounds_f subroutine.
>
> I have inserted these commands in my bench program:
>
> call h5open_f(error)
> call h5pcreate_f( H5P_FILE_ACCESS_F, fapl_id, error)
> call h5pset_libver_bounds_f(fapl_id, H5F_LIBVER_LATEST_F, &
>                             H5F_LIBVER_LATEST_F, error)
>
>
> However, I can't see any difference in the size of the generated HDF5 files.
> Below are the size and md5sum of the generated HDF5 files, with the two HDF5
> library versions and different numbers of elements (0, 1 and 2) in each dataset:
>
>
>
> Version 1.8.16
> $ ./bench.exe 0 && md5sum results.h5 && ls -altr results.h5
> ee8157f1ce74936021b1958fb796741e *results.h5
> -rw-r--r-- 1 xxxxx 1049089 1169632 May 24 09:17 results.h5
>
> $ ./bench.exe 1 && md5sum results.h5 && ls -altr results.h5
> 1790a5650bb945b17c0f8a4e59adec85 *results.h5
> -rw-r--r-- 1 xxxxx 1049089 7481632 May 24 09:17 results.h5
>
> $ ./bench.exe 2 && md5sum results.h5 && ls -altr results.h5
> 7d3dff2c6a1c29fa0fe827e4bd5ba79e *results.h5
> -rw-r--r-- 1 xxxxx 1049089 7505632 May 24 09:17 results.h5
>
>
> Version 1.10.1
> $ ./bench.exe 0 && md5sum results.h5 && ls -altr results.h5
> ec8169773b9ea015c81fc4cb2205d727 *results.h5
> -rw-r--r-- 1 xxxxx 1049089 1169632 May 24 09:12 results.h5
>
> $ ./bench.exe 1 && md5sum results.h5 && ls -altr results.h5
> fae64160fe79f4af0ef382fd1790bf76 *results.h5
> -rw-r--r-- 1 xxxxx 1049089 7481632 May 24 09:14 results.h5
>
> $ ./bench.exe 2 && md5sum results.h5 && ls -altr results.h5
> 20aaf160b3d8ab794ab8c14a604dacc5 *results.h5
> -rw-r--r-- 1 xxxxx 1049089 7505632 May 24 09:14 results.h5
>
>
>
>
>
> 2017-05-23 19:12 GMT+02:00 Guillaume Jacquenot <[email protected]>:
> Hello Quincey
>
> I am using version 1.8.16
>
> I am using chunks of size 1.
> I have tried a contiguous dataset, but I get an error at runtime.
>
> I have written a test program that creates 3000 datasets, each filled with
> 64-bit floating point numbers.
> I can specify a number n, which controls the number of times I save my
> data (the number of timesteps of a simulation, in my case).
>
> To summarize, this test program does:
>
> call hdf5_init(filename)
> do i = 1, n
>     call hdf5_write(datatosave)
> end do
> call hdf5_close()
>
>
>
> With n = 0, I get an HDF5 file of size 1.11 MB, which corresponds to about
> 370 bytes per empty dataset (totally reasonable).
> With n = 1, I get an HDF5 file of size 7.13 MB, which surprises me. Why
> such an increase?
> With n = 2, I get an HDF5 file of size 7.15 MB, an increase of 0.02 MB,
> which is logical: 3000*8*1/1e6 = 0.024 MB.
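As a quick cross-check of those numbers: almost all of the jump from n = 0 to n = 1 is per-dataset chunk/index metadata, not payload (illustrative Python, plugging in the byte counts quoted above):

```python
# Byte counts taken from the measurements above (chunk size 1, 3000 datasets).
datasets = 3000
size_n0 = 1.11e6   # file size with 3000 empty datasets
size_n1 = 7.13e6   # file size after writing one float64 per dataset

raw = datasets * 8                                  # 24000 bytes of payload
overhead_per_dataset = (size_n1 - size_n0 - raw) / datasets
print(round(overhead_per_dataset))                  # roughly 2 kB per dataset
```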
>
> When setting the chunk size to 10, I obtain the following results:
>
> With n = 0, I get an HDF5 file of size 1.11 MB, which corresponds to about
> 370 bytes per empty dataset.
> With n = 1, I get an HDF5 file of size 7.34 MB, which surprises me.
> With n = 2, I get an HDF5 file of size 7.15 MB, an increase of
> 3000*8*10/1e6 MB, which is logical.
>
> I don't understand the first increase in size. It does not make this data
> storage very efficient.
> Do you think a compound dataset with 3000 columns would present the same
> behaviour? I have not tried, since I don't know how to map the content of an
> array when calling the h5dwrite_f function for a compound dataset.
>
>
> If I ask for 30000 datasets, I observe the same behaviour:
> n = 0 -> 10.9 MB
> n = 1 -> 73.2 MB
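Those numbers are almost exactly ten times the 3000-dataset figures, which points at a fixed per-dataset overhead rather than a fixed per-file cost (illustrative Python on the sizes above):

```python
# Per-dataset cost in the 30000-dataset run, from the sizes quoted above.
per_empty = 10.9e6 / 30000     # bytes per empty dataset
per_written = 73.2e6 / 30000   # bytes per dataset after one float64 is written
print(round(per_empty), round(per_written))  # -> 363 2440
```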
>
> Thanks
>
>
>
> Here is the error I get with a contiguous dataset:
>
>
> #001: hdf5-1.8.16/src/H5Dint.c line 453 in H5D__create_named(): unable to
> create and link to dataset
> major: Dataset
> minor: Unable to initialize object
> #002: hdf5-1.8.16/src/H5L.c line 1638 in H5L_link_object(): unable to
> create new link to object
> major: Links
> minor: Unable to initialize object
> #003: hdf5-1.8.16/src/H5L.c line 1882 in H5L_create_real(): can't insert
> link
> major: Symbol table
> minor: Unable to insert object
> #004: hdf5-1.8.16/src/H5Gtraverse.c line 861 in H5G_traverse(): internal
> path traversal failed
> major: Symbol table
> minor: Object not found
> #005: hdf5-1.8.16/src/H5Gtraverse.c line 641 in H5G_traverse_real():
> traversal operator failed
> major: Symbol table
> minor: Callback failed
> #006: hdf5-1.8.16/src/H5L.c line 1685 in H5L_link_cb(): unable to create
> object
> major: Object header
> minor: Unable to initialize object
> #007: hdf5-1.8.16/src/H5O.c line 3016 in H5O_obj_create(): unable to open
> object
> major: Object header
> minor: Can't open object
> #008: hdf5-1.8.16/src/H5Doh.c line 293 in H5O__dset_create(): unable to
> create dataset
> major: Dataset
> minor: Unable to initialize object
> #009: hdf5-1.8.16/src/H5Dint.c line 1056 in H5D__create(): unable to
> construct layout information
> major: Dataset
> minor: Unable to initialize object
> #010: hdf5-1.8.16/src/H5Dcontig.c line 422 in H5D__contig_construct():
> extendible contiguous non-external dataset
> major: Dataset
> minor: Feature is unsupported
> HDF5-DIAG: Error detected in HDF5 (1.8.16) t^C
>
> 2017-05-23 19:00 GMT+02:00 <[email protected]>:
>
>
> Date: Tue, 23 May 2017 08:22:59 -0700
> From: Quincey Koziol <[email protected]>
> To: HDF Users Discussion List <[email protected]>
> Subject: Re: [Hdf-forum] Questions about size of generated Hdf5 files
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Guillaume,
> Are you using chunked or contiguous datasets? If chunked, what size
> are you using? Also, can you use the 'latest' version of the format, which
> should be smaller, but is only compatible with HDF5 1.10.x or later? (i.e.
> H5Pset_libver_bounds with 'latest' for the low and high bounds,
> https://support.hdfgroup.org/HDF5/doc/RM/H5P/H5Pset_libver_bounds.htm )
>
> Quincey
>
>
> > On May 23, 2017, at 3:02 AM, Guillaume Jacquenot
> > <[email protected]> wrote:
> >
> > Hello everyone!
> >
> > I am creating an HDF5 file from a Fortran program, and I am confused about
> > the size of the generated HDF5 file.
> >
> > I am writing 19000 datasets, each with 21 values of 64-bit real numbers.
> > I write one value at a time, extending each of the 19000 datasets by one
> > value every time.
> > All data are correctly written.
> > But the generated file is more than 48 MB.
> > I expected the total size of the file to be a little bigger than the raw
> > data, about 3.2 MB (21*19000*8 / 1e6 = 3.192 MB).
> > If I only create 19000 empty datasets, I obtain a 6 MB HDF5 file, which
> > means each empty dataset is about 400 bytes.
> > I guess I should be able to create a ~10 MB (6 MB + 3.2 MB) HDF5 file that
> > contains everything.
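Those estimates check out arithmetically (illustrative Python; the 6 MB figure is the observed empty-dataset file size quoted above):

```python
# Expected file size: observed empty-dataset overhead plus raw float64 payload.
datasets = 19000
values = 21
raw_mb = datasets * values * 8 / 1e6   # raw payload in MB
empty_mb = 6.0                         # observed size of the all-empty file
print(round(raw_mb, 3))                # -> 3.192
print(round(empty_mb + raw_mb, 2))     # -> 9.19
```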
> >
> > For comparison, if I write everything in a text file, where each real
> > number is written with 15 characters, I obtain a 6 MB CSV file.
> >
> > Question 1)
> > Is this behaviour normal?
> >
> > Question 2)
> > Can extending a dataset each time we write data into it significantly
> > increase the total required disk space?
> > Can preallocating the dataset and using hyperslabs save some space?
> > Can chunk parameters impact the size of the generated HDF5 file?
> >
> > Question 3)
> > If I pack everything into a compound dataset with 19000 columns, will the
> > resulting file be smaller?
> >
> > N.B:
> > When looking at the example that generates 100000 groups (grplots.c), the
> > size of the generated HDF5 file is 78 MB for 100000 empty groups.
> > That means each group is about 780 bytes.
> > https://support.hdfgroup.org/ftp/HDF5/examples/howto/crtmany/grplots.c
> >
> > Guillaume Jacquenot
>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
> Twitter: https://twitter.com/hdf5