Hello everyone! I am creating an HDF5 file from a Fortran program, and I am confused about the size of the generated file.
I am writing 19000 datasets, each holding 21 values of 64-bit reals. I write one value at a time, extending each of the 19000 datasets by one element at every step (a minimal sketch of this write pattern is appended at the end of this message). All data are correctly written, but the generated file is more than 48 MB.

I expected the total size of the file to be only a little bigger than the raw data, about 3.2 MB (21 * 19000 * 8 / 1e6 = 3.192 MB). If I only create 19000 empty datasets, I obtain a 6 MB HDF5 file, i.e. roughly 300-400 bytes per empty dataset. So I would guess a ~10 MB file (6 MB + 3.2 MB) could hold everything. For comparison, if I write everything to a text file, with each real number written using 15 characters, I obtain a 6 MB CSV file.

Question 1) Is this behaviour normal?

Question 2) Can extending a dataset each time data is written into it significantly increase the required disk space? Can preallocating the datasets and writing through hyperslabs save some space? Do the chunk parameters affect the size of the generated HDF5 file?

Question 3) If I pack everything into one compound dataset with 19000 columns, will the resulting file be smaller?

N.B.: Looking at the example that generates 100000 groups (grplots.c), the generated HDF5 file is 78 MB for 100000 empty groups, i.e. about 780 bytes per group.
https://support.hdfgroup.org/ftp/HDF5/examples/howto/crtmany/grplots.c

Guillaume Jacquenot
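P.S. For reference, here is a minimal sketch of the per-value extension pattern described above, written against the HDF5 C API for brevity (my actual code is Fortran, and only one of the 19000 datasets is shown). The file name, dataset name and chunk size are illustrative, and error checks are omitted; it should build with the h5cc wrapper. With a chunk size of one element, every 8-byte value gets its own chunk plus a chunk-index entry, which I suspect is one place where overhead accumulates.

/* Sketch: one extensible 1-D dataset of doubles, extended and written
 * one value at a time (illustrative names, no error checking). */
#include "hdf5.h"

int main(void)
{
    hsize_t dims[1]    = {0};              /* start empty                    */
    hsize_t maxdims[1] = {H5S_UNLIMITED};  /* allow extension                */
    hsize_t chunk[1]   = {1};              /* one element per chunk          */

    hid_t file  = H5Fcreate("append.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(1, dims, maxdims);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);          /* chunking required for H5S_UNLIMITED */

    hid_t dset = H5Dcreate2(file, "signal_0001", H5T_NATIVE_DOUBLE, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    for (int i = 0; i < 21; i++) {
        double  value     = (double)i;     /* dummy sample                   */
        hsize_t size[1]   = {(hsize_t)(i + 1)};
        hsize_t offset[1] = {(hsize_t)i};
        hsize_t count[1]  = {1};

        H5Dset_extent(dset, size);         /* grow the dataset by one element */

        hid_t filespace = H5Dget_space(dset);
        H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);
        hid_t memspace = H5Screate_simple(1, count, NULL);

        H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, H5P_DEFAULT, &value);

        H5Sclose(memspace);
        H5Sclose(filespace);
    }

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}

For comparison, I assume that creating the dataset with fixed dimensions {21} (contiguous layout, no H5S_UNLIMITED) and writing all 21 values in a single H5Dwrite call would avoid the per-chunk bookkeeping, leaving only the per-dataset object-header overhead.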
