Hello everyone! I am creating an HDF5 file from a Fortran program, and I am confused about the size of the generated file.
I am writing 19000 datasets, each holding 21 values of 64-bit reals. I write one value at a time, extending each of the 19000 datasets by one element at every step (a minimal sketch of this write pattern is appended at the end of this message). All data are correctly written, but the generated file is more than 48 MB.

I expected the total size of the file to be only a little bigger than the raw data, about 3.2 MB (21 * 19000 * 8 / 1e6 = 3.192 MB). If I only create 19000 empty datasets, I obtain a 6 MB HDF5 file, i.e. roughly 300-400 bytes per empty dataset. So I would guess a ~10 MB file (6 MB + 3.2 MB) could hold everything. For comparison, if I write everything to a text file, with each real number written using 15 characters, I obtain a 6 MB CSV file.

Question 1) Is this behaviour normal?

Question 2) Can extending a dataset each time data is written into it significantly increase the required disk space? Can preallocating the datasets and writing through hyperslabs save some space? Do the chunk parameters affect the size of the generated HDF5 file?

Question 3) If I pack everything into one compound dataset with 19000 columns, will the resulting file be smaller?

N.B.: Looking at the example that generates 100000 groups (grplots.c), the generated HDF5 file is 78 MB for 100000 empty groups, i.e. about 780 bytes per group.
https://support.hdfgroup.org/ftp/HDF5/examples/howto/crtmany/grplots.c

Guillaume Jacquenot
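P.S. For reference, here is a minimal sketch of the per-value extension pattern described above, written against the HDF5 C API for brevity (my actual code is Fortran, and only one of the 19000 datasets is shown). The file name, dataset name and chunk size are illustrative, and error checks are omitted; it should build with the h5cc wrapper. With a chunk size of one element, every 8-byte value gets its own chunk plus a chunk-index entry, which I suspect is one place where overhead accumulates.

/* Sketch: one extensible 1-D dataset of doubles, extended and written
 * one value at a time (illustrative names, no error checking). */
#include "hdf5.h"

int main(void)
{
    hsize_t dims[1]    = {0};              /* start empty                    */
    hsize_t maxdims[1] = {H5S_UNLIMITED};  /* allow extension                */
    hsize_t chunk[1]   = {1};              /* one element per chunk          */

    hid_t file  = H5Fcreate("append.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(1, dims, maxdims);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);          /* chunking required for H5S_UNLIMITED */

    hid_t dset = H5Dcreate2(file, "signal_0001", H5T_NATIVE_DOUBLE, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    for (int i = 0; i < 21; i++) {
        double  value     = (double)i;     /* dummy sample                   */
        hsize_t size[1]   = {(hsize_t)(i + 1)};
        hsize_t offset[1] = {(hsize_t)i};
        hsize_t count[1]  = {1};

        H5Dset_extent(dset, size);         /* grow the dataset by one element */

        hid_t filespace = H5Dget_space(dset);
        H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);
        hid_t memspace = H5Screate_simple(1, count, NULL);

        H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, H5P_DEFAULT, &value);

        H5Sclose(memspace);
        H5Sclose(filespace);
    }

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}

For comparison, I assume that creating the dataset with fixed dimensions {21} (contiguous layout, no H5S_UNLIMITED) and writing all 21 values in a single H5Dwrite call would avoid the per-chunk bookkeeping, leaving only the per-dataset object-header overhead.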
