Justin,

Will it be possible for you to provide a program that illustrates the problem? 
Which version of the library are you using? On which system are you running 
your application?

Thank you!

Elena
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal  The HDF Group  http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On Feb 19, 2016, at 4:03 PM, Hsi-Yu Schive <[email protected]> wrote:

Thanks for the suggestion. The performance I reported was measured using the 
earliest file format (i.e., H5F_LIBVER_EARLIEST). I just tried H5F_LIBVER_18, 
but it leads to even worse performance: the bandwidth starts to drop when 
N > ~0.5 million. Using H5F_LIBVER_LATEST does not help either.

Justin

2016-02-19 8:26 GMT-06:00 Gerd Heber <[email protected]>:
Are you using the latest version of the file format? In other words, are you 
using H5P_DEFAULT (-> earliest) as your file access property list, or have you 
created one that sets the library version bounds to H5F_LIBVER_18?

See https://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetLibverBounds

In the newer file format, groups with large numbers of links and attributes 
are managed more efficiently.

Does that solve your problem?

Best, G.


From: Hdf-forum [mailto:[email protected]] On Behalf Of 
Hsi-Yu Schive
Sent: Thursday, February 18, 2016 2:36 PM
To: [email protected]<mailto:[email protected]>
Subject: [Hdf-forum] I/O bandwidth drops dramatically and discontinuously for a 
large number of small datasets

I encounter a sudden drop of I/O bandwidth when the number of datasets in a 
single group exceeds around 1.7 million. In the following I describe the issue 
in more detail.

I'm converting adaptive mesh refinement (AMR) data to HDF5 format. Each 
dataset contains a small 4-D array, ~10 KB in size, stored with the compact 
layout. All datasets are stored in the same group. When the total number of 
datasets (N) is smaller than ~1.7 million, I get an I/O bandwidth of 
~100 MB/s, which is acceptable. However, when N exceeds ~1.7 million, the 
bandwidth suddenly drops by one to two orders of magnitude.

This issue seems to be related to the **number of datasets per group** rather 
than the total data size. For example, if I reduce the size of each dataset by 
a factor of 5 (to ~2 KB per dataset), the I/O bandwidth still drops when 
N > ~1.7 million, even though the total data size is reduced by a factor of 5.

So I was wondering what causes this issue, and whether there is any simple 
solution. Since the data stored in different datasets are independent of each 
other, I prefer not to combine them into a larger dataset. My current 
workaround is to create several HDF5 subgroups under the main group and 
distribute all datasets evenly among them, so that the number of datasets per 
group becomes smaller. With this scheme the I/O bandwidth stays stable even 
when N > 1.7 million.

If necessary, I can post a simplified code to reproduce this issue.

Hsi-Yu

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
