Thanks for the suggestion. The performance I reported was measured using
the earliest file format (i.e., H5F_LIBVER_EARLIEST). I just tried
H5F_LIBVER_18, but it leads to even worse performance: the bandwidth
starts to drop when N > ~0.5 million. Using H5F_LIBVER_LATEST does not
help either.

Justin

2016-02-19 8:26 GMT-06:00 Gerd Heber <[email protected]>:

> Are you using the latest version of the file format? In other words, are
> you using H5P_DEFAULT (-> earliest) as your file access property list,
> or have you created one that sets the library version bounds to
> H5F_LIBVER_18?
>
>
>
> See
> https://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetLibverBounds
>
>
>
> In the newer version of the format, groups with large numbers of links
> and attributes are managed more efficiently.
>
>
>
> Does that solve your problem?
>
>
>
> Best, G.
>
>
>
>
>
> *From:* Hdf-forum [mailto:[email protected]] *On
> Behalf Of *Hsi-Yu Schive
> *Sent:* Thursday, February 18, 2016 2:36 PM
> *To:* [email protected]
> *Subject:* [Hdf-forum] I/O bandwidth drops dramatically and
> discontinuously for a large number of small datasets
>
>
>
> I am encountering a sudden drop in I/O bandwidth when the number of
> datasets in a single group exceeds around 1.7 million. Below I describe the
> issue in more detail.
>
>
>
> I'm converting adaptive mesh refinement (AMR) data to HDF5 format. Each
> dataset contains a small 4-D array of ~10 KB stored with the compact
> layout. All datasets are stored in the same group. When the total number of
> datasets (N) is smaller than ~1.7 million, I get an I/O bandwidth of ~100
> MB/s, which is acceptable. However, when N exceeds ~1.7 million, the
> bandwidth suddenly drops by one to two orders of magnitude.
>
>
>
> This issue seems to be related to the **number of datasets per group**
> rather than the total data size. For example, if I reduce the size of each
> dataset by a factor of 5 (to ~2 KB per dataset), the I/O bandwidth still
> drops when N > ~1.7 million, even though the total data size is reduced by
> a factor of 5.
>
>
>
> So I was wondering what causes this issue, and whether there is any simple
> solution to it. Since the data stored in different datasets are
> independent of each other, I prefer not to combine them into a larger
> dataset. My current workaround is to create several HDF5 sub-groups
> under the main group and then distribute all datasets evenly among these
> sub-groups (so that the number of datasets per group becomes smaller). With
> this approach, the I/O bandwidth remains stable even when N > 1.7 million.
>
>
>
> If necessary, I can post simplified code to reproduce this issue.
>
>
>
> Hsi-Yu
>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
> Twitter: https://twitter.com/hdf5
>
