Hi Elena,

I just tried it on a local system with an XFS file system. The same issue
occurs with H5F_LIBVER_EARLIEST, but with both H5F_LIBVER_18 and
H5F_LIBVER_LATEST the bandwidth becomes stable (although still lower than in
the NGroup=128 case by a factor of 1.5 to 2). Please let me know if you can
reproduce these results. Thanks!

Justin

2016-02-21 17:54 GMT-06:00 Elena Pourmal <[email protected]>:

> Hi Justin,
>
> Thanks a lot for the program! We will take a look.
>
> Just one more question. Have you tried to run your benchmark on some other
> file system?
>
> Thanks again!
>
> Elena
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Elena Pourmal  The HDF Group  http://hdfgroup.org
> 1800 So. Oak St., Suite 203, Champaign IL 61820
> 217.531.6112
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
>
>
> On Feb 21, 2016, at 5:05 PM, Hsi-Yu Schive <[email protected]> wrote:
>
> Hi Elena,
>
> A simple program demonstrating this issue is attached. Please try modifying
> the variables "NGroup, LibVerLow, LibVerLow". NGroup sets the number of
> groups for a fixed number of datasets (NDataset), and the other two
> variables specify the file format. The size of each dataset is ~2 KB.
>
> I tried four different cases, combining NGroup=1 or 128 with
> LibVerLow=H5F_LIBVER_EARLIEST or H5F_LIBVER_18. For NGroup=1, the I/O
> bandwidth drops dramatically when the file size exceeds ~3.4 GB. For
> NGroup=128, the bandwidth stays reasonable. The results are similar for
> different LibVerLow (actually the results are a bit worse for H5F_LIBVER_18
> and H5F_LIBVER_LATEST than for H5F_LIBVER_EARLIEST).
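
As a quick consistency check (an editorial sketch, not part of the attached program): with ~2 KB per dataset, a ~3.4 GB file corresponds to roughly 1.7 million datasets, which matches the threshold reported in the original post. The helper name `datasets_at_file_size` is hypothetical, and the estimate assumes decimal units and ignores HDF5 metadata overhead.

```cpp
#include <cstdint>

// Hypothetical helper: estimates how many fixed-size datasets fit in a file
// of the given size. Assumes decimal units (1 KB = 1000 bytes) and ignores
// HDF5 metadata overhead, so the result is only a rough estimate.
// Precondition: bytes_per_dataset > 0.
std::uint64_t datasets_at_file_size(std::uint64_t file_bytes,
                                    std::uint64_t bytes_per_dataset) {
    return file_bytes / bytes_per_dataset;
}
```

For example, `datasets_at_file_size(3400000000ULL, 2000ULL)` evaluates to 1700000.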
>
> System specs:
> HDF5 version: 1.8.16
> CPU: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
> File system: gpfs
> OS: CentOS release 6.7
>
> Sincerely,
> Justin
>
> 2016-02-19 17:41 GMT-06:00 Elena Pourmal <[email protected]>:
>
>> Justin,
>>
>> Would it be possible for you to provide a program that illustrates the
>> problem? Which version of the library are you using? On which system are
>> you running your application?
>>
>> Thank you!
>>
>> Elena
>>
>> On Feb 19, 2016, at 4:03 PM, Hsi-Yu Schive <[email protected]> wrote:
>>
>> Thanks for the suggestion. The performance I reported was measured using
>> the earliest file format (i.e., H5F_LIBVER_EARLIEST). I just tried
>> H5F_LIBVER_18, but it leads to even worse performance: the bandwidth
>> starts to drop when N exceeds ~0.5 million. Using H5F_LIBVER_LATEST does
>> not help either.
>>
>> Justin
>>
>> 2016-02-19 8:26 GMT-06:00 Gerd Heber <[email protected]>:
>>
>>> Are you using the latest version of the file format? In other words, are
>>> you using H5P_DEFAULT (-> earliest) as your file access property list, or
>>> have you created one which sets the library version bounds to
>>> H5F_LIBVER_18?
>>>
>>>
>>>
>>> See
>>> https://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetLibverBounds
>>>
>>>
>>>
>>> In the newer version, groups with large numbers of links and attributes
>>> are managed more efficiently.
>>>
>>>
>>>
>>> Does that solve your problem?
>>>
>>>
>>>
>>> Best, G.
>>>
>>>
>>>
>>>
>>>
>>> *From:* Hdf-forum [mailto:[email protected]] *On
>>> Behalf Of *Hsi-Yu Schive
>>> *Sent:* Thursday, February 18, 2016 2:36 PM
>>> *To:* [email protected]
>>> *Subject:* [Hdf-forum] I/O bandwidth drops dramatically and
>>> discontinuously for a large number of small datasets
>>>
>>>
>>>
>>> I encounter a sudden drop in I/O bandwidth when the number of datasets
>>> in a single group exceeds around 1.7 million. Below I describe the issue
>>> in more detail.
>>>
>>>
>>>
>>> I'm converting adaptive mesh refinement data to the HDF5 format. Each
>>> dataset contains a small 4-D array with a size of ~10 KB in the compact
>>> format. All datasets are stored in the same group. When the total number of
>>> datasets (N) is smaller than ~1.7 million, I get an I/O bandwidth of ~100
>>> MB/s, which is acceptable. However, when N exceeds ~1.7 million, the
>>> bandwidth suddenly drops by at least one to two orders of magnitude.
>>>
>>>
>>>
>>> This issue seems to be related to the **number of datasets per group**
>>> rather than the total data size. For example, if I reduce the size of each
>>> dataset by a factor of 5 (so ~2 KB per dataset), the I/O bandwidth still
>>> drops when N > ~1.7 million, even though the total data size is reduced by
>>> a factor of 5.
>>>
>>>
>>>
>>> So I was wondering what causes this issue, and whether there is a simple
>>> solution. Since the data stored in different datasets are independent of
>>> each other, I prefer not to combine them into a larger dataset. My current
>>> workaround is to create several HDF5 sub-groups under the main group and
>>> distribute all datasets evenly among them (so that the number of datasets
>>> per group becomes smaller). With this change the I/O bandwidth stays
>>> stable even when N > 1.7 million.
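
The sub-group workaround described above could be sketched as follows. This is a minimal illustration, not the code from the post: the helper names, the round-robin assignment, and the `/Data/Group<k>/Dataset<i>` naming scheme are all assumptions, since the post only says the datasets are distributed evenly. It uses the standard library only and makes no HDF5 calls; the returned path would be what is passed to the dataset-creation call.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: picks a sub-group for a dataset by round-robin.
// Precondition: n_groups > 0.
std::size_t subgroup_of(std::size_t dataset_index, std::size_t n_groups) {
    return dataset_index % n_groups;
}

// Hypothetical naming scheme: builds the full path of a dataset inside
// its sub-group, e.g. "/Data/Group2/Dataset130".
std::string dataset_path(std::size_t dataset_index, std::size_t n_groups) {
    return "/Data/Group" +
           std::to_string(subgroup_of(dataset_index, n_groups)) +
           "/Dataset" + std::to_string(dataset_index);
}

// Largest number of datasets that any single sub-group receives
// (ceiling of n_datasets / n_groups). Precondition: n_groups > 0.
std::size_t max_datasets_per_group(std::size_t n_datasets,
                                   std::size_t n_groups) {
    return (n_datasets + n_groups - 1) / n_groups;
}
```

With 128 sub-groups, for instance, 2 million datasets come out to 15625 per group, far below the ~1.7 million per-group threshold reported above.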
>>>
>>>
>>>
>>> If necessary, I can post simplified code to reproduce this issue.
>>>
>>>
>>>
>>> Hsi-Yu
>>>
>>> _______________________________________________
>>> Hdf-forum is for HDF software users discussion.
>>> [email protected]
>>> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
>>> Twitter: https://twitter.com/hdf5
>>>
>>
>
> <HDF5_IO_Bandwidth__Justin.cpp>