Hi,

I'm trying to understand a performance hit we are
experiencing when examining the tree structure of
our HDF5 files. We originally observed the problem
when using h5py, but it can be reproduced with the
h5ls command as well. I tracked it down to a
significant delay in the call to the
H5Oget_info_by_name function on a dataset with a
large number of chunks. It looks like as the number
of chunks in a dataset increases (in our case we
have 1-10k chunks), the performance of H5Oget_info
drops significantly. Looking at the I/O statistics,
it seems that the HDF5 library performs a very large
number of small I/O operations in this case. Very
little CPU time is spent, but the real (wall-clock)
time is measured in tens of seconds.
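For reference, the call in question boils down to
something like the following minimal sketch. The
"file.h5" and "data" names are placeholders for our
file and chunked dataset, and it assumes the
1.8/1.10 signature of H5Oget_info_by_name (newer
releases add a "fields" argument). Timing it
externally, e.g. with "time", is what shows the
tens of seconds of real time:

#include <stdio.h>
#include "hdf5.h"

int main(void)
{
    hid_t      file;
    H5O_info_t oinfo;

    /* "file.h5" and "data" are placeholders for our file and dataset */
    file = H5Fopen("file.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    if (file < 0)
        return 1;

    /* This single call is where the wall-clock time goes for us */
    if (H5Oget_info_by_name(file, "data", &oinfo, H5P_DEFAULT) < 0) {
        H5Fclose(file);
        return 1;
    }

    printf("object type = %d\n", (int)oinfo.type);

    H5Fclose(file);
    return 0;
}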

Is this expected behavior? Can it be improved somehow
without drastically reducing the number of chunks?

One more comment about H5Oget_info: it returns a
structure that contains many different pieces of
information. In the h5py code, the only member of
that structure which is actually used is "type".
Could there be a more efficient way to determine
just the type of an object without gathering every
other piece of info?
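For what it's worth, I noticed that newer releases
(1.10.3 and later, if I read the release notes
correctly) add a "fields" variant,
H5Oget_info_by_name2, which can be asked for only
the basic fields. Would something along these lines
avoid the expensive part? This is just a sketch with
the same placeholder names as above; I haven't
verified that H5O_INFO_BASIC actually skips the
chunk-index walk:

#include <stdio.h>
#include "hdf5.h"

int main(void)
{
    hid_t      file;
    H5O_info_t oinfo;

    file = H5Fopen("file.h5", H5F_ACC_RDONLY, H5P_DEFAULT);  /* placeholder */
    if (file < 0)
        return 1;

    /* Request only the basic fields (fileno/addr/type/rc); the hope is
       that this skips the header and metadata-size accounting. */
    if (H5Oget_info_by_name2(file, "data", &oinfo, H5O_INFO_BASIC,
                             H5P_DEFAULT) < 0) {
        H5Fclose(file);
        return 1;
    }

    if (oinfo.type == H5O_TYPE_DATASET)
        printf("\"data\" is a dataset\n");

    H5Fclose(file);
    return 0;
}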

Regards,
Andy

