Hi, I'm trying to understand a performance hit we are experiencing when examining the tree structure of our HDF5 files. We originally observed the problem with h5py, but it can be reproduced even with the h5ls command. I tracked it down to a significant delay in the call to the H5Oget_info_by_name function on a dataset with a large number of chunks. As the number of chunks in a dataset grows (in our case, 1-10k chunks), the performance of H5Oget_info drops significantly. The I/O statistics show that the HDF5 library performs a very large number of small I/O operations in this case. Very little CPU time is spent, but the wall-clock time is measured in tens of seconds.
Is this expected behavior? Can it be improved somehow without drastically reducing the number of chunks?

One more comment about H5Oget_info: it returns a structure containing many different pieces of information. In the h5py code, the only member of that structure actually used is "type". Could there be a more efficient way to determine just the type of an object, without computing every other piece of info?

Regards,
Andy

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
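To make the second question concrete, here is a minimal sketch (using h5py's low-level h5o/h5i API; the file name, shape, and chunk size are made up for illustration). The first call is the full-info path in question; the second opens the object and asks H5Iget_type for the identifier type alone, which I am assuming skips filling in the rest of the H5O_info_t:

```python
import os
import tempfile

import h5py  # assumes h5py is available

# Tiny stand-in file (name, shape, and chunk size are made up).
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("data", shape=(100, 100), chunks=(10, 10), dtype="f8")

with h5py.File(path, "r") as f:
    # Full-info path: H5Oget_info_by_name fills a whole H5O_info_t even
    # though only the .type member is consumed afterwards.
    info_type = h5py.h5o.get_info(f.id, name=b"data").type

    # Possibly cheaper: open the object and ask H5Iget_type for the
    # identifier type alone, skipping the other H5O_info_t fields.
    obj = h5py.h5o.open(f.id, b"data")
    id_type = h5py.h5i.get_type(obj)

print(info_type == h5py.h5o.TYPE_DATASET)  # True
print(id_type == h5py.h5i.DATASET)         # True
```

Whether the second form actually avoids the per-chunk metadata reads is exactly what I'd like to find out.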