Hi,

In our organization the data we need to store in HDF5
varies widely in size, from small objects of 10-20 bytes
to very large objects of several MB each (typically
images). The chunks we create for the large objects tend
to be large as well, and they exceed the default HDF5
chunk cache size (1 MB). That of course means that with
the default settings these large chunks are never cached
in memory.

We cannot reduce our chunk size, as that would lead to far
too many chunks, which causes other sorts of problems. The
standard solution, of course, is to set the chunk cache
size to a larger value when reading data. That does not
work well for us because we access HDF5 through a
multitude of tools - C++, Matlab, h5py, IDL, etc. - and
too many users would need to be taught how to change the
cache size setting in each of those tools (which is not
always trivial). The only reasonable solution I have found
so far is to patch the HDF5 sources to increase the
default cache size from 1 MB to 32 MB. That has its own
troubles, because of course not everyone uses our patched
HDF5 library.
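
For reference, the per-dataset workaround looks roughly
like this through the C API (the file and dataset names
below are only placeholders):

#include <hdf5.h>

int main(void)
{
    hid_t file = H5Fopen("data.h5", H5F_ACC_RDONLY, H5P_DEFAULT);

    /* Size the raw chunk cache on a dataset access property
     * list before opening the dataset. */
    hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
    H5Pset_chunk_cache(dapl, H5D_CHUNK_CACHE_NSLOTS_DEFAULT,
                       32 * 1024 * 1024,      /* 32 MB cache */
                       H5D_CHUNK_CACHE_W0_DEFAULT);

    hid_t dset = H5Dopen2(file, "/images/frame_0", dapl);

    /* ... read the data ... */

    H5Dclose(dset);
    H5Pclose(dapl);
    H5Fclose(file);
    return 0;
}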

I think it would be beneficial in cases like ours for HDF5
to use an adaptive algorithm by default, one that can fit
larger chunks in the cache. Would it be possible to add
something like this to a future HDF5 version? I don't
think it has to be complex; the simplest rule would
probably be "make sure that at least one chunk fits in the
cache unless the user provides an explicit cache size for
the dataset". If help is needed I could try to produce a
patch that does this (it will take me some time to
understand the code, of course).
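
To make it concrete, here is a rough sketch of that rule,
expressed as a user-side helper instead of a library
change (ensure_one_chunk_fits() is just a name I made up
for illustration; a real patch would live inside the
library's default cache setup):

#include <hdf5.h>

#define DEFAULT_RDCC_NBYTES (1024 * 1024)   /* current 1 MB default */

/* Open a dataset with a chunk cache large enough to hold at
 * least one chunk; fall back to the normal default for
 * contiguous/compact datasets or small chunks. */
static hid_t ensure_one_chunk_fits(hid_t file, const char *name)
{
    /* First open with defaults just to inspect the chunk layout. */
    hid_t probe = H5Dopen2(file, name, H5P_DEFAULT);
    hid_t dcpl  = H5Dget_create_plist(probe);
    hid_t dtype = H5Dget_type(probe);

    size_t cache_bytes = DEFAULT_RDCC_NBYTES;
    if (H5Pget_layout(dcpl) == H5D_CHUNKED) {
        hsize_t cdims[H5S_MAX_RANK];
        int     rank = H5Pget_chunk(dcpl, H5S_MAX_RANK, cdims);
        size_t  chunk_bytes = H5Tget_size(dtype);
        for (int i = 0; i < rank; i++)
            chunk_bytes *= (size_t)cdims[i];
        if (chunk_bytes > cache_bytes)
            cache_bytes = chunk_bytes;   /* grow to fit one chunk */
    }

    H5Tclose(dtype);
    H5Pclose(dcpl);
    H5Dclose(probe);

    /* Re-open with the adjusted cache size. */
    hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
    H5Pset_chunk_cache(dapl, H5D_CHUNK_CACHE_NSLOTS_DEFAULT,
                       cache_bytes, H5D_CHUNK_CACHE_W0_DEFAULT);
    hid_t dset = H5Dopen2(file, name, dapl);
    H5Pclose(dapl);
    return dset;
}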

Thanks,
Andy

