Hi Andrey and Werner,

Thank you for your input. We have more and more applications that open and process multiple files and datasets, and you are absolutely correct that memory becomes an issue in this case. A global cache is one of the improvements we are considering for the library to address the problem.

There is one thing to remember: the chunk cache is important when a chunk is compressed and is accessed multiple times. If this is not the case (for example, the application always reads subsets that contain whole chunks), one can disable the chunk cache completely to reduce the application's memory footprint.
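For example, something along these lines (a minimal sketch; the file name "data.h5" and dataset name "/dset" are just placeholders) opens a dataset with the chunk cache turned off by setting both the slot count and the cache size in bytes to zero on the dataset access property list:

#include "hdf5.h"

int main(void)
{
    hid_t file = H5Fopen("data.h5", H5F_ACC_RDONLY, H5P_DEFAULT);

    /* Dataset access property list: 0 slots and 0 bytes disable the
       raw-data chunk cache for this one dataset. */
    hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
    H5Pset_chunk_cache(dapl, 0, 0, H5D_CHUNK_CACHE_W0_DEFAULT);

    hid_t dset = H5Dopen2(file, "/dset", dapl);

    /* ... read whole chunks with H5Dread as usual ... */

    H5Dclose(dset);
    H5Pclose(dapl);
    H5Fclose(file);
    return 0;
}

With the cache size set to zero bytes, chunks are read straight through on each H5Dread call instead of being kept in memory.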
Elena

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal  The HDF Group  http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On May 13, 2015, at 8:23 AM, Werner Benger <[email protected]> wrote:

> Hi Andrey,
>
> Just to mention, there are more buffers and caches involved for HDF5 datasets, for instance the sieve buffer:
>
> https://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetSieveBufSize
>
> It was this one that gave me memory headaches at some point, though it seems to be solved in the current HDF5 version.
>
> A global cache value would make sense and be convenient, possibly combined with a setting for how much to prioritize each of the individual cache sizes.
>
> Werner
>
> On 13.05.2015 12:01, Андрей Парамонов wrote:
>> Hello HDF5 developers!
>>
>> Currently, the HDF5 library provides two ways to control cache size when accessing datasets:
>>
>> * H5Pset_cache / H5Pget_cache
>> * H5Pset_chunk_cache / H5Pget_chunk_cache
>>
>> The former controls the default cache buffer size for all datasets, while the latter allows fine-tuning the cache buffer size on a per-dataset basis.
>>
>> This works nicely in many cases. However, working with bigger, multi-dataset HDF5 files reveals a considerable flaw. A cache is a way to trade memory for speed. How much memory one would trade naturally depends on the total memory available, i.e. memory is (a scarce) global resource. Thus, more often than not it is desirable to set a *global* cache size for *all* HDF5 datasets, regardless of the number of datasets (and even files) open.
>>
>> E.g., I'd like to be able to say "use no more than 1 GB of memory for cache" instead of "use no more than 50 MB of memory for caching each dataset". The latter is not as useful as the former, as the number of datasets may vary greatly.
>>
>> Currently there seems to be no way to impose a global cache size limit. Would it be hard to implement such a feature in one of the future versions?
>>
>> Thank you for your work,
>> Andrey Paramonov
>
> --
> ___________________________________________________________________________
> Dr. Werner Benger                    Visualization Research
> Center for Computation & Technology at Louisiana State University (CCT/LSU)
> 2019 Digital Media Center, Baton Rouge, Louisiana 70803
> Tel.: +1 225 578 4809    Fax.: +1 225 578-5362

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
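For reference, here is a minimal sketch of the existing per-file and per-dataset knobs mentioned in the thread: H5Pset_cache on a file access property list sets the default chunk cache for every dataset opened through that file, H5Pset_chunk_cache on a dataset access property list overrides it for a single dataset, and H5Pset_sieve_buf_size controls the sieve buffer Werner refers to. The file name, dataset name, and sizes are placeholders only, not recommendations.

#include "hdf5.h"

int main(void)
{
    /* File access property list: default chunk cache for all datasets
       opened through this file (521 slots, 50 MB, preemption 0.75).
       The second argument is ignored by HDF5 1.8 and later. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_cache(fapl, 0, 521, 50 * 1024 * 1024, 0.75);

    /* Sieve buffer size for raw-data I/O (1 MB here). */
    H5Pset_sieve_buf_size(fapl, 1024 * 1024);

    hid_t file = H5Fopen("data.h5", H5F_ACC_RDONLY, fapl);

    /* Dataset access property list: per-dataset override (997 slots, 100 MB). */
    hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
    H5Pset_chunk_cache(dapl, 997, 100 * 1024 * 1024, H5D_CHUNK_CACHE_W0_DEFAULT);

    hid_t dset = H5Dopen2(file, "/dset", dapl);

    /* ... */

    H5Dclose(dset);
    H5Pclose(dapl);
    H5Pclose(fapl);
    H5Fclose(file);
    return 0;
}

As the thread notes, both controls are per-file or per-dataset; neither imposes the single global memory limit Andrey is asking for.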
