Note that the HDF5 chunk cache size can be very important. HDF5 does not look at the access pattern to estimate the optimal cache size. If your access pattern is not sequential, you need to set a cache size that minimizes I/O for that access pattern. I've noticed that accessing small hyperslabs is quite slow in HDF5, probably due to B-tree lookup overhead.
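As a rough illustration of sizing the cache for a non-sequential access pattern, here is a minimal Python sketch. It counts the distinct chunks overlapped by a set of hyperslab reads and multiplies by the chunk size in bytes. The function names (chunks_touched, cache_bytes_needed) and the example shapes are hypothetical and not part of HDF5. It assumes a regular chunk grid, unit-stride hyperslabs, and that every touched chunk should stay in the cache.

```python
from itertools import product
from math import prod


def chunks_touched(start, count, chunk_shape):
    """Return the chunk-grid coordinates overlapped by one hyperslab.

    start, count: per-dimension offset and extent of the hyperslab (unit stride).
    chunk_shape:  per-dimension chunk size of the dataset.
    """
    per_dim = []
    for s, c, ch in zip(start, count, chunk_shape):
        first = s // ch            # index of the first chunk hit in this dimension
        last = (s + c - 1) // ch   # index of the last chunk hit in this dimension
        per_dim.append(range(first, last + 1))
    return set(product(*per_dim))


def cache_bytes_needed(slabs, chunk_shape, itemsize):
    """Bytes required to keep every distinct chunk touched by `slabs` cached."""
    touched = set()
    for start, count in slabs:
        touched |= chunks_touched(start, count, chunk_shape)
    return len(touched) * prod(chunk_shape) * itemsize


# Hypothetical example: reading one column of 1000 rows from a dataset of
# 8-byte values stored in 100 x 100 chunks touches 10 chunks of 80000 bytes.
column_read = [((0, 250), (1000, 1))]
print(cache_bytes_needed(column_read, (100, 100), 8))  # 800000
```

A number obtained this way could then be used as the chunk cache size, for example through H5Pset_chunk_cache in the C API or the rdcc_nbytes argument of h5py.File. A smaller cache may be enough if chunks are never revisited.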
Some colleagues at sister institutes have used the ADIOS data system developed at Oak Ridge and said it was much faster than HDF5. However, AFAIK it can use a lot of memory to achieve that. But a large chunk cache is not much different.

BTW, I assume that in your tests both ROOT and HDF5 used cold data, i.e. no data was already available in the system file buffers.

- Ger

>>> Tamas Gal <[email protected]> 31-Mar-17 17:46 >>>
Sorry Ewan, I nearly missed your message!

> On 31. Mar 2017, at 13:28, Ewan Makepeace <[email protected]> wrote:
> My instinct in your situation would be to define a compound data structure to represent one hit (it sounds as if you have done that) and then write a dataset per event.

Yes, we use compound data structures for the hits right now.

> On 31. Mar 2017, at 13:28, Ewan Makepeace <[email protected]> wrote:
>>> To iterate through the events, I need to create a list of nodes and walk over them, or I store the number of events as an attribute and simply use an iterator.
>
> I believe you can directly get the number of rows in each dataset and so I am confused by the attribute suggestion. It seems performance was still an issue?

That was referring to the one-big-table with an event_id array. Kind of mocking the pytables indexing feature without having a strict pytables dependency, so like an extra dataset which stores the "from-to" index values for each event. Which is of course ugly ;)

> On 31. Mar 2017, at 13:28, Ewan Makepeace <[email protected]> wrote:
> Generally I find that performance is all about the chunk size - HDF will generally read a whole chunk at a time and cache those chunks - have you tried different chunk sizes?

I tried but obviously I did something wrong... ;)

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
