Sorry Ewan, I nearly missed your message!

> On 31. Mar 2017, at 13:28, Ewan Makepeace <[email protected]> wrote:
>
> My instinct in your situation would be to define a compound data structure to
> represent one hit (it sounds as if you have done that) and then write a
> dataset per event.
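(For the archive: that per-event layout would look roughly like the sketch
below, assuming h5py; the hit fields dom_id, time and tot are invented for
illustration, not our actual schema.)

import h5py
import numpy as np

# Hypothetical compound dtype for one hit; the fields are invented for
# illustration, not the actual schema discussed in this thread.
hit_dtype = np.dtype([("dom_id", "<i4"), ("time", "<f8"), ("tot", "<u2")])

with h5py.File("events.h5", "w") as f:
    for event_id in range(3):
        # Dummy hits standing in for real detector data.
        hits = np.zeros(np.random.randint(10, 100), dtype=hit_dtype)
        # One dataset per event, holding an array of compound hit records.
        f.create_dataset("events/event_{}".format(event_id), data=hits)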
Yes, we use compound data structures for the hits right now.

> On 31. Mar 2017, at 13:28, Ewan Makepeace <[email protected]> wrote:
>
>>> To iterate through the events, I need to create a list of nodes and walk
>>> over them, or I store the number of events as an attribute and simply use
>>> an iterator.
>
> I believe you can directly get the number of rows in each dataset and so I am
> confused by the attribute suggestion. It seems performance was still an issue?

That was referring to the one-big-table approach with an event_id array. It is
kind of mimicking the PyTables indexing feature without having a strict
PyTables dependency: an extra dataset stores the "from-to" index values for
each event (a sketch of this layout follows below). Which is of course ugly ;)

> On 31. Mar 2017, at 13:28, Ewan Makepeace <[email protected]> wrote:
>
> Generally I find that performance is all about the chunk size - HDF will
> generally read a whole chunk at a time and cache those chunks - have you
> tried different chunk sizes?

I tried, but obviously I did something wrong... ;) (see the chunking sketch at
the end of this mail)
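(The index layout, as a minimal sketch assuming h5py; the dataset names and
hit fields are simplified for illustration.)

import h5py
import numpy as np

# Dummy flat hit table with an event_id column, standing in for real data.
hit_dtype = np.dtype([("event_id", "<i4"), ("time", "<f8")])
hits = np.zeros(100, dtype=hit_dtype)
hits["event_id"] = np.repeat(np.arange(10), 10)  # sorted by event_id
hits["time"] = np.random.random(100)

# The "from-to" index: for each event, the first row and the one-past-last
# row of its hits in the big table (requires hits sorted by event_id).
starts = np.searchsorted(hits["event_id"], np.unique(hits["event_id"]))
stops = np.append(starts[1:], len(hits))
index = np.column_stack([starts, stops])

with h5py.File("events_flat.h5", "w") as f:
    f.create_dataset("hits", data=hits)
    f.create_dataset("event_index", data=index)

    # Reading one event then boils down to a single contiguous slice:
    start, stop = f["event_index"][3]
    event_hits = f["hits"][start:stop]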

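(And the kind of chunking experiment Ewan suggests, a minimal sketch assuming
h5py; the chunk lengths and the read pattern are arbitrary examples, not
recommendations.)

import time

import h5py
import numpy as np

data = np.random.random(1000000)

# HDF5 reads and caches whole chunks, so the chunk size should match the
# access pattern; these chunk lengths are arbitrary examples.
for chunk_len in (1024, 16384, 262144):
    fname = "chunk_test_{}.h5".format(chunk_len)
    with h5py.File(fname, "w") as f:
        f.create_dataset("data", data=data, chunks=(chunk_len,))

    # Time a strided read pattern against each chunking.
    with h5py.File(fname, "r") as f:
        dset = f["data"]
        t0 = time.time()
        for start in range(0, len(dset), 50000):
            _ = dset[start:start + 1000]
        print(chunk_len, time.time() - t0)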