Sorry Ewan, I nearly missed your message!

> On 31. Mar 2017, at 13:28, Ewan Makepeace <[email protected]> wrote:
>
> My instinct in your situation would be to define a compound data structure to
> represent one hit (it sounds as if you have done that) and then write a
> dataset per event.
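(For the archive: that per-event layout would look roughly like the sketch
below, assuming h5py; the hit fields dom_id, time and tot are invented for
illustration, not our actual schema.)

import h5py
import numpy as np

# Hypothetical compound dtype for one hit; the fields are invented for
# illustration, not the actual schema discussed in this thread.
hit_dtype = np.dtype([("dom_id", "<i4"), ("time", "<f8"), ("tot", "<u2")])

with h5py.File("events.h5", "w") as f:
    for event_id in range(3):
        # Dummy hits standing in for real detector data.
        hits = np.zeros(np.random.randint(10, 100), dtype=hit_dtype)
        # One dataset per event, holding an array of compound hit records.
        f.create_dataset("events/event_{}".format(event_id), data=hits)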
Yes, we use compound data structures for the hits right now.

> On 31. Mar 2017, at 13:28, Ewan Makepeace <[email protected]> wrote:
>
>>> To iterate through the events, I need to create a list of nodes and walk
>>> over them, or I store the number of events as an attribute and simply use
>>> an iterator.
>
> I believe you can directly get the number of rows in each dataset and so I am
> confused by the attribute suggestion. It seems performance was still an issue?

That was referring to the one-big-table approach with an event_id array. It is
kind of mimicking the PyTables indexing feature without having a strict
PyTables dependency: an extra dataset stores the "from-to" index values for
each event (a sketch of this layout follows below). Which is of course ugly ;)

> On 31. Mar 2017, at 13:28, Ewan Makepeace <[email protected]> wrote:
>
> Generally I find that performance is all about the chunk size - HDF will
> generally read a whole chunk at a time and cache those chunks - have you
> tried different chunk sizes?

I tried, but obviously I did something wrong... ;) (see the chunking sketch at
the end of this mail)
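(The index layout, as a minimal sketch assuming h5py; the dataset names and
hit fields are simplified for illustration.)

import h5py
import numpy as np

# Dummy flat hit table with an event_id column, standing in for real data.
hit_dtype = np.dtype([("event_id", "<i4"), ("time", "<f8")])
hits = np.zeros(100, dtype=hit_dtype)
hits["event_id"] = np.repeat(np.arange(10), 10)  # sorted by event_id
hits["time"] = np.random.random(100)

# The "from-to" index: for each event, the first row and the one-past-last
# row of its hits in the big table (requires hits sorted by event_id).
starts = np.searchsorted(hits["event_id"], np.unique(hits["event_id"]))
stops = np.append(starts[1:], len(hits))
index = np.column_stack([starts, stops])

with h5py.File("events_flat.h5", "w") as f:
    f.create_dataset("hits", data=hits)
    f.create_dataset("event_index", data=index)

    # Reading one event then boils down to a single contiguous slice:
    start, stop = f["event_index"][3]
    event_hits = f["hits"][start:stop]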

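(And the kind of chunking experiment Ewan suggests, a minimal sketch assuming
h5py; the chunk lengths and the read pattern are arbitrary examples, not
recommendations.)

import time

import h5py
import numpy as np

data = np.random.random(1000000)

# HDF5 reads and caches whole chunks, so the chunk size should match the
# access pattern; these chunk lengths are arbitrary examples.
for chunk_len in (1024, 16384, 262144):
    fname = "chunk_test_{}.h5".format(chunk_len)
    with h5py.File(fname, "w") as f:
        f.create_dataset("data", data=data, chunks=(chunk_len,))

    # Time a strided read pattern against each chunking.
    with h5py.File(fname, "r") as f:
        dset = f["data"]
        t0 = time.time()
        for start in range(0, len(dset), 50000):
            _ = dset[start:start + 1000]
        print(chunk_len, time.time() - t0)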