Anthony Scopatz <[email protected]> wrote:
> Are you using compression on this EArray? This method is basically a thin
> wrapper over some HDF5 functions. I think that the data that you are asking
> for (inadvertently, maybe) is just expensive to get.
No, no compression. But I noticed that this is one of the first PyTables
data sets I created, years ago, and the chunk size was not chosen well. I
have improved that now (better chunk size/shape, transposed axes, and
using a CArray), and things are roughly 50% faster.
But I still don't understand why so much data is apparently being read
when I only want to know which children (i.e. the leaf names) a group
contains. To do this in my program I loop over _v_children.items(),
like this:
    d = {}
    for label, node in f.root.recordings.AB_5000._v_children.items():
        d[label] = node
I would have expected code like this to yield a dictionary with node
objects, without reading/inspecting the data content that nodes
contain. But apparently under the hood HDF5 is looking at the contents
of the nodes, which takes a while if they are large, especially over a
USB 3 connection. It is not reading the full array into RAM, because
the memory footprint of the python session doesn't increase
appreciably if I run the code above.
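For comparison, here is a minimal sketch of collecting only the child names and fetching a node object on demand. It assumes PyTables 3.x (open_file, create_group with createparents, create_carray with obj=). The temporary file, the leaf names rec_0 to rec_2, and the zero-filled arrays are made up for illustration. As far as I understand, iterating over _v_children directly yields just the names, whereas .items() also builds a node object for every child. Whether this avoids the slow access described above is untested.

```python
import os
import tempfile

import numpy as np
import tables

# Build a small throwaway file. The group path mirrors the loop above;
# the leaf names and array contents are hypothetical.
path = os.path.join(tempfile.mkdtemp(), "example.h5")
with tables.open_file(path, mode="w") as f:
    group = f.create_group("/recordings", "AB_5000", createparents=True)
    for i in range(3):
        f.create_carray(group, "rec_%d" % i, obj=np.zeros((10, 4)))

with tables.open_file(path, mode="r") as f:
    group = f.root.recordings.AB_5000
    # Iterating over the mapping itself yields only the child names.
    names = sorted(group._v_children)
    # Fetch a single node object only when it is actually needed.
    first = f.get_node(group, names[0])
    first_shape = first.shape

print(names)
print(first_shape)
```

With the names in hand, a program can defer get_node() until a particular leaf is actually used, instead of building the whole dictionary of node objects up front.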
Thanks, all the best, Gabriel
_______________________________________________
Pytables-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pytables-users