Re: [Pytables-users] searching for group names

2013-08-08 Thread Gabriel J.L. Beckers

Anthony Scopatz scop...@gmail.com wrote:

 Are you using compression on this EArray?  This method is basically a thin
 wrapper over some HDF5 functions. I think that the data that you are asking
 for (inadvertently, maybe) is just expensive to get.

No, no compression. But I saw this is one of the first pytables data  
sets I created years ago. The chunk size was not chosen well. I  
improved that now (better chunk size/shape, transposed axes, and using  
CArray) and things are roughly 50% faster.

But I still don't understand why so much data is apparently being read  
when I only want to know which children (i.e. the leaf names) a group  
contains. To do this in my program I loop over _v_children.items(),  
i.e., like,

    d = {}
    for label, node in f.root.recordings.AB_5000._v_children.items():
        d[label] = node

I would have expected code like this to yield a dictionary with node  
objects, without reading/inspecting the data content that nodes  
contain. But apparently under the hood HDF5 is looking at the contents  
of the nodes, which takes a while if they are large, especially over a  
usb3 connection. It is not reading the full array into RAM, because  
the memory footprint of the python session doesn't increase  
appreciably if I run the code above.

Thanks, all the best, Gabriel
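
A minimal sketch of collecting only the child names, for comparison with the loop over _v_children.items() above. It assumes a recent PyTables with the snake_case API (open_file, create_group, create_array) rather than the 2.5 release discussed here; the file, group and leaf names are made up. Whether this avoids the slow lookup on large leaves is not verified and may depend on the PyTables and HDF5 versions.

```python
import os
import tempfile

import numpy as np
import tables as tb

# Build a small throwaway file; the group and leaf names are hypothetical.
path = os.path.join(tempfile.mkdtemp(), "children_demo.h5")
with tb.open_file(path, "w") as f:
    grp = f.create_group(f.root, "recordings")
    f.create_array(grp, "ch0", np.arange(5))
    f.create_array(grp, "ch1", np.arange(5))

with tb.open_file(path, "r") as f:
    grp = f.root.recordings
    # Iterating the mapping itself yields only the child names (its keys).
    names = sorted(grp._v_children)
    # Indexing by name is the step that hands back a node object.
    first = grp._v_children[names[0]]
    first_name = first.name
    n_children = grp._v_nchildren
```

If only the labels are needed, keeping the names and fetching a node on demand postpones any per-node work until that node is actually used.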


___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] searching for group names

2013-08-07 Thread Gabriel J.L. Beckers
Hi,

I don't know if this is related in any way to Gergo's problem, but I
have slow responses when querying which children a group contains, if
that group contains big leaves. I am using PyTables 2.5 and HDF5 1.8.9
on 64-bit Linux.

Specifically, I found that using the _g_get_objinfo method (which is  
used by other methods that I use) is slow when used on a large leaf.  
The slowness is proportional to the size of the leaf. It is almost as  
if some process is actually reading the data instead of just info on  
the type of data. I am noticing this because my data is on an external  
usb3 disk. To give you an idea: that method takes almost 80 seconds to
return the string 'Leaf' when used on a 5 GB EArray. That should
roughly correspond to reading the complete disk-based array. The info
is cached somehow, because if I run the method a second time in the  
same python session it is very fast.

If I copy my hdf5 file to my SSD disk, things are much faster, but
running the method still takes 2 seconds or so on a 5 GB leaf.

Is this expected behavior and should I just avoid this method in my  
applications, or is something wrong?

Best, Gabriel

Anthony Scopatz scop...@gmail.com wrote:

 On Mon, Aug 5, 2013 at 4:11 AM, Nyirő Gergő gergo.ny...@gmail.com wrote:

 Hello,


 We develop a measurement evaluation tool, and we'd like to use
 pytables/hdf5 as a middle layer for signal accessing.

 We have to deal with the silly structure of the recorder device
 measurement format.



 The signals can be accessed via two identifiers:

 * device name: source of the signal-channel of the
 message-another tag-yet another tag

 * signal name



 The first identifier says the source information of the signal, which
 can be quite long.

 Therefore I grouped the device name into two layers:

 /source of the signal
     /channel of the message...
         /signal name



 So if you have the same message from two channels, then you will get

 /foo-device-name
     /channel-1
         /bar
         /baz
     /channel-2
         /bar
         /baz



 Besides signal loading, we have to search for signal name as fast as
 possible, and return with the shortest unique device name part and the
 signal name.

 Using the structure above, iterating over the group names is quite
 slow. So I build up a table from device and signal name.

 As far as I know, the pytables query does not support string searching
 (e.g. startswith, *foo[0-9]ch*, etc.), so fetching this table led us
 to a pure python loop which is slow again.

 Therefore I build up a python dictionary from the table, which provides
 fast iteration against the table, but the init time increased from 100
 ms to 3-4 sec (we have more than 40 000 signals).



 Do you have any advice how to search for group names in hdf5 with
 pytables in an efficient way?


 Hi Gergo,

 Searching through group names, like accessing all HDF5 metadata, is slow.
  For group names this is because rather than searching through a list you
 are traversing a B-tree, IIRC.  So you have to use the couple of tricks
 that you used: 1) have another Table / Array of all table names, 2) read
 this in once to a native Python data structure (dict here).

 However, 4 sec to read in this table seems excessive for data of this size.
  You are probably not reading this in properly.  You should be using:

 raw_grps = f.root.grp_names[:]

 or similar.

 Maybe other people have some other ideas.

 Be Well
 Anthony
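
A rough sketch of the "read it in once, then build a native Python structure" advice. The array below is only a stand-in for the result of f.root.grp_names[:]; the field names (device, signal), their sizes and the values are assumptions made for illustration, not the layout of the real table.

```python
import numpy as np

# Stand-in for the result of f.root.grp_names[:]; fields and values
# are invented for illustration.
raw_grps = np.array(
    [
        (b"foo-device-name/channel-1", b"bar"),
        (b"foo-device-name/channel-1", b"baz"),
        (b"foo-device-name/channel-2", b"bar"),
    ],
    dtype=[("device", "S32"), ("signal", "S16")],
)

# Convert whole columns at once instead of touching the array row by row.
devices = [d.decode("ascii") for d in raw_grps["device"].tolist()]
signals = [s.decode("ascii") for s in raw_grps["signal"].tolist()]

# Map each signal name to every device path that carries it.
by_signal = {}
for device, signal in zip(devices, signals):
    by_signal.setdefault(signal, []).append(device)
```

Pulling each column out with a single slice keeps the per-row work in plain Python lists, which is usually much cheaper than iterating over table rows one at a time.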



 ps: I would be most happy with a glob interface.



 Thanks in advance for your advice,

 gergo
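
One possible shape for the glob-style lookup asked for above, sketched with the standard-library fnmatch module on an in-memory index. This is not a PyTables feature; the paths and the helper names (build_index, glob_signals) are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical node paths, modelled on the /device/channel/signal layout.
paths = [
    "/foo-device-name/channel-1/bar",
    "/foo-device-name/channel-1/baz",
    "/foo-device-name/channel-2/bar",
    "/foo-device-name/channel-2/baz",
]


def build_index(paths):
    """Map each signal name to the list of parent paths that contain it."""
    index = {}
    for path in paths:
        parent, _, signal = path.rpartition("/")
        index.setdefault(signal, []).append(parent)
    return index


def glob_signals(index, pattern):
    """Return sorted (signal, parents) pairs whose name matches the glob."""
    return sorted(
        (name, parents)
        for name, parents in index.items()
        if fnmatchcase(name, pattern)
    )


index = build_index(paths)
matches = glob_signals(index, "ba[rz]")
```

The index is built once per file, so each later pattern search only walks the distinct signal names and never touches the HDF5 file.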









[Pytables-users] tables.expr behavior

2010-03-16 Thread Gabriel J.L. Beckers
Dear list,

I ran into the following.

If you multiply a 10-element pytables array with a 1-element NumPy array using
tables.expr, then the return value has length one. If you do the same in pure
NumPy, the return value has length 10.

I pasted a small script below that shows this.

Is this expected or not?

Best, Gabriel

Script:
+++


import tables as tb
import numpy as np

f = tb.openFile('test.h5', 'a')

factor = np.array([3.])
ar = np.arange(10.)

# PyTables
ar = f.createArray(f.root, 'test1', ar)
e = tb.Expr('factor*ar')
print e.eval() # [ 0.]

# NumPy
print factor*ar # [  0.   3.   6.   9.  12.  15.  18.  21.  24.  27.]

f.close()
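
For reference, a NumPy-only sketch of the broadcasting behaviour the script compares against. It does not use tables.Expr at all. The last step, replacing the 1-element array with a plain Python scalar, is only a suggestion for sidestepping the shape question; whether tables.Expr then returns 10 elements has not been checked here.

```python
import numpy as np

factor = np.array([3.0])   # shape (1,)
ar = np.arange(10.0)       # shape (10,)

# NumPy broadcasts the length-1 operand against the length-10 operand.
product = factor * ar
broadcast_shape = product.shape

# A plain Python scalar gives the same values with no shape to reconcile.
scalar_product = float(factor[0]) * ar
same = np.array_equal(product, scalar_product)
```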






[Pytables-users] table iterator surprise

2009-08-25 Thread Gabriel J.L. Beckers

Hi List,

I ran into unexpected behavior when iterating over rows in a table  
(PyTables version '2.2a2').


The attached script shows the problem (minimal example), but, briefly:  
if one iterates over rows in a table, and checks in the loop if the  
row has a certain key, the iterator will jump to the last row and skip  
all other rows.


I admit checking for a key is a stupid thing to do (one should look at  
the table colnames before the loop), and I don't even know if the row  
type is intended to support this. But this aside, is the iterator  
behavior shown here a bug?


Cheers, Gabriel

import numpy as np
import tables as tb

# One-column structured array holding the integers 0..9
dtype = np.dtype([('number', np.int)])
structar = np.arange(10).view(dtype)

h5f = tb.openFile('testfile.h5','w')
t = h5f.createTable(h5f.root, 'testtable', structar)
h5f.flush()

print "this works as expected"
for row in t.iterrows():
    print row

print "this doesn't"
for row in t.iterrows():
    # The membership test on the row is what disturbs the iteration
    if 'number' in row:
        pass
    print row

h5f.close()
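
A small sketch of checking for the column once, before the loop, as suggested in the message. It uses only the NumPy structured array so that it runs without an HDF5 file; on a PyTables table the corresponding list of names is t.colnames. The explicit int64 dtype is an assumption made here for portability.

```python
import numpy as np

dtype = np.dtype([("number", np.int64)])
structar = np.arange(10, dtype=np.int64).view(dtype)

# Decide once, outside the loop, whether the field exists.
has_number = "number" in structar.dtype.names

values = []
for row in structar:
    if has_number:
        values.append(int(row["number"]))
```

Because the membership test runs against the column names and not against the row object, the loop body never probes the row itself.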


