Hi Louis,

On Tuesday, 14 March 2006 at 00:05, Louis Wicker wrote:
> Hi:
>
> I have a 300,000+ line Pytable having 25 columns of various data,
> mostly floats.
>
> I need to do a set of simultaneous searches on this data, based on
> observation type, time, location...
>
> Here is the code I am using to do the search (here I am just
> searching for 5 columns of data, I would like to do all 25 and have
> in other ways, not much diff):
>
>        for o in table.iterrows():
>
>              ob_date = datetime.datetime(o['year'], o['month'],
>                                          o['day'], o['hour'],
>                                          o['min'], o['sec'])
>
>              if o['status'] == status_in:
>
>                    if o['type'] == obtype_in               and \
>                       tgrid[0] <= ob_date < tgrid[1]       and \
>                       xgrid[1] <= o['x']  < xgrid[2]       and \
>                       ygrid[1] <= o['y']  < ygrid[2]       and \
>                       zgrid[1] <= o['z']  < zgrid[2]:
>
>                          value.append(o['value'])
>                          Hxa.append(o['Hxa'])
>                          Hxf.append(o['Hxf'])
>                          sdHxa.append(o['sdHxa'])
>                          sdHxf.append(o['sdHxf'])
>
> the only "funny" deal here is that I am using the python datetime
> module to compare the times, but from my tests and trials with and
> without it I don't think that is the problem (?).

Yeah, I don't think so either. The best approach would be to instrument
your code with time counters and determine which lines are responsible
for the slowdown (you can also use a regular profiler). In any case,
are you aware that you can use a column of type TimeCol? That would
replace six lookups and a function call with a single lookup in the
table.
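To make the idea concrete, here is a small stdlib-only sketch (no
PyTables calls, names invented for illustration) of what a single time
column would hold: the six date/time fields collapsed into one epoch
timestamp, so the range test in the query becomes a plain numeric
comparison instead of a datetime construction per row.

```python
import calendar

def to_epoch(year, month, day, hour, minute, sec):
    """Collapse the six date/time fields into a single UTC epoch
    timestamp -- the kind of scalar a time column would store.
    calendar.timegm() only looks at the first six tuple elements."""
    return calendar.timegm((year, month, day, hour, minute, sec))

# With one 'time' field, the datetime comparison becomes a float one:
t0 = to_epoch(2006, 3, 13, 0, 0, 0)       # lower bin edge
t1 = to_epoch(2006, 3, 14, 0, 0, 0)       # upper bin edge
ob_time = to_epoch(2006, 3, 13, 12, 30, 0)  # one observation's time
in_bin = t0 <= ob_time < t1
```

You would fill such a column once at write time and then compare the
stored scalar directly in the loop (or in the query condition).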

> It takes about 7 sec on the Mac G5 (2.5 ghz) and 8.5 sec on an SGI
> Altix.  I think that is not very good, right?

Well, it depends, but it is clear that this is not fast enough for you
at least, which is the important thing ;-). My guess is that you have a
fairly complex query executed entirely in Python space, and that is the
most likely cause of the slowness. But profiling the code would still
be a good thing to do, just to be sure.
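A minimal profiling sketch with the standard library's cProfile, using
a stand-in function in place of your table loop (the row data here is
made up purely for illustration):

```python
import cProfile
import io
import pstats

def scan(rows):
    # Stand-in for the real table.iterrows() loop; substitute your own.
    hits = []
    for o in rows:
        if o['status'] == 1 and 0.0 <= o['x'] < 10.0:
            hits.append(o['value'])
    return hits

# Fake rows just to have something to profile.
rows = [{'status': i % 2, 'x': float(i % 20), 'value': i}
        for i in range(10000)]

prof = cProfile.Profile()
result = prof.runcall(scan, rows)

# Print the five most expensive calls by cumulative time.
out = io.StringIO()
pstats.Stats(prof, stream=out).sort_stats('cumulative').print_stats(5)
```

The per-function timings will tell you whether the time goes into the
row iteration itself or into the comparisons.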

> I have also tried "inlining" each variable with a search (really
> slow), or try using the table.where functions, again, slow.

IMO, the best thing would be to be able to do the complete query at C
level. The Table.where method is the way PyTables executes queries at C
level. However, as you know, PyTables currently supports only simple
queries in the where method. We are addressing this limitation in the
forthcoming PyTables Pro [1]. The good news is that we are seeing very
promising speed improvements over plain PyTables. The bad news is that
the product's schedule has already slipped; nevertheless, we expect to
be able to deliver it during the first half of 2006.

> Another problem I am having with this is that I need to search over
> 20-30 individual time bins (like above), and I cannot figure out a
> way to bin up the data
> in one pass, e.g., I am calling the above loop 20-30 times, so to
> data mine this table I have a longer wait than I thought I would....

Well, one possibility would be to pass the complete set of time bins to
the condition and retrieve a binning as the result (instead of a single
scalar). I am not sure how much extra difficulty that adds, though.
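In the meantime, you can do the binning yourself in a single pass over
the table in Python space. A hedged sketch (field names 'time' and
'value' are assumed for illustration) using bisect over sorted bin
edges:

```python
import bisect

def bin_values(rows, edges):
    """Single pass over the rows: each row's value is appended to the
    bin its time falls in.  `edges` is a sorted list of boundaries
    [t0, t1, ..., tn] defining the n bins [t_i, t_{i+1})."""
    bins = [[] for _ in range(len(edges) - 1)]
    for o in rows:
        i = bisect.bisect_right(edges, o['time']) - 1
        if 0 <= i < len(bins):  # skip times outside all the bins
            bins[i].append(o['value'])
    return bins

# Tiny usage example with three bins [0,10), [10,20), [20,30):
edges = [0, 10, 20, 30]
rows = [{'time': 5, 'value': 'a'}, {'time': 15, 'value': 'b'},
        {'time': 25, 'value': 'c'}, {'time': 35, 'value': 'd'}]
binned = bin_values(rows, edges)
```

That way you iterate over the 300,000 rows once instead of 20-30
times; the bin lookup is O(log n) in the number of edges.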

[1] http://www.carabos.com/products/pytables-pro.html

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"



_______________________________________________
Pytables-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pytables-users
