Hi Louis,

On Tuesday, 14 March 2006 00:05, Louis Wicker wrote:
> Hi:
>
> I have a 300,000+ row PyTables table having 25 columns of various
> data, mostly floats.
>
> I need to do a set of simultaneous searches on this data, based on
> observation type, time, location...
>
> Here is the code I am using to do the search (here I am just
> searching for 5 columns of data; I would like to do all 25, and I
> have tried it in other ways, not much difference):
>
>     for o in table.iterrows():
>
>         ob_date = datetime.datetime(o['year'], o['month'], o['day'],
>                                     o['hour'], o['min'], o['sec'])
>
>         if o['status'] == status_in:
>
>             if o['type'] == obtype_in and \
>                tgrid[0] <= ob_date < tgrid[1] and \
>                xgrid[1] <= o['x'] < xgrid[2] and \
>                ygrid[1] <= o['y'] < ygrid[2] and \
>                zgrid[1] <= o['z'] < zgrid[2]:
>
>                 value.append(o['value'])
>                 Hxa.append(o['Hxa'])
>                 Hxf.append(o['Hxf'])
>                 sdHxa.append(o['sdHxa'])
>                 sdHxf.append(o['sdHxf'])
>
> the only "funny" deal here is that I am using the python datetime
> module to compare the times, but from my tests and trials with and
> without it I don't think that is the problem (?).
Yeah, I don't think so either.  The best thing would be to instrument
your code by sprinkling time counters around it and determine which
lines are responsible for the slowdown (you can also use a regular
profiler).  In any case, are you aware that you can make use of a
column of type TimeCol?  That would replace six lookups and a
function call with a single lookup in the table.

> It takes about 7 sec on the Mac G5 (2.5 GHz) and 8.5 sec on an SGI
> Altix.  I think that is not very good, right?

Well, it depends, but it is clear that this is not fast enough for
you, which is the important thing ;-).  My guess is that you have a
quite complex query made entirely in Python space, and that is the
most probable cause of the slowness.  But profiling the code would be
a nice thing to do, just to be sure.

> I have also tried "inlining" each variable with a search (really
> slow), or tried using the table.where functions; again, slow.

IMO, the best way would be to do the complete query at C level.  The
Table.where method is the way PyTables executes queries at C level.
However, as you know, PyTables currently only supports single-column
conditions in the where method.  We are addressing this limitation in
the forthcoming PyTables Pro [1].  The good news is that we are
getting very promising speed improvements over plain PyTables.  The
bad news is that the product is already running behind schedule;
nevertheless, we think we will be able to deliver it during the first
half of 2006.

> Another problem I am having with this is that I need to search over
> 20-30 individual time bins (like above), and I cannot figure out a
> way to bin up the data in one pass, e.g., I am calling the above
> loop 20-30 times, so to data mine this table I have a longer wait
> than I thought I would....

Well, one possibility would be to pass the complete set of time bins
to the condition and retrieve a binning as a result (instead of a
single scalar), so that the table is traversed only once.  Not sure
about the added difficulty of this one, though.
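In case it helps, here are a few rough, untested sketches of the
ideas above.  First, the profiling: the standard profile module is
enough to tell you where the time goes (run_query is just a
placeholder for your loop):

    import profile

    def run_query():
        # put the iterrows() loop from your message in here
        pass

    # prints per-function timings; sorted reports are possible via pstats
    profile.run('run_query()')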
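Second, the TimeCol idea.  This is only a sketch with a reduced table
description (your real table has 25 columns, and names like "obtime"
and "obs.h5" are placeholders I made up): store the observation time
once, as a single Time64Col holding a POSIX timestamp, and the six
integer lookups plus the datetime() call per row collapse into one
float lookup:

    import time
    import tables

    class Observation(tables.IsDescription):
        obtime = tables.Time64Col()   # one POSIX timestamp (seconds)
        status = tables.IntCol()
        type   = tables.IntCol()
        x      = tables.FloatCol()
        y      = tables.FloatCol()
        z      = tables.FloatCol()
        value  = tables.FloatCol()

    h5 = tables.openFile("obs.h5", "w")
    table = h5.createTable("/", "obs", Observation)

    # at write time, collapse year/month/day/hour/min/sec into one field
    row = table.row
    row['obtime'] = time.mktime((2006, 3, 14, 0, 5, 0, 0, 0, -1))
    row['status'] = 1
    row['type'] = 1
    row['x'] = 0.0
    row['y'] = 0.0
    row['z'] = 0.0
    row['value'] = 42.0
    row.append()
    table.flush()

    # at query time the bin edges are timestamps too, so the whole
    # date test is a single lookup and two float comparisons per row
    tgrid = (time.mktime((2006, 3, 13, 0, 0, 0, 0, 0, -1)),
             time.mktime((2006, 3, 15, 0, 0, 0, 0, 0, -1)))
    for o in table.iterrows():
        if tgrid[0] <= o['obtime'] < tgrid[1]:
            print o['value']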
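Third, the where() idea: pick the single most selective condition
(the time window, say) and let the where iterator evaluate it at C
level, keeping the remaining tests in Python over the pre-filtered
rows.  I am writing the condition syntax from memory for the current
1.x series, and I am reusing the hypothetical obtime column from
above, so please double-check against the docs for your version:

    # one single-column condition runs in the C-level iterator...
    for o in table.where(table.cols.obtime >= tgrid[0]):
        # ...and the rest of the query stays in Python space
        if (o['obtime'] < tgrid[1] and
            o['status'] == status_in and o['type'] == obtype_in and
            xgrid[1] <= o['x'] < xgrid[2] and
            ygrid[1] <= o['y'] < ygrid[2] and
            zgrid[1] <= o['z'] < zgrid[2]):
            value.append(o['value'])

If the time window selects only a small fraction of the table, this
alone should cut the Python-space work considerably.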
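And finally, the binning in one pass.  Even in pure Python you can
assign each row to its time bin with the standard bisect module
instead of re-running the whole loop 20-30 times.  A sketch, assuming
tgrid is now the sorted sequence of all your bin edges and reusing
the other names from your loop (plain datetime objects would work
with bisect just as well as timestamps):

    import bisect

    nbins = len(tgrid) - 1
    binned = [[] for b in range(nbins)]   # one list of values per bin

    for o in table.iterrows():
        if o['status'] != status_in or o['type'] != obtype_in:
            continue
        # bisect_right gives tgrid[i] <= t < tgrid[i+1] semantics
        i = bisect.bisect_right(tgrid, o['obtime']) - 1
        if (0 <= i < nbins and
            xgrid[1] <= o['x'] < xgrid[2] and
            ygrid[1] <= o['y'] < ygrid[2] and
            zgrid[1] <= o['z'] < zgrid[2]):
            binned[i].append(o['value'])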
[1] http://www.carabos.com/products/pytables-pro.html

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V V     Cárabos Coop. V.   Enjoy Data
 "-"