Thanks Francesc, we're getting there :). Some more precise questions below.
> Here is how you can do that in PyTables:
>
> my_condition = '(col1 > 0.5) & (col2 < 24) & (col3 == "novel")'
> mycol4_values = [ r['col4'] for r in tbl.where(my_condition) ]

OK, but if the data I want to operate on across channels is spread over
separate Table columns, I cannot use NumPy operations along that
dimension: I can either do things in a loop (such as taking the max of
two numbers) or resort to the subset of operations supported by numexpr.

To give an example: if I wanted to take the FFT of all the [i]-th values
of col01...col64, I would do something like
numpy.fft.fft(izip(col01, ..., col64)) (potentially appending the result
to another column in the process), vs. newcol = numpy.fft.fft(largearr,
axis=whatever). Is this correct?

>> Would using this syntax in PyTables trigger a copy in memory of all
>> 70 GB (or at least 3 out of 64 'detector channels'), i.e. about 3.3 GB?
>
> Your original syntax yes. The syntax that I proposed to you, no.
>
>> (Incidentally: is it a good idea to ask PyTables to index boolean
>> arrays that are going to be used for these kinds of queries?)
>
> No. It is much better to index the columns on which you are going to
> perform your queries, and then evaluate boolean expressions on top of
> the indexes.

OK, I am just skeptical about how much indexing can do with the
random-looking data that I have (see this for an example:
http://media.wiley.com/wires/WCS/WCS1164/nfig008.gif). I also wonder how
much compression will help in this scenario.

> Yes, it is reasonable if you can express your problem in terms of
> tables (please note that the Table object of PyTables does have
> support for multidimensional columns).

But that is exactly my question! From the point of view of the
abstractions in PyTables (when querying, when compressing, when
indexing), is it better to create many columns, even if they are
completely homogeneous and tedious to manage separately, or is it better
to have one huge leaf consisting of all the columns put together in an
array?

> Another possibility is to put all your potential indexes on a table,
> index them and then use them to access data in the big array object.
> Example:
>
> indexes = tbl.getWhereList(my_condition)
> my_values = arr[indexes, 1:4]

OK, this is really useful (numpy.where on steroids?), because I should
be able to reduce my original 64 columns to a few processed ones that
will then be used for queries.

Thanks a million!
Álvaro.
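
P.S. To make the FFT question above concrete, here is a minimal sketch
of the two layouts I have in mind. All node names ("data.h5", "detector",
"detector_array", "summary") are made up for the example; I am assuming a
Table "tbl" with 64 Float64 columns col01...col64 and a 2-D array node
"arr" of shape (nrows, 64) holding the same samples.

    import numpy as np
    import tables

    h5 = tables.openFile("data.h5", mode="r")
    tbl = h5.root.detector          # hypothetical Table with col01...col64
    arr = h5.root.detector_array    # hypothetical (nrows, 64) array node

    colnames = ["col%02d" % i for i in range(1, 65)]

    # (a) column layout: the channels must first be gathered into a
    #     temporary 2-D NumPy array before FFTing along that dimension
    stacked = np.column_stack([tbl.col(name) for name in colnames])
    spectra_a = np.fft.fft(stacked, axis=1)

    # (b) array layout: NumPy operates along the channel axis directly
    #     (reading the whole node into memory here just for the example)
    spectra_b = np.fft.fft(arr[:], axis=1)

And, if I understand the getWhereList() recipe correctly, the query would
run on a small table of derived columns and the resulting row numbers
would then be used to slice the big array, roughly like this:

    # hypothetical small table holding a few derived (and indexed) columns
    summary = h5.root.summary       # made-up node name
    my_condition = '(col1 > 0.5) & (col2 < 24) & (col3 == "novel")'
    indexes = summary.getWhereList(my_condition)

    # pull only the matching rows (and channels 1-3) out of the big array,
    # as in your example above
    my_values = arr[indexes, 1:4]

    h5.close()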