Thanks Francesc, we're getting there :).

Some more precise questions below.

> Here is how you can do that in PyTables:
>
> my_condition = '(col1>0.5) & (col2<24) & (col3 == "novel")'
> mycol4_values = [ r['col4'] for r in tbl.where(my_condition) ]

OK, but some of the operations I want to perform run across columns,
and if the data live in separate Table columns I cannot apply numpy
operations along that dimension directly: I can either do things in a
row-by-row loop (such as taking the max of two values) or resort to
the subset of operations supported by numexpr. To give an example: if
I wanted to take the FFT across the [i]-th values of col01...col64, I
would have to gather the columns into an array first and call
numpy.fft.fft on that (potentially appending the result to another
column in the process), versus simply
newcol = numpy.fft.fft(largearr, axis=whatever) if everything were
stored as a single array. Is this correct? A rough sketch of the two
alternatives is below.
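Something like this is what I have in mind (just a sketch; 'tbl' and
'largearr' are placeholder names for the column-wise Table and for the
same data stored as one big (nrows, 64) array):

  import numpy as np

  # Column-wise storage: gather the 64 columns into a 2D array first,
  # then take the FFT along the channel axis.
  channels = np.column_stack([tbl.col('col%02d' % i)
                              for i in range(1, 65)])
  spectra_cols = np.fft.fft(channels, axis=1)

  # Single-array storage: the FFT applies directly along the chosen axis.
  spectra_arr = np.fft.fft(largearr[:], axis=1)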

>> Would using this syntax in PyTables trigger a copy in memory of all
>> 70Gb (or at least 3 out of 64 'detector channels'), i.e. about 3.3 Gb?
>
> Your original syntax yes.  The syntax that I proposed to you, no.
>
>> (Incidentally: is it a good idea to ask PyTables to index boolean
>> arrays that are going to be used for these kinds of queries?)
>
> No.  It is much better to index the columns on which you are going to perform 
> your queries, and then evaluate boolean expressions on top of the indexes.


Ok, I am just skeptical about how much indexing can do with the
random-looking data that I have (see this for an example:
http://media.wiley.com/wires/WCS/WCS1164/nfig008.gif).

I also wonder how much compression will help in this scenario.


> Yes, it is reasonable if you can express your problem in terms of tables 
> (please note that the Table object of PyTables does have support for 
> multidimensional columns).

But that is exactly my question! From the point of view of the
abstractions in PyTables (when querying, when compressing, when
indexing), is it better to create many columns, even if they are
completely homogeneous and tedious to manage separately, or is it
better to have one huge leaf consisting of all the columns put
together in a single array?
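To make sure we are talking about the same thing, the two layouts I
have in mind would look roughly like this (just a sketch, all names
are made up):

  import tables

  class PerChannel(tables.IsDescription):
      # Option A: one column per detector channel (col01 .. col64);
      # each can be queried and indexed on its own, but the columns
      # are tedious to manage separately.
      col01 = tables.Float32Col()
      col02 = tables.Float32Col()
      # ... and so on up to col64

  class OneBigColumn(tables.IsDescription):
      # Option B: a single multidimensional column holding all 64
      # channels of each row together.
      channels = tables.Float32Col(shape=(64,))

  f = tables.openFile('layouts.h5', mode='w')
  tbl_a = f.createTable('/', 'per_channel', PerChannel)
  tbl_b = f.createTable('/', 'one_big_column', OneBigColumn)
  f.close()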

> Another possibility is to put all your potential indexes on a table, index 
> them and then use them to access data in the big array object.  Example:
>
> indexes = tbl.getWhereList(my_condition)
> my_values = arr[indexes, 1:4]

OK, this is really useful (numpy.where on steroids?), because I
should be able to reduce my original 64 columns to a few processed
ones that will then be used for the queries.
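In other words, something along these lines (names invented; the
'raw' array node and the summary columns are hypothetical):

  import tables

  f = tables.openFile('data.h5', mode='a')
  arr = f.root.raw    # the big (nrows, 64) array with the raw channels

  # A small table of per-row summary columns derived from the
  # 64 channels.
  summary = f.createTable('/', 'summary',
                          {'peak': tables.Float32Col(),
                           'rms':  tables.Float32Col()})
  # ... fill 'summary' from the raw data, then index the query columns:
  summary.cols.peak.createIndex()
  summary.cols.rms.createIndex()

  # Query the small indexed table and use the matching row numbers
  # to pull the interesting rows out of the big array.
  rows = summary.getWhereList('(peak > 0.5) & (rms < 24)')
  raw_hits = arr[rows, :]
  f.close()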

Thanks a million!

Álvaro.
