[Pytables-users] selecting subsets of tables

Francesc Altet Tue, 24 Oct 2006 05:09:01 -0700

Hi Jan,

First of all: if you want to write to this list, please subscribe first. 
Otherwise your message will bounce and be sent to me (as the 
administrator of the list) together with a bunch of others that are mainly 
spam, which greatly increases the chances that your message gets lost 
in the flood.


Now, see my answer interleaved with the question in your original message:

--------------------------------------------------------------------------------------------------
> From: Jan Strube <[EMAIL PROTECTED]>
> To: [email protected]
> Date: Yesterday 09:16:28
>
> Hi List,
> first a thank you to Francesc, whose suggestion to my last question was
> really helpful.
>
> I am having a little understanding problem now:
> What I would like to do is simply apply a selection to a file and write out
> all rows that pass the selection to a new file.
>
> The way to do this is obviously the whereAppend function, and it's great.
> However, it only takes one selection, just as the where method does,
> but I need to really apply a more sophisticated selection.
> Now here's the thing that's not intuitive to me:
>
> Getting a selection with the slicing mechanism returns a
> NestedRecArray object that I can append to the table.
> The problem is, I don't know the index of the row in the table.
>
> So as far as I understand I have the following problem:
> a) whereAppend doesn't let me perform a selection on more than one property
> b) array slicing doesn't work for me, because I need to make the selection
>    based on properties, not indices
> c) the generator approach (e.g., [row for row in tab.where(tab.cols.x < y)])
>    doesn't produce anything that I can use in a table.append call. This is
>    the part that surprised me, because naively I would have thought the
>    elements of table[a:b] and [x for x in table.where(table.cols.x<y)] to
>    be of the same type.
>
> I have read that in the future table.where is supposed to support selection
> on multiple properties and I expect this also goes for table.whereAppend,
> but how can I perform the selection in the meantime without having to write
> out each column explicitly ?

I think this can be achieved by first selecting the indexes of the 
interesting rows, then doing a sparse read of the table, and finally writing 
the resulting recarray to the destination table.

The problematic part for you seems to be retrieving the indexes of the rows 
that fulfill a complex condition. The best way to do this is to use 
the .nrow attribute of the Row instance that is returned on each iteration of 
the Table.__iter__ (or Table.where) iterator. Here is an example:

interesting_rows = [row.nrow for row in table.where(single_cond)
                    if other_complex_cond]

Now, you can do the sparse read:

sparse_table = table.readCoordinates(interesting_rows)

And finally, you can add the above recarray to the destination:

dest_table.append(sparse_table)
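
Put together, the three steps above might look like the following minimal 
sketch. It is not taken from a real application: the Particle description, 
the column names x and y, the file name and the two conditions are all made 
up for illustration. It also assumes the snake_case method names of current 
PyTables releases (open_file, create_table, read_coordinates) instead of the 
camelCase ones used in this message (openFile, readCoordinates).

```python
import tables as tb


class Particle(tb.IsDescription):
    # Hypothetical two-column layout, for illustration only.
    x = tb.Int32Col()
    y = tb.Float64Col()


# In-memory HDF5 file (core driver, no backing store): nothing hits the disk.
h5 = tb.open_file("subset_demo.h5", "w", driver="H5FD_CORE",
                  driver_core_backing_store=0)
table = h5.create_table("/", "source", Particle)
dest_table = h5.create_table("/", "dest", Particle)

# Fill the source table with ten rows: x = 0..9, y = x / 2.
row = table.row
for i in range(10):
    row["x"] = i
    row["y"] = i * 0.5
    row.append()
table.flush()

# Step 1: collect row numbers. The where() condition is evaluated by
# PyTables; the 'if' clause is ordinary Python and can be arbitrarily complex.
interesting_rows = [r.nrow for r in table.where("x < 8")
                    if r["y"] > 1.0 and r["x"] % 2 == 0]

# Step 2: sparse read of just those rows into a record array.
sparse_table = table.read_coordinates(interesting_rows)

# Step 3: append the selected rows to the destination table.
dest_table.append(sparse_table)
dest_table.flush()

selected_x = dest_table.col("x").tolist()
h5.close()
```

Note that the 'if' part runs in Python for every row that where() yields, so 
it pays to put the most selective condition inside where().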

And you are right: once PyTables 2.0 is out, you will be able to use 
whereAppend() to do all of the above in just one shot :-)
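
For reference, here is a rough sketch of what that one-shot version can look 
like, assuming a current PyTables release, where conditions are passed as 
strings and the method is spelled append_where() rather than whereAppend(). 
The Particle description, column names, file name and condition are again 
hypothetical.

```python
import tables as tb


class Particle(tb.IsDescription):
    # Hypothetical two-column layout, for illustration only.
    x = tb.Int32Col()
    y = tb.Float64Col()


# In-memory HDF5 file (core driver, no backing store): nothing hits the disk.
h5 = tb.open_file("append_where_demo.h5", "w", driver="H5FD_CORE",
                  driver_core_backing_store=0)
src = h5.create_table("/", "source", Particle)
dst = h5.create_table("/", "dest", Particle)

# Ten rows as (x, y) tuples: x = 0..9, y = x / 2.
src.append([(i, i * 0.5) for i in range(10)])
src.flush()

# A condition on two columns, copied to the destination in a single call.
# append_where() returns the number of rows it appended.
n_copied = src.append_where(dst, "(x < 8) & (y > 1.0)")
dst.flush()

copied_x = dst.col("x").tolist()
h5.close()
```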

HTH,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

