On Thursday 03 May 2007 at 21:26, Matt Knox wrote:
> Thanks for the reply, Francesc. You may be correct in that a table could
> work for this. It would require a bit of work because ideally the class
> interface would be the same as EArray and not the more complicated table
> interface, so I'd have to hack things up a bit to make a table appear like
> an array... but it probably is possible.

I beg to disagree that the Table interface is more complicated.  I'd say that 
its API is richer, but you can do the same things that you do with an EArray 
by using the same (or equivalent) methods; see the short sketch after the 
list below.  The richer API of the Table object is a consequence of two 
facts:

- An inhomogeneous collection of data is a bit harder to manage than a 
homogeneous one (or at the very least, for users who are not used to dealing 
with collections of records).

- The Table is the cornerstone of PyTables.  Many of the optimizations 
available in PyTables are only available for it (and this has been so almost 
from the very beginning of PyTables, because I think that a container meant 
for keeping inhomogeneous data is a much more flexible and powerful tool than 
one that can only keep homogeneous data).
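
For instance, here is a minimal sketch (the file and node names are made up 
for illustration, and the record layout is just the value-plus-mask idea from 
this thread) showing that a Table can be appended to and read much like an 
EArray.  The camelCase calls are the PyTables API of this era; later releases 
spell them open_file, create_table and so on:

import tables

fileh = tables.openFile('series.h5', mode='w')

# One record per data point: the value plus the mask flag discussed in
# this thread.
class MaskedPoint(tables.IsDescription):
    value = tables.Float64Col(pos=0)
    mask  = tables.BoolCol(pos=1)

table = fileh.createTable(fileh.root, 'series', MaskedPoint)

# Appending works much like EArray.append(): pass a whole block of rows.
table.append([(float(i), False) for i in range(1000)])
table.flush()

# Reading is analogous too: slice the table, or fetch a single column.
first_ten = table[:10]         # structured array with the first 10 rows
values = table.col('value')    # just the 'value' column, as a NumPy array

fileh.close()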

> Are there any performance considerations to be aware of when using tables
> vs arrays? 

Not many, really.  In fact, Table objects implement the Row interface, which 
allows traversing the Table data very efficiently through iterators that 
support I/O buffering in a transparent way.  This is a value-added feature 
that the other containers in PyTables lack.
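
As a quick illustration (continuing with the hypothetical 'series' table 
sketched above, and assuming a PyTables version with the numexpr-based string 
conditions), buffered Row iteration and an in-kernel query look like this:

# Buffered, transparent iteration through the Row interface.
total = 0.0
for row in table.iterrows():
    if not row['mask']:
        total += row['value']

# A similar scan can be pushed down into the integrated numexpr kernel.
big_values = [row['value'] for row in table.where('value > 500.')]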

There can be other considerations.  For example, if you declare very long 
records, your throughput can be negatively affected.  However, the same 
happens if the atoms of your EArrays (for example) are too big.  Another 
example of a performance implication is that Table I/O buffers can easily end 
up containing columns that are unaligned, and that could in principle affect 
performance.  However, PyTables has machinery inside that is aware of this 
issue and applies optimization techniques (for one, in the integrated numexpr 
engine) in order to smooth their negative impact on performance.

So, by all means, get used to Table containers: you won't regret it.

> One slight problem with this approach is that columns can't be dynamically
> added/removed from tables (as far as I know), so that boolean column would
> always need to be there even if the MaskedArray had no masked values to
> account for the possibility of masked values being appended to it in the
> future. This wouldn't impact functionality, but would impact performance a
> bit.

As Ivan already suggested, you can use compression so as to reduce the impact 
on performance.  Interestingly enough, many benchmarks confirm that using 
compression actually improves read performance in many situations (while not 
reducing the writing speed very much).  See chapter 5 of the User's Manual 
for an in-depth explanation of compression issues.
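
As a sketch (reusing the hypothetical MaskedPoint description from above), 
enabling compression is just a matter of passing a Filters instance when the 
table is created:

# zlib compression (level 5) plus the shuffle filter; a mostly-False
# boolean column compresses to almost nothing on disk.
filters = tables.Filters(complevel=5, complib='zlib', shuffle=True)
ztable = fileh.createTable(fileh.root, 'series_zlib', MaskedPoint,
                           filters=filters, expectedrows=1000000)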

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
