On Thursday 03 May 2007 21:26, Matt Knox wrote:
> Thanks for the reply, Francesc. You may be correct in that a table could
> work for this. It would require a bit of work because ideally the class
> interface would be the same as EArray and not the more complicated table
> interface, so I'd have to hack things up a bit to make a table appear like
> an array... but it probably is possible.
I beg to disagree that the Table interface is more complicated. I'd say
that its API is richer, but you can do the same things that you do with an
EArray by using the same (or equivalent) methods. The richer API of the
Table object is a consequence of two facts:

- An inhomogeneous collection of data is a bit harder to manage than a
  homogeneous one (or at the very least, for users who are not used to
  dealing with collections of records).

- The Table is the cornerstone of PyTables. Many of the optimizations
  available in PyTables are only available for it (and this has been so
  almost from the very beginning of PyTables, because I think that a
  container meant for keeping inhomogeneous data is a much more flexible
  and powerful tool than one that can only keep homogeneous data).

> Are there any performance considerations to be aware of when using tables
> vs arrays?

Not many, really. In fact, Table objects implement the Row interface, which
allows traversing the Table data very efficiently through iterators that
support I/O buffering in a transparent way. This is a value-added tool that
the other containers in PyTables lack. There can be other considerations:
for example, if you declare very long records, your throughput can be
negatively affected. However, the same happens if the atoms of your EArrays
(for example) are too big. Another example of a performance implication is
that Table I/O buffers can easily end up containing columns that are
unaligned, which could in principle affect performance. However, PyTables
has machinery inside that is aware of this issue and applies optimization
techniques (as, for one, in the integrated numexpr engine) to smooth their
negative impact on performance. So, by all means, get used to Table
containers: you won't regret it.

> One slight problem with this approach is that columns can't be dynamically
> added/removed from tables (as far as I know), so that boolean column would
> always need to be there even if the MaskedArray had no masked values to
> account for the possibility of masked values being appended to it in the
> future. This wouldn't impact functionality, but would impact performance a
> bit.

As Ivan already suggested, you can use compression to reduce the impact on
performance. Interestingly enough, many benchmarks confirm that using
compression actually improves the performance of reading data in many
situations (while not reducing the writing speed very much). See chapter 5
of the User's Manual for an in-depth explanation of compression issues (a
small sketch putting these pieces together follows below).

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V V     Cárabos Coop. V.   Enjoy Data
 "-"
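A minimal sketch of the pattern discussed above: a Table with a value
column plus the boolean mask column, compression via Filters, and traversal
through the buffered Row iterator. It assumes the camel-case API spelling of
that era (openFile, createTable; PyTables 3 renamed these to open_file and
create_table), and the file name, table name, and column layout are
illustrative only, not taken from the thread.

import numpy as np
import tables

# Record layout: one data column plus the boolean mask column discussed
# above (the names here are just for illustration).
class MaskedPoint(tables.IsDescription):
    value = tables.Float64Col()   # the actual data value
    mask = tables.BoolCol()       # True where the value is masked

h5file = tables.openFile("masked_series.h5", mode="w")

# Compression, as suggested above, keeps the always-present mask column cheap.
filters = tables.Filters(complevel=5, complib="zlib")
table = h5file.createTable("/", "series", MaskedPoint,
                           "a masked series", filters=filters)

# Appending through the Row accessor feels much like EArray.append():
row = table.row
for v in np.random.rand(1000):
    row["value"] = v
    row["mask"] = False           # nothing masked in this example
    row.append()
table.flush()

# Reading back through the buffered Row iterator:
unmasked = [r["value"] for r in table if not r["mask"]]

h5file.close()

Since the mask column here is a long run of identical booleans, zlib should
compress it to almost nothing, so the overhead of always keeping the column
around stays small.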
