[Pytables-users] [ANN] PyTables 2.0 alpha1

Francesc Altet Sat, 17 Feb 2007 10:01:25 -0800

Hi List,

After several months of hard work, Ivan and I are very proud to
announce the public availability of the first alpha release of PyTables
2.0.


Although it is labeled as alpha, we think it is stable enough to
deserve a look from people who are interested in developing modules for
it, or who just want a taste of the shiny new PyTables :)

Below is the official announcement. Enjoy data!

----------------------------------------------------------------------------

===========================
 Announcing PyTables 2.0a1
===========================

This is the first *alpha* version of PyTables 2.0. Although this release
is fairly stable in operation, it is tagged as alpha because the API can
still change a bit (but hopefully not a great deal), so it is meant
basically for developers and people who want to get a taste of the
exciting new features in this major version.

You can download the latest sources from the SVN repository at:
http://pytables.org/svn/pytables/trunk/

If you are afraid of SVN (you shouldn't be), you can always fetch the
latest sources, updated daily, from:
http://www.pytables.org/download/snapshot/

You can download a source package of the version 2.0a1 with
generated PDF and HTML docs from:
http://www.pytables.org/download/preliminary/

Please keep reading for more info about the new features. Please also
read the RELEASE-NOTES.txt document in depth; it has an entire section
devoted to migrating your existing PyTables 1.x apps to the 2.0
version. You can also access it at:

http://www.pytables.org/moin/ReleaseNotes/Release_2.0a1


Changes more in depth
=====================

Improvements:

- NumPy is finally at the core! That means that PyTables no longer needs
  numarray in order to operate, although it continues to be supported
  (as well as Numeric). This also means that you should be able to run
  PyTables in scenarios combining Python 2.5 and 64-bit platforms (these
  are a source of problems with numarray/Numeric because they don't
  support this combination yet).

- Most of the operations in PyTables have seen noticeable
  speed-ups (sometimes up to 2x, as in regular Python table
  selections). This is a consequence of both using NumPy internally and
  a considerable effort in refactoring and optimizing the new code.

- Numexpr has been integrated in all in-kernel selections. So, now it is
  possible to perform complex selections like:

res = [ row['var3'] for row in table.where('(var2 < 20) | (var1 == "sas")') ]

  or:

cplx_cond = '((%s<=col5) & (col2<=%s)) | (sqrt(col1+3.1*col2+col3*col4)>3)'
res = [ row['var3'] for row in table.where(cplx_cond % (inf, sup)) ]

  and run them at full C-speed (or perhaps more, due to the cache-tuned
  computing kernel of Numexpr).
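
  As a minimal sketch of how such a condition string is put together
  before it reaches ``table.where``, the snippet below fills in the two
  bounds using plain Python string formatting. The values of ``inf`` and
  ``sup`` and the helper ``build_condition`` are hypothetical, chosen only
  for illustration; no PyTables call is made here.

```python
# Hypothetical bounds; in real code these would come from the application.
inf, sup = 10, 50

# Condition template: the two %s placeholders are filled in before querying.
cplx_cond = ('((%s<=col5) & (col2<=%s)) | '
             '(sqrt(col1+3.1*col2+col3*col4)>3)')


def build_condition(template, lower, upper):
    """Return the condition string with both bounds substituted."""
    return template % (lower, upper)


condition = build_condition(cplx_cond, inf, sup)
print(condition)
# The resulting string is what would be handed to table.where(condition).
```

  Note that it is the template variable itself that gets formatted, not
  a quoted string containing its name.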

- Now, it is possible to get fields of the Row iterator by specifying
  their position, or even ranges of positions (extended slicing is
  supported now). For example, you can now do:

  res = [ row[4] for row in table if row[1] < 20 ]   # fetch field #4
  res = [ row[:] for row in table if row['var2'] < 20 ]  # fetch all fields
  res = [ row[1::2] for row in table.iterrows(2, 3000, 3) ] # fetch odd fields

  in addition to the classical:

  res = [ row['var3'] for row in table.where('var2 < 20') ]
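
  To make the positional selections concrete, here is a small sketch
  that uses an ordinary tuple as a stand-in for one row. It assumes that
  Row slicing follows regular Python sequence slicing, as the text above
  suggests, and that ``iterrows(start, stop, step)`` visits rows the way
  ``range(start, stop, step)`` would. The field values are made up.

```python
# A stand-in for one table row with six fields (hypothetical values).
fields = ('a', 'b', 'c', 'd', 'e', 'f')

single = fields[4]       # one field by position, like row[4]
everything = fields[:]   # all the fields, like row[:]
odd = fields[1::2]       # positions 1, 3 and 5, like row[1::2]

# Row numbers visited by a (start, stop, step) iteration of (2, 3000, 3),
# assuming range-like semantics.
visited = range(2, 3000, 3)
first_rows = list(visited[:4])

print(single, everything, odd, first_rows, len(visited))
```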

- Row has received a new method called ``fetch_all_fields`` in order to
  easily retrieve all the fields of a row in situations like:

  [row.fetch_all_fields() for row in table.where('column1 < 0.3')]

  The difference between row[:] and row.fetch_all_fields() is that the
  former will return all the fields as a tuple, while the latter will
  return the fields in a NumPy void type and should be faster. Choose
  whichever better fits your needs.

- Now, all the data that is read from disk is converted, if necessary,
  to the native byteorder of the hosting machine (before, this only
  happened with Table objects). This should help to accelerate apps that
  have to do computations with data generated on platforms with a
  byteorder different from that of the user's machine.

- All the leaf constructors have received a new parameter called
  'byteorder' that lets the user specify the byteorder of the data *on
  disk*. This effectively allows creating datasets in byteorders other
  than that of the native platform.
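
  The distinction between the byteorder on disk and the native
  byteorder of the host can be illustrated without PyTables. The sketch
  below uses only the standard ``struct`` module; the value 1025 is an
  arbitrary choice, and nothing here reflects how PyTables performs the
  conversion internally.

```python
import struct
import sys

value = 1025  # 0x0401, chosen so that the two byte orders differ visibly

# The same 32-bit integer laid out in each byte order, as it might sit on disk.
little = struct.pack('<i', value)
big = struct.pack('>i', value)

# Reading with the matching byte order recovers the value on any host.
from_little = struct.unpack('<i', little)[0]
from_big = struct.unpack('>i', big)[0]

# Reading big-endian bytes as little-endian yields a different number, which
# is why a conversion step is needed when disk and host byte orders differ.
misread = struct.unpack('<i', big)[0]

print(sys.byteorder, little.hex(), big.hex(), from_little, from_big, misread)
```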

- Native HDF5 datasets with H5T_ARRAY datatypes are fully supported for
  reading now.

Bug fixes:

- As mentioned above, with NumPy at the core, certain bizarre
  interactions between numarray and NumPy scalars no longer affect
  the behaviour of table selections.  Fixes
  http://www.pytables.org/trac/ticket/29.

- Did I mention that PyTables 2.0 can be safely used on 64-bit platforms
  in combination with Python 2.5? ;)

Deprecated features:

- Not many, really. Please see the ``RELEASE-NOTES.txt`` file.

Backward-incompatible changes:

- Many. Please see the ``RELEASE-NOTES.txt`` file.


Important note for Windows users
================================

In order to keep PyTables happy, you will need to get the HDF5 library
compiled for MSVC 7.1, aka .NET 2003.  It can be found at:
ftp://ftp.ncsa.uiuc.edu/HDF/HDF5/current/bin/windows/5-165-win-net.ZIP

Please remember that, from PyTables 2.0 on, Python 2.3 (and earlier) is
no longer supported.


What it is
==========

PyTables is a package for managing hierarchical datasets, designed to
efficiently cope with extremely large amounts of data (with support for
full 64-bit file addressing).  It features an object-oriented interface
that, combined with C extensions for the performance-critical parts of
the code, makes it a very easy-to-use tool for high performance data
storage and retrieval.

PyTables runs on top of the HDF5 library and the NumPy package (numarray
and Numeric are also supported) for achieving maximum throughput and
convenient use.

Besides, PyTables I/O for table objects is buffered, implemented in C
and carefully tuned so that you can reach much better performance with
PyTables than with your own home-grown wrappings to the HDF5 library.
PyTables Pro sports indexing capabilities as well, allowing selections
in tables exceeding one billion rows in just seconds.


Platforms
=========

This version has been extensively checked on quite a few platforms, like
Linux on Intel32 (Pentium), Win on Intel32 (Pentium), Linux on Intel64
(Itanium2), FreeBSD on AMD64 (Opteron), Linux on PowerPC (and PowerPC64)
and MacOSX on PowerPC.  For other platforms, chances are that the code
can be easily compiled and run without further issues.  Please contact
us in case you are experiencing problems.


Resources
=========

Go to the PyTables web site for more details:

http://www.pytables.org

About the HDF5 library:

http://hdf.ncsa.uiuc.edu/HDF5/

About NumPy:

http://numpy.scipy.org/

To know more about the company behind the development of PyTables, see:

http://www.carabos.com/


Acknowledgments
===============

Thanks to various users who provided feature improvements, patches, bug
reports, support and suggestions.  See the ``THANKS`` file in the
distribution package for an (incomplete) list of contributors.  Many
thanks also to SourceForge who have helped to make and distribute this
package!


Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


----

  **Enjoy data!**

  -- The PyTables Team


-- 
Francesc Altet    |  Be careful about using the following code --
Carabos Coop. V.  |  I've only proven that it works, 
www.carabos.com   |  I haven't tested it. -- Donald Knuth

