[Python-announce] ANN: numexpr 2.8.8 released
Announcing NumExpr 2.8.8

Hi everyone,

NumExpr 2.8.8 is a release that mainly deals with issues arising from the upcoming NumPy 2.0. Some small fixes (such as support for simple complex expressions like `ne.evaluate('1.5j')`) and improvements are included as well.

Project documentation is available at: http://numexpr.readthedocs.io/

Changes from 2.8.7 to 2.8.8
---

* Fix re_evaluate not taking global_dict as an argument. Thanks to Teng Liu (@27rabbitlt).
* Fix parsing of simple complex numbers. Now `ne.evaluate('1.5j')` works. Thanks to Teng Liu (@27rabbitlt).
* Fixes for the upcoming NumPy 2.0:
  * Replace npy_cdouble with C++ complex. Thanks to Teng Liu (@27rabbitlt).
  * Add NE_MAXARGS for the future NumPy change to NPY_MAXARGS. It is now set to 64 to match the NumPy 2.0 value. Thanks to Teng Liu (@27rabbitlt).

What's Numexpr?
---

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for some benchmarks of numexpr using MKL: https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use computational engine for projects that don't want to adopt other solutions with heavier dependencies.

Where can I find Numexpr?
-

The project is hosted at GitHub: https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases): http://pypi.python.org/pypi/numexpr

Documentation is hosted at: http://numexpr.readthedocs.io/en/latest/

Share your experience
-

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy data!
-- Francesc Alted

Python-announce-list mailing list -- python-announce-list@python.org
To unsubscribe send an email to python-announce-list-le...@python.org
https://mail.python.org/mailman3/lists/python-announce-list.python.org/
Member address: arch...@mail-archive.com
[Python-announce] ANN: NumExpr 2.8.7
Hi everyone,

NumExpr 2.8.7 is a release that deals with issues in downstream `pandas` and other projects, where the sanitization blacklist was triggering failures in their evaluate calls. The new sanitization code should be much more robust now. If you do not wish to have sanitization on by default, you can disable it by setting the environment variable `NUMEXPR_SANITIZE=0`. If you use `pandas` in your packages, it is advisable to pin `numexpr >= 2.8.7` in your requirements.

Project documentation is available at: http://numexpr.readthedocs.io/

Changes from 2.8.6 to 2.8.7
---

* More permissive rules in the sanitizing regular expression: allow access to digits after the `.` in scientific notation. Thanks to Thomas Vincent.
* Don't reject double underscores that are not at the start or end of a variable name (pandas uses those), or scientific-notation numbers with digits after the decimal point. Thanks to Rebecca Palmer.
* Do not use `numpy.alltrue` in the test suite, as it has been deprecated (replaced by `numpy.all`). Thanks to Rebecca Chen.
* Wheels for Python 3.12. Wheels for 3.7 and 3.8 are no longer generated.

What's Numexpr?
---

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for some benchmarks of numexpr using MKL: https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use computational engine for projects that don't want to adopt other solutions with heavier dependencies.

Where can I find Numexpr?
-

The project is hosted at GitHub: https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases): http://pypi.python.org/pypi/numexpr

Documentation is hosted at: http://numexpr.readthedocs.io/en/latest/

Share your experience
-

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy data!

-- Francesc Alted
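The relaxed sanitizer rules can be illustrated like this (a minimal sketch; the variable name is made up, but the interior double underscore mimics what pandas generates):

```python
import numpy as np
import numexpr as ne

# Interior double underscores and scientific notation with digits after
# the decimal point now pass the sanitizer (rejected by older 2.8.x):
my__var = np.arange(3.0)
out = ne.evaluate("my__var * 1.5e-3")
```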
ANN: python-blosc 1.9.2 released!
= Announcing python-blosc 1.9.2 =

What is new?
-

This is a maintenance release to better support recent versions of Python (3.8 and 3.9). Also, due to the evolution of modern CPUs, the default number of threads has been raised from 4 to 8. Finally, zero-copy decompression is now supported by allowing bytes-like input. Thanks to Lehman Garrison.

For more info, have a look at the release notes: https://github.com/Blosc/python-blosc/blob/master/RELEASE_NOTES.rst

More docs and examples are available at the documentation site: http://python-blosc.blosc.org

What is it?
===

Blosc (http://www.blosc.org) is a high-performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call. Blosc works well for compressing numerical arrays that contain data with relatively low entropy, like sparse data, time series, grids with regularly spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for the Blosc compression library, with added functions (`compress_ptr()` and `pack_array()`) for efficiently compressing NumPy arrays, minimizing the number of memory copies during the process. python-blosc can be used to compress in-memory data buffers for transmission to other machines, for persistence, or simply as a compressed cache.

There is also a handy tool built on top of python-blosc called Bloscpack (https://github.com/Blosc/bloscpack). It features a command-line interface that allows you to compress large binary data files on-disk. It also comes with a Python API with built-in support for serializing and deserializing NumPy arrays, both on-disk and in-memory, at speeds that are competitive with the regular Pickle/cPickle machinery.

Sources repository
==

The sources and documentation are managed through GitHub at: http://github.com/Blosc/python-blosc

**Enjoy data!**

-- The Blosc Development Team
ANN: python-blosc 1.9.0 released
= Announcing python-blosc 1.9.0 =

What is new?
-

In this release we got rid of support for Python 2.7 and 3.5. Also, we fixed the copy of the leftovers of a chunk when its size is not a multiple of the typesize. Although this is a very unusual situation, it can certainly happen (e.g. https://github.com/Blosc/python-blosc/issues/220). Finally, sources for C-Blosc v1.18.1 have been included.

For more info, have a look at the release notes: https://github.com/Blosc/python-blosc/blob/master/RELEASE_NOTES.rst

More docs and examples are available at the documentation site: http://python-blosc.blosc.org

What is it?
===

Blosc (http://www.blosc.org) is a high-performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call. Blosc works well for compressing numerical arrays that contain data with relatively low entropy, like sparse data, time series, grids with regularly spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for the Blosc compression library, with added functions (`compress_ptr()` and `pack_array()`) for efficiently compressing NumPy arrays, minimizing the number of memory copies during the process. python-blosc can be used to compress in-memory data buffers for transmission to other machines, for persistence, or simply as a compressed cache.

There is also a handy tool built on top of python-blosc called Bloscpack (https://github.com/Blosc/bloscpack). It features a command-line interface that allows you to compress large binary data files on-disk. It also comes with a Python API with built-in support for serializing and deserializing NumPy arrays, both on-disk and in-memory, at speeds that are competitive with the regular Pickle/cPickle machinery.

Sources repository
==

The sources and documentation are managed through GitHub at: http://github.com/Blosc/python-blosc

**Enjoy data!**
bcolz, a column store for Python, 1.1.2 released
== Announcing bcolz 1.1.2 ==

What's new
==

This is a maintenance release that brings quite a lot of improvements. Here are the highlights:

- Zstd is a supported codec now. Fixes #331.
- C-Blosc updated to 1.11.2.
- Added a new `defaults_ctx` context manager so that users can select defaults easily without changing global behaviour. For example::

    with bcolz.defaults_ctx(vm="python", cparams=bcolz.cparams(clevel=0)):
        cout = bcolz.eval("(x + 1) < 0")

- Fixed a crash occurring in `ctable.todataframe()` when both `columns` and `orient='columns'` were specified. PR #311. Thanks to Peter Quackenbush.
- Use `pkg_resources.parse_version()` to test for versions of packages. Fixes #322 (PY27 bcolz with dask unicode error).
- New package recipe for conda-forge. Now you can install bcolz with: `conda install -c conda-forge bcolz`. Thanks to Alistair Miles.

For a more detailed change log, see: https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

For some comparisons between bcolz and other compressed data containers, see: https://github.com/FrancescAlted/DataContainersTutorials especially chapters 3 (in-memory containers) and 4 (on-disk containers).

What it is
==

*bcolz* provides **columnar and compressed** data containers that can live either on-disk or in-memory. The compression is carried out transparently by Blosc, an ultra-fast meta-compressor optimized for binary data. Compression is active by default.

Column storage allows for efficiently querying tables with a large number of columns. It also allows for cheap addition and removal of columns. Lastly, high-performance iterators (like ``iter()``, ``where()``) for querying the objects are provided.

bcolz can use different backends internally (currently numexpr, Python/NumPy or dask) so as to accelerate many vector and query operations (although it can use pure NumPy for doing so too).
Moreover, since the carray/ctable containers can be disk-based, it is possible to use them for seamlessly performing out-of-memory computations. While NumPy is the standard way to feed and retrieve data from bcolz internal containers, bcolz also comes with high-performance import/export facilities to/from `HDF5/PyTables tables <http://www.pytables.org>`_ and `pandas dataframes <http://pandas.pydata.org>`_.

Have a look at how bcolz and the Blosc compressor make better use of memory without significant overhead, at least for some real scenarios: http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

bcolz has minimal dependencies (NumPy is the only strict requisite), comes with an exhaustive test suite, and is meant to be used in production. Example users of bcolz are Visualfabriq (http://www.visualfabriq.com/), Quantopian (https://www.quantopian.com/) and scikit-allel:

* Visualfabriq:
  * *bquery*, a query and aggregation framework for bcolz:
  * https://github.com/visualfabriq/bquery
* Quantopian:
  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html
* scikit-allel:
  * Exploratory analysis of large-scale genetic variation data.
  * https://github.com/cggh/scikit-allel

Resources
=

Visit the main bcolz site and repository at: http://github.com/Blosc/bcolz

Manual: http://bcolz.blosc.org

Home of the Blosc compressor: http://blosc.org

User's mail list: bc...@googlegroups.com http://groups.google.com/group/bcolz

License is the new BSD: https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository: https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

**Enjoy data!**

-- Francesc Alted
ANN: python-blosc 1.5.0 released
= Announcing python-blosc 1.5.0 =

What is new?
-

A new `blosc.set_releasegil()` function that allows releasing/acquiring the GIL at will. Thanks to Robert McLeod. Also, C-Blosc has been updated to 1.11.2.

For more info, have a look at the release notes: https://github.com/Blosc/python-blosc/blob/master/RELEASE_NOTES.rst

More docs and examples are available at the documentation site: http://python-blosc.blosc.org

What is it?
===

Blosc (http://www.blosc.org) is a high-performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call. Blosc works well for compressing numerical arrays that contain data with relatively low entropy, like sparse data, time series, grids with regularly spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for the Blosc compression library, with added functions (`compress_ptr()` and `pack_array()`) for efficiently compressing NumPy arrays, minimizing the number of memory copies during the process. python-blosc can be used to compress in-memory data buffers for transmission to other machines, for persistence, or simply as a compressed cache.

There is also a handy tool built on top of python-blosc called Bloscpack (https://github.com/Blosc/bloscpack). It features a command-line interface that allows you to compress large binary data files on-disk. It also comes with a Python API with built-in support for serializing and deserializing NumPy arrays, both on-disk and in-memory, at speeds that are competitive with the regular Pickle/cPickle machinery.

Sources repository
==

The sources and documentation are managed through GitHub at: http://github.com/Blosc/python-blosc

**Enjoy data!**

-- Francesc Alted
ANN: numexpr 2.6.2 released!
= Announcing Numexpr 2.6.2 =

What's new
==

This is a maintenance release that fixes several issues, with special emphasis on keeping compatibility with newer NumPy versions. Also, initial support for POWER processors is here. Thanks to Oleksandr Pavlyk, Alexander Shadchin, Breno Leitao, Fernando Seiti Furusato and Antonio Valentino for their nice contributions.

In case you want to know in more detail what has changed in this version, see: https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

What's Numexpr
==

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for some benchmarks of numexpr using MKL: https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use computational engine for projects that don't want to adopt other solutions with heavier dependencies.

Where can I find Numexpr?
=

The project is hosted at GitHub: https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases): http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy data!

-- Francesc Alted
ANN: numexpr 2.6.1 released
= Announcing Numexpr 2.6.1 =

What's new
==

This is a maintenance release that fixes a performance regression in some situations. More specifically, the BLOCK_SIZE1 constant has been set to 1024 (down from 8192). This allows for better cache utilization when there are many operands and when VML is used. Fixes #221. Also, support for NetBSD has been added. Thanks to Thomas Klausner.

In case you want to know in more detail what has changed in this version, see: https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

What's Numexpr
==

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for some benchmarks of numexpr using MKL: https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use computational engine for projects that don't want to adopt other solutions with heavier dependencies.

Where can I find Numexpr?
=

The project is hosted at GitHub: https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases): http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy data!

-- Francesc Alted
ANN: bcolz 1.1.0 released!
== Announcing bcolz 1.1.0 ==

What's new
==

This release brings quite a lot of changes. After the format stabilization in 1.0, the focus is now on fine-tuning many operations (especially queries in ctables), as well as widening the available computational engines. Highlights:

* Much improved performance of ctable.where() and ctable.whereblocks(). Now bcolz is getting closer than ever to fundamental memory limits during queries (see the updated benchmarks in the data containers tutorial below).
* Better support for Dask; i.e. the GIL is released during Blosc operation when bcolz is called from a multithreaded app (like Dask). Also, Dask can be used as another virtual machine for evaluating expressions (so now it is possible to use it during queries too).
* New ctable.fetchwhere() method for getting the rows fulfilling some condition in one go.
* New quantize filter allowing lossy compression of floating-point data.
* It is now possible to create ctables with more than 255 columns. Thanks to Skipper Seabold.
* The defaults during carray creation are scalars now. This allows creating highly dimensional data containers more efficiently.
* The carray object now implements the __array__() special method. With this, interoperability with numpy arrays is easier and faster.

For a more detailed change log, see: https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

For some comparisons between bcolz and other compressed data containers, see: https://github.com/FrancescAlted/DataContainersTutorials especially chapters 3 (in-memory containers) and 4 (on-disk containers).

What it is
==

*bcolz* provides columnar and compressed data containers that can live either on-disk or in-memory. Column storage allows for efficiently querying tables with a large number of columns. It also allows for cheap addition and removal of columns. In addition, bcolz objects are compressed by default to reduce memory/disk I/O needs.
The compression process is carried out internally by Blosc, an extremely fast meta-compressor optimized for binary data. Lastly, high-performance iterators (like ``iter()``, ``where()``) for querying the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and query operations (although it can use pure NumPy for doing so too). numexpr optimizes the memory usage and uses several cores for doing the computations, so it is blazing fast. Moreover, since the carray/ctable containers can be disk-based, it is possible to use them for seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test suite, and fully supports both 32-bit and 64-bit platforms. It is also regularly tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor are finally fulfilling the promise of accelerating memory I/O, at least for some real scenarios: http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Example users of bcolz are Visualfabriq (http://www.visualfabriq.com/) and Quantopian (https://www.quantopian.com/):

* Visualfabriq:
  * *bquery*, a query and aggregation framework for bcolz:
  * https://github.com/visualfabriq/bquery
* Quantopian:
  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

Resources
=

Visit the main bcolz site and repository at: http://github.com/Blosc/bcolz

Manual: http://bcolz.blosc.org

Home of the Blosc compressor: http://blosc.org

User's mail list: bc...@googlegroups.com http://groups.google.com/group/bcolz

License is the new BSD: https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository: https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

**Enjoy data!**

-- Francesc Alted
ANN: numexpr 2.6.0 released
= Announcing Numexpr 2.6.0 =

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for some benchmarks of numexpr using MKL: https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use computational engine for projects that don't want to adopt other solutions with heavier dependencies.

What's new
==

This is a minor version bump because it introduces a new function. Also, some minor fine-tuning for recent CPUs has been done. More specifically:

- Introduced a new re_evaluate() function for re-evaluating the previously executed array expression without any checks. This is meant to accelerate loops that re-evaluate the same expression repeatedly without changing anything other than the operands. If unsure, use evaluate(), which is safer.
- The BLOCK_SIZE1 and BLOCK_SIZE2 constants have been re-checked in order to find values maximizing most of the benchmarks in the bench/ directory. The new values (8192 and 16, respectively) give somewhat better results (~5%) overall. The CPU used for fine-tuning was a relatively new Haswell processor (E3-1240 v3).

In case you want to know in more detail what has changed in this version, see: https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=

The project is hosted at GitHub: https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases): http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy data!

-- Francesc Alted
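The intended usage pattern of the new re_evaluate() can be sketched as follows (assuming `numexpr >= 2.6.0` and `numpy` are installed; operands are mutated in place so only the data changes, not the expression):

```python
import numpy as np
import numexpr as ne

a = np.zeros(1000)
b = np.ones(1000)

out = ne.evaluate("3*a + 4*b")    # first call: parse, compile, run
for _ in range(10):
    a += 1                        # operands change in place...
    out = ne.re_evaluate()        # ...so re-run without re-parsing
```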
ANN: bcolz 1.0.0 (final) released
= Announcing bcolz 1.0.0 final =

What's new
==

Yeah, 1.0.0 is finally here. We are not introducing any exciting new features (just some optimizations and bug fixes), but bcolz is already 6 years old and implements most of the capabilities it was designed for, so I decided to release 1.0.0, meaning that the format is declared stable and that people can be assured that future bcolz releases will be able to read bcolz 1.0 data files (and probably much earlier ones too) for a long while. The format is fully described at: https://github.com/Blosc/bcolz/blob/master/DISK_FORMAT_v1.rst

Also, a 1.0.0 release means that the bcolz 1.x series will be based on the C-Blosc 1.x series (https://github.com/Blosc/c-blosc). Once C-Blosc 2.x (https://github.com/Blosc/c-blosc2) is out, a new bcolz 2.x is expected, taking advantage of the shiny new features of C-Blosc2 (more compressors, more filters, native variable-length support and the concept of super-chunks), which should be very beneficial for the next bcolz generation.

Important: this is a final release with no important known bugs, so it is recommended for use in production. Enjoy!

For a more detailed change log, see: https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

For some comparisons between bcolz and other compressed data containers, see: https://github.com/FrancescAlted/DataContainersTutorials especially chapters 3 (in-memory containers) and 4 (on-disk containers).

Also, if it happens that you are in Madrid this weekend, you can drop by my tutorial and talk: http://pydata.org/madrid2016/schedule/ See you!

What it is
==

*bcolz* provides columnar and compressed data containers that can live either on-disk or in-memory. Column storage allows for efficiently querying tables with a large number of columns. It also allows for cheap addition and removal of columns. In addition, bcolz objects are compressed by default to reduce memory/disk I/O needs.
The compression process is carried out internally by Blosc, an extremely fast meta-compressor optimized for binary data. Lastly, high-performance iterators (like ``iter()``, ``where()``) for querying the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and query operations (although it can use pure NumPy for doing so too). numexpr optimizes the memory usage and uses several cores for doing the computations, so it is blazing fast. Moreover, since the carray/ctable containers can be disk-based, it is possible to use them for seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test suite, and fully supports both 32-bit and 64-bit platforms. It is also regularly tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor are finally fulfilling the promise of accelerating memory I/O, at least for some real scenarios: http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/), Quantopian (https://www.quantopian.com/) and Scikit-Allel (https://github.com/cggh/scikit-allel), which you can read more about at the links below.
* Visualfabriq:
  * *bquery*, a query and aggregation framework for bcolz:
  * https://github.com/visualfabriq/bquery
* Quantopian:
  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html
* Scikit-Allel:
  * Provides an alternative backend to work with compressed arrays:
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html

Resources
=

Visit the main bcolz site and repository at: http://github.com/Blosc/bcolz

Manual: http://bcolz.blosc.org

Home of the Blosc compressor: http://blosc.org

User's mail list: bc...@googlegroups.com http://groups.google.com/group/bcolz

License is the new BSD: https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository: https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

**Enjoy data!**

-- Francesc Alted
ANN: python-blosc 1.3.1
= Announcing python-blosc 1.3.1 =

What is new?
-

This is an important release in terms of stability. The included C-Blosc sources are now compiled with the -O1 flag on Linux. This means slower performance, but it fixes the nasty issue #110. In case maximum speed is needed, please compile python-blosc with an external C-Blosc library: https://github.com/Blosc/python-blosc#compiling-with-an-installed-blosc-library-recommended

Also, symbols like BLOSC_MAX_BUFFERSIZE have been restored, allowing backward compatibility with the python-blosc 1.2.x series.

For whetting your appetite, look at some benchmarks here: https://github.com/Blosc/python-blosc#benchmarking

For more info, have a look at the release notes: https://github.com/Blosc/python-blosc/blob/master/RELEASE_NOTES.rst

More docs and examples are available at the documentation site: http://python-blosc.blosc.org

What is it?
===

Blosc (http://www.blosc.org) is a high-performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call. Blosc works well for compressing numerical arrays that contain data with relatively low entropy, like sparse data, time series, grids with regularly spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for the Blosc compression library, with added functions (`compress_ptr()` and `pack_array()`) for efficiently compressing NumPy arrays, minimizing the number of memory copies during the process. python-blosc can be used to compress in-memory data buffers for transmission to other machines, for persistence, or simply as a compressed cache.

There is also a handy tool built on top of python-blosc called Bloscpack (https://github.com/Blosc/bloscpack). It features a command-line interface that allows you to compress large binary data files on-disk.
It also comes with a Python API with built-in support for serializing and deserializing NumPy arrays, both on-disk and in-memory, at speeds that are competitive with the regular Pickle/cPickle machinery.

Sources repository
==

The sources and documentation are managed through GitHub at: http://github.com/Blosc/python-blosc

**Enjoy data!**

-- Francesc Alted
ANN: numexpr 2.5.2 released
= Announcing Numexpr 2.5.2 =

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows an extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use, computational engine for projects that don't want to adopt other solutions requiring heavier dependencies.

What's new
==========

This is a maintenance release shaking out some remaining problems with VML (it is nice to see how Anaconda's VML support helps surface hidden issues). Now conj() and abs() are actually added as VML-powered functions, preventing the same problems as log10() had before (PR #212); thanks to Tom Kooij. Upgrading to this release is highly recommended.

In case you want to know in more detail what has changed in this version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=========================

The project is hosted at GitHub: https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases): http://pypi.python.org/pypi/numexpr

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy data!

-- Francesc Alted
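As a quick illustration of the accelerated array expressions mentioned in the announcement, here is a minimal sketch (the array names and sizes are arbitrary choices for the example):

```python
import numpy as np
import numexpr as ne

a = np.arange(1_000_000, dtype=np.float64)
b = np.arange(1_000_000, dtype=np.float64)

# numexpr compiles the expression string once and evaluates it in
# cache-friendly chunks across threads, avoiding the large temporary
# arrays that the equivalent NumPy expression would allocate.
result = ne.evaluate("3*a + 4*b")

# The result matches plain NumPy exactly.
assert np.array_equal(result, 3*a + 4*b)
```

The expression is passed as a string so numexpr can parse and compile it to its internal virtual machine before evaluation.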
ANN: python-blosc 1.3.0 released
= Announcing python-blosc 1.3.0 =

What is new?
============

There is support for the newest C-Blosc; as such, C-Blosc 1.8.0 is now distributed internally. This brings support for the new `BITSHUFFLE` filter, allowing for better compression ratios in many cases, at the expense of some slowdown. For details see:

http://python-blosc.blosc.org/tutorial.html#using-different-filters

You can also run some benchmarks including different codecs and filters:

https://github.com/Blosc/python-blosc/blob/master/bench/compress_ptr.py

For more info, you can have a look at the release notes in:

https://github.com/Blosc/python-blosc/blob/master/RELEASE_NOTES.rst

More docs and examples are available in the documentation site:

http://python-blosc.blosc.org

What is it?
===========

Blosc (http://www.blosc.org) is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call. Blosc is the first compressor that is meant not only to reduce the size of large datasets on-disk or in-memory, but also to accelerate object manipulations that are memory-bound (http://www.blosc.org/docs/StarvingCPUs.pdf). See http://www.blosc.org/synthetic-benchmarks.html for some benchmarks on how much speed it can achieve on some datasets.

Blosc works well for compressing numerical arrays that contain data with relatively low entropy, like sparse data, time series, grids with regularly-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for the Blosc compression library.

There is also a handy tool built on Blosc called Bloscpack (https://github.com/Blosc/bloscpack). It features a command line interface that allows you to compress large binary datafiles on-disk.
Installing
==========

python-blosc is in the PyPI repository, so installing it is easy:

  $ pip install -U blosc  # yes, you must omit the 'python-' prefix

Download sources
================

The sources are managed through github services at:

http://github.com/Blosc/python-blosc

Documentation
=============

There is a Sphinx-based documentation site at:

http://python-blosc.blosc.org/

Mailing list
============

There is an official mailing list for Blosc at: bl...@googlegroups.com http://groups.google.es/group/blosc

Licenses
========

Both Blosc and its Python wrapper are distributed using the MIT license. See:

https://github.com/Blosc/python-blosc/blob/master/LICENSES

for more details.

**Enjoy data!**

-- Francesc Alted
ANN: bcolz 1.0.0 RC2 is out!
== Announcing bcolz 1.0.0 RC2 ==

What's new
==========

Yeah, 1.0.0 is finally here. We are not introducing any exciting new features (just some optimizations and bug fixes), but bcolz is already 6 years old and it implements most of the capabilities that it was designed for, so I decided to release 1.0.0, meaning that the format is declared stable and that people can be assured that future bcolz releases will be able to read bcolz 1.0 data files (and probably much earlier ones too) for a long while. The format is fully described at:

https://github.com/Blosc/bcolz/blob/master/DISK_FORMAT_v1.rst

Also, a 1.0.0 release means that the bcolz 1.x series will be based on the C-Blosc 1.x series (https://github.com/Blosc/c-blosc). Once C-Blosc 2.x (https://github.com/Blosc/c-blosc2) is out, a new bcolz 2.x series is expected, taking advantage of the shiny new features of C-Blosc2 (more compressors, more filters, native variable length support and the concept of super-chunks), which should be very beneficial for the next bcolz generation.

Important: this is a Release Candidate, so please test it as much as you can. If no issues appear in a week or so, I will proceed to tag and release 1.0.0 final. Enjoy!

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

What it is
==========

*bcolz* provides columnar and compressed data containers that can live either on-disk or in-memory. Column storage allows for efficiently querying tables with a large number of columns. It also allows for cheap addition and removal of columns. In addition, bcolz objects are compressed by default for reducing memory/disk I/O needs. The compression process is carried out internally by Blosc, an extremely fast meta-compressor that is optimized for binary data. Lastly, high-performance iterators (like ``iter()``, ``where()``) for querying the objects are provided.
bcolz can use numexpr internally to accelerate many vector and query operations (although it can use pure NumPy for doing so too). numexpr optimizes memory usage and uses several cores for doing the computations, so it is blazing fast. Moreover, since the carray/ctable containers can be disk-based, it is possible to use them for seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test suite and fully supports both 32-bit and 64-bit platforms. Also, it is typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor are finally fulfilling the promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/), the Blaze project (http://blaze.pydata.org/), Quantopian (https://www.quantopian.com/) and Scikit-Allel (https://github.com/cggh/scikit-allel), which you can read more about by pointing your browser at the links below.
* Visualfabriq:

  * *bquery*, a query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Blaze:

  * Notebooks showing Blaze + Pandas + BColz interaction:
  * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel:

  * Provides an alternative backend to work with compressed arrays:
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html

Installing
==========

bcolz is in the PyPI repository, so installing it is easy::

  $ pip install -U bcolz

Resources
=========

Visit the main bcolz site repository at: http://github.com/Blosc/bcolz

Manual: http://bcolz.blosc.org

Home of the Blosc compressor: http://blosc.org

User's mail list: bc...@googlegroups.com http://groups.google.com/group/bcolz

License is the new BSD: https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository: https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

**Enjoy data!**

-- Francesc Alted
ANN: numexpr 2.5.1 released
= Announcing Numexpr 2.5.1 =

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows an extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use, computational engine for projects that don't want to adopt other solutions requiring heavier dependencies.

What's new
==========

Fixed a critical bug that caused wrong evaluations of log10() and conj(). These produced wrong results when numexpr was compiled with Intel's MKL (which is a popular build since Anaconda ships it by default) and operated on non-contiguous data. This is considered a *critical* bug and upgrading is highly recommended. Thanks to Arne de Laat and Tom Kooij for reporting it and providing a unit test.

In case you want to know in more detail what has changed in this version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=========================

The project is hosted at GitHub: https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases): http://pypi.python.org/pypi/numexpr

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy data!

-- Francesc Alted
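The fixed case can be sanity-checked with a strided, non-contiguous view, which is exactly the input that used to go wrong on MKL builds (a small sketch with arbitrary sample data):

```python
import numpy as np
import numexpr as ne

x = np.linspace(1.0, 100.0, 1001)
a = x[::2]                      # strided view: non-contiguous memory
assert not a.flags["C_CONTIGUOUS"]

# log10() on non-contiguous input now agrees with NumPy.
assert np.allclose(ne.evaluate("log10(a)"), np.log10(a))

# Same check for conj() on a non-contiguous complex view.
c = (x + 2j * x)[::2]
assert np.allclose(ne.evaluate("conj(c)"), np.conj(c))
```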
[ANN] bcolz 1.0.0 RC1 released
== Announcing bcolz 1.0.0 RC1 ==

What's new
==========

Yeah, 1.0.0 is finally here. We are not introducing any exciting new features (just some optimizations and bug fixes), but bcolz is already 6 years old and it implements most of the capabilities that it was designed for, so I decided to release 1.0.0, meaning that the format is declared stable and that people can be assured that future bcolz releases will be able to read bcolz 1.0 data files (and probably much earlier ones too) for a long while. The format is fully described at:

https://github.com/Blosc/bcolz/blob/master/DISK_FORMAT_v1.rst

Also, a 1.0.0 release means that the bcolz 1.x series will be based on the C-Blosc 1.x series (https://github.com/Blosc/c-blosc). Once C-Blosc 2.x (https://github.com/Blosc/c-blosc2) is out, a new bcolz 2.x series is expected, taking advantage of the shiny new features of C-Blosc2 (more compressors, more filters, native variable length support and the concept of super-chunks), which should be very beneficial for the next bcolz generation.

Important: this is a Release Candidate, so please test it as much as you can. If no issues appear in a week or so, I will proceed to tag and release 1.0.0 final. Enjoy!

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

What it is
==========

*bcolz* provides columnar and compressed data containers that can live either on-disk or in-memory. Column storage allows for efficiently querying tables with a large number of columns. It also allows for cheap addition and removal of columns. In addition, bcolz objects are compressed by default for reducing memory/disk I/O needs. The compression process is carried out internally by Blosc, an extremely fast meta-compressor that is optimized for binary data. Lastly, high-performance iterators (like ``iter()``, ``where()``) for querying the objects are provided.
bcolz can use numexpr internally to accelerate many vector and query operations (although it can use pure NumPy for doing so too). numexpr optimizes memory usage and uses several cores for doing the computations, so it is blazing fast. Moreover, since the carray/ctable containers can be disk-based, it is possible to use them for seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test suite and fully supports both 32-bit and 64-bit platforms. Also, it is typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor are finally fulfilling the promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/), the Blaze project (http://blaze.pydata.org/), Quantopian (https://www.quantopian.com/) and Scikit-Allel (https://github.com/cggh/scikit-allel), which you can read more about by pointing your browser at the links below.
* Visualfabriq:

  * *bquery*, a query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Blaze:

  * Notebooks showing Blaze + Pandas + BColz interaction:
  * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel:

  * Provides an alternative backend to work with compressed arrays:
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html

Installing
==========

bcolz is in the PyPI repository, so installing it is easy::

  $ pip install -U bcolz

Resources
=========

Visit the main bcolz site repository at: http://github.com/Blosc/bcolz

Manual: http://bcolz.blosc.org

Home of the Blosc compressor: http://blosc.org

User's mail list: bc...@googlegroups.com http://groups.google.com/group/bcolz

License is the new BSD: https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository: https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

**Enjoy data!**

-- Francesc Alted
ANN: numexpr 2.5
= Announcing Numexpr 2.5 =

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows an extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use, computational engine for projects that don't want to adopt other solutions requiring heavier dependencies.

What's new
==========

In this version, a lock has been added so that numexpr can be called from multithreaded apps. Mind that this does not prevent numexpr from using multiple cores internally. Also, new min() and max() functions have been added. Thanks to the contributors!

In case you want to know in more detail what has changed in this version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=========================

The project is hosted at GitHub: https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases): http://pypi.python.org/pypi/numexpr

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy data!

-- Francesc Alted
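A sketch exercising both additions: min()/max() inside expressions, and calling evaluate() concurrently from several Python threads (the array and thread count here are arbitrary choices for the example):

```python
import threading
import numpy as np
import numexpr as ne

a = np.arange(100_000, dtype=np.float64)

# New in 2.5: min() and max() can be used inside expressions.
assert ne.evaluate("min(a)") == a.min()
assert ne.evaluate("max(a)") == a.max()

# Also new: an internal lock makes concurrent calls from several
# Python threads safe (each call still uses multiple cores internally).
results = [None] * 4

def work(i):
    results[i] = ne.evaluate("2*a + 1")

threads = [threading.Thread(target=work, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

for r in results:
    assert np.array_equal(r, 2*a + 1)
```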
ANN: bcolz 0.12.0 released
=== Announcing bcolz 0.12.0 ===

What's new
==========

This release copes with some compatibility issues with NumPy 1.10. Also, several improvements have been made to the installation procedure, allowing for a smoother process. Last but not least, the tutorials have been migrated to the IPython notebook format (a huge thank you to Francesc Elies for this!). This will hopefully allow users to better exercise the different features of bcolz.

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

What it is
==========

*bcolz* provides columnar and compressed data containers that can live either on-disk or in-memory. Column storage allows for efficiently querying tables with a large number of columns. It also allows for cheap addition and removal of columns. In addition, bcolz objects are compressed by default for reducing memory/disk I/O needs. The compression process is carried out internally by Blosc, an extremely fast meta-compressor that is optimized for binary data. Lastly, high-performance iterators (like ``iter()``, ``where()``) for querying the objects are provided.

bcolz can use numexpr internally to accelerate many vector and query operations (although it can use pure NumPy for doing so too). numexpr optimizes memory usage and uses several cores for doing the computations, so it is blazing fast. Moreover, since the carray/ctable containers can be disk-based, it is possible to use them for seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test suite and fully supports both 32-bit and 64-bit platforms. Also, it is typically tested on both UNIX and Windows operating systems.
Together, bcolz and the Blosc compressor are finally fulfilling the promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/), the Blaze project (http://blaze.pydata.org/), Quantopian (https://www.quantopian.com/) and Scikit-Allel (https://github.com/cggh/scikit-allel), which you can read more about by pointing your browser at the links below.

* Visualfabriq:

  * *bquery*, a query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Blaze:

  * Notebooks showing Blaze + Pandas + BColz interaction:
  * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel:

  * Provides an alternative backend to work with compressed arrays:
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html

Installing
==========

bcolz is in the PyPI repository, so installing it is easy::

  $ pip install -U bcolz

Resources
=========

Visit the main bcolz site repository at: http://github.com/Blosc/bcolz

Manual: http://bcolz.blosc.org

Home of the Blosc compressor: http://blosc.org

User's mail list: bc...@googlegroups.com http://groups.google.com/group/bcolz

License is the new BSD: https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository: https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

**Enjoy data!**

-- Francesc Alted
ANN: numexpr 2.4.6 released
Hi,

This is a quick release fixing some reported problems in the 2.4.5 version that I announced a few hours ago. I hope the main issues are fixed now. Now, the official announcement:

= Announcing Numexpr 2.4.6 =

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows an extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use, computational engine for projects that don't want to adopt other solutions requiring heavier dependencies.

What's new
==========

This is a quick maintenance version that offers better handling of MSVC symbols (#168, Francesc Alted), as well as fixing some UserWarnings on Solaris (#189, Graham Jones).

In case you want to know in more detail what has changed in this version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=========================

The project is hosted at GitHub: https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases): http://pypi.python.org/pypi/numexpr

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy data!

-- Francesc Alted
ANN: numexpr 2.4.5 released
= Announcing Numexpr 2.4.5 =

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows an extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use, computational engine for projects that don't want to adopt other solutions requiring heavier dependencies.

What's new
==========

This is a maintenance release where an important bug in the multithreading code has been fixed (#185, Benedikt Reinartz, Francesc Alted). Also, many harmless warnings (overflow/underflow, divide by zero and others) in the test suite have been silenced (#183, Francesc Alted).

In case you want to know in more detail what has changed in this version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=========================

The project is hosted at GitHub: https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases): http://pypi.python.org/pypi/numexpr

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy data!

-- Francesc Alted
ANN: bcolz 0.11.3 released!
=== Announcing bcolz 0.11.3 ===

What's new
==========

Implemented a new feature (#255): bcolz.zeros() can create new ctables too, either empty or filled with zeros (#256, @FrancescElies, @FrancescAlted). Also, the previous, unannounced versions (0.11.1 and 0.11.2) added new dependencies and other fixes.

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

What it is
==========

*bcolz* provides columnar and compressed data containers that can live either on-disk or in-memory. Column storage allows for efficiently querying tables with a large number of columns. It also allows for cheap addition and removal of columns. In addition, bcolz objects are compressed by default for reducing memory/disk I/O needs. The compression process is carried out internally by Blosc, an extremely fast meta-compressor that is optimized for binary data. Lastly, high-performance iterators (like ``iter()``, ``where()``) for querying the objects are provided.

bcolz can use numexpr internally to accelerate many vector and query operations (although it can use pure NumPy for doing so too). numexpr optimizes memory usage and uses several cores for doing the computations, so it is blazing fast. Moreover, since the carray/ctable containers can be disk-based, it is possible to use them for seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test suite and fully supports both 32-bit and 64-bit platforms. Also, it is typically tested on both UNIX and Windows operating systems.
Together, bcolz and the Blosc compressor are finally fulfilling the promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/), the Blaze project (http://blaze.pydata.org/), Quantopian (https://www.quantopian.com/) and Scikit-Allel (https://github.com/cggh/scikit-allel), which you can read more about by pointing your browser at the links below.

* Visualfabriq:

  * *bquery*, a query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Blaze:

  * Notebooks showing Blaze + Pandas + BColz interaction:
  * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel:

  * Provides an alternative backend to work with compressed arrays:
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html

Installing
==========

bcolz is in the PyPI repository, so installing it is easy::

  $ pip install -U bcolz

Resources
=========

Visit the main bcolz site repository at: http://github.com/Blosc/bcolz

Manual: http://bcolz.blosc.org

Home of the Blosc compressor: http://blosc.org

User's mail list: bc...@googlegroups.com http://groups.google.com/group/bcolz

License is the new BSD: https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository: https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

**Enjoy data!**

-- Francesc Alted
ANN: python-blosc 1.2.8 released
= Announcing python-blosc 1.2.8 =

What is new?
============

This is a maintenance release. The internal C-Blosc has been upgraded to 1.7.0 (although the new bitshuffle support has not been made public, as it does not seem ready for production yet).

Also, there is support for bytes-like objects that support the buffer interface as input to ``compress`` and ``decompress``. On Python 2.x this includes unicode; on Python 3.x it doesn't. Thanks to Valentin Haenel.

Finally, a memory leak in ``decompress`` has been hunted down and fixed, and new tests have been added to catch possible leaks in the future. Thanks to Santi Villalba.

For more info, you can have a look at the release notes in:

https://github.com/Blosc/python-blosc/blob/master/RELEASE_NOTES.rst

More docs and examples are available in the documentation site:

http://python-blosc.blosc.org

What is it?
===========

Blosc (http://www.blosc.org) is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call. Blosc is the first compressor that is meant not only to reduce the size of large datasets on-disk or in-memory, but also to accelerate object manipulations that are memory-bound (http://www.blosc.org/docs/StarvingCPUs.pdf). See http://www.blosc.org/synthetic-benchmarks.html for some benchmarks on how much speed it can achieve on some datasets.

Blosc works well for compressing numerical arrays that contain data with relatively low entropy, like sparse data, time series, grids with regularly-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for the Blosc compression library.

There is also a handy tool built on Blosc called Bloscpack (https://github.com/Blosc/bloscpack). It features a command line interface that allows you to compress large binary datafiles on-disk.
It also comes with a Python API that has built-in support for serializing and deserializing NumPy arrays both on-disk and in-memory at speeds that are competitive with the regular Pickle/cPickle machinery.

Installing
==========

python-blosc is in the PyPI repository, so installing it is easy:

  $ pip install -U blosc  # yes, you must omit the 'python-' prefix

Download sources
================

The sources are managed through github services at:

http://github.com/Blosc/python-blosc

Documentation
=============

There is a Sphinx-based documentation site at:

http://python-blosc.blosc.org/

Mailing list
============

There is an official mailing list for Blosc at: bl...@googlegroups.com http://groups.google.es/group/blosc

Licenses
========

Both Blosc and its Python wrapper are distributed using the MIT license. See:

https://github.com/Blosc/python-blosc/blob/master/LICENSES

for more details.

**Enjoy data!**

-- Francesc Alted
numexpr 2.4.4 released
= Announcing Numexpr 2.4.4 =

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows an extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use, computational engine for projects that don't want to adopt other solutions requiring heavier dependencies.

What's new
==========

This is a maintenance release which contains several bug fixes, like better testing on the Python 3 platform and fixing some harmless data races. Among the enhancements, AppVeyor support is here and OMP_NUM_THREADS is honored as a fallback in case NUMEXPR_NUM_THREADS is not set.

In case you want to know in more detail what has changed in this version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=========================

The project is hosted at GitHub: https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases): http://pypi.python.org/pypi/numexpr

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy data!

-- Francesc Alted
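A short sketch of the thread-count controls mentioned above: the environment variables are consulted when numexpr is imported, and the pool size can also be changed afterwards with set_num_threads() (the value 2 here is an arbitrary example):

```python
import os

# NUMEXPR_NUM_THREADS is read at import time; OMP_NUM_THREADS is the
# fallback when it is not set. This must happen before the import.
os.environ["NUMEXPR_NUM_THREADS"] = "2"

import numexpr as ne

# The pool size can also be adjusted at runtime; set_num_threads()
# returns the previous setting, so it can be restored later.
previous = ne.set_num_threads(1)
assert isinstance(previous, int)
ne.set_num_threads(previous)
```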
bcolz 0.11.0 released!
=== Announcing bcolz 0.11.0 ===

What's new
==

Although this is mostly a maintenance release that fixes some bugs, the setup.py is now entirely based on setuptools and has been greatly modernized to use a new versioning system. Just this deserves a bump in the minor version. Thanks to Gabi Davar (@mindw) for such a nice improvement.

Also, many improvements in the Continuous Integration part (and hence not directly visible to users) have been made by Francesc Elies (@FrancescElies). Thanks for his quiet but effective work.

And last but not least, I would like to announce that Valentin Haenel (@esc) just stepped down as release manager. Thanks, Valentin, for all the hard work that you put into making bcolz a better piece of software!

What it is
==

*bcolz* provides columnar and compressed data containers that can live either on-disk or in-memory. Column storage allows for efficiently querying tables with a large number of columns. It also allows for cheap addition and removal of columns. In addition, bcolz objects are compressed by default to reduce memory/disk I/O needs. The compression process is carried out internally by Blosc, an extremely fast meta-compressor that is optimized for binary data. Lastly, high-performance iterators (like ``iter()``, ``where()``) for querying the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and query operations (although it can use pure NumPy for doing so too). numexpr optimizes the memory usage and uses several cores for doing the computations, so it is blazing fast. Moreover, since the carray/ctable containers can be disk-based, it is possible to use them for seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test suite and fully supports both 32-bit and 64-bit platforms. Also, it is typically tested on both UNIX and Windows operating systems.
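The columnar layout described above can be sketched with the standard library alone (zlib standing in for Blosc; the helper names are illustrative, not bcolz's API): each column lives in its own compressed blob, so a query over one column never touches the others.

```python
import struct
import zlib

def pack_column(values):
    """Compress one column of 64-bit integers into a single blob."""
    return zlib.compress(b"".join(struct.pack("<q", v) for v in values))

def unpack_column(blob):
    """Decompress a blob back into a list of 64-bit integers."""
    raw = zlib.decompress(blob)
    return [struct.unpack_from("<q", raw, i)[0] for i in range(0, len(raw), 8)]

# A toy two-column "table": stored column-wise, one blob per column.
table = {
    "a": pack_column([1, 2, 3, 4]),
    "b": pack_column([10, 20, 30, 40]),
}

# Filtering on column "a" decompresses only column "a"'s blob.
hits = [i for i, v in enumerate(unpack_column(table["a"])) if v > 2]
assert hits == [2, 3]
```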
Together, bcolz and the Blosc compressor are finally fulfilling the promise of accelerating memory I/O, at least for some real scenarios: http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/), the Blaze project (http://blaze.pydata.org/), Quantopian (https://www.quantopian.com/) and scikit-allel (https://github.com/cggh/scikit-allel), which you can read more about by pointing your browser at the links below.

* Visualfabriq:
  * *bquery*, a query and aggregation framework for bcolz:
  * https://github.com/visualfabriq/bquery
* Blaze:
  * Notebooks showing Blaze + Pandas + bcolz interaction:
  * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb
* Quantopian:
  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html
* scikit-allel:
  * Provides an alternative backend to work with compressed arrays:
  * https://scikit-allel.readthedocs.org/en/latest/bcolz.html

Installing
==

bcolz is in the PyPI repository, so installing it is easy::

$ pip install -U bcolz

Resources
=

Visit the main bcolz site repository at: http://github.com/Blosc/bcolz

Manual: http://bcolz.blosc.org

Home of Blosc compressor: http://blosc.org

User's mail list: bc...@googlegroups.com http://groups.google.com/group/bcolz

License is the new BSD: https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository: https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

**Enjoy data!**

-- Francesc Alted
ANN: PyTables 3.2.0 (final) released!
=== Announcing PyTables 3.2.0 ===

We are happy to announce PyTables 3.2.0.

*** IMPORTANT NOTICE: If you are a user of PyTables, it needs your help to keep going. Please read the next thread as it contains important information about the future (or the lack of it) of the project: https://groups.google.com/forum/#!topic/pytables-users/yY2aUa4H7W4 Thanks! ***

What's new
==

This is a major release of PyTables and it is the result of more than a year of accumulated patches, but most specially it fixes a couple of nasty problems with indexed queries not returning the correct results in some scenarios. There are many usability and performance improvements too.

In case you want to know in more detail what has changed in this version, please refer to: http://www.pytables.org/release_notes.html

You can install it via pip or download a source package with generated PDF and HTML docs from: http://sourceforge.net/projects/pytables/files/pytables/3.2.0

For an online version of the manual, visit: http://www.pytables.org/usersguide/index.html

What is it?
===

PyTables is a library for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data, with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and the NumPy package to achieve maximum throughput and convenient use. PyTables includes OPSI, a new indexing technology that allows data lookups in tables exceeding 10 gigarows (10**10 rows) to complete in less than a tenth of a second.

Resources
=

About PyTables: http://www.pytables.org

About the HDF5 library: http://hdfgroup.org/HDF5/

About NumPy: http://numpy.scipy.org/

Acknowledgments
===

Thanks to the many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Most specially, a lot of kudos go to the HDF5 and NumPy makers. Without them, PyTables simply would not exist.
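OPSI itself is far beyond a short example, but the basic payoff of indexing described above (replacing a linear scan with a logarithmic lookup) can be sketched with the standard library; the names here are illustrative and not the PyTables API:

```python
import bisect

# A hypothetical unsorted table of (key, payload) rows.
rows = [(5, "e"), (1, "a"), (4, "d"), (2, "b"), (3, "c")]

# Building the index: keys in sorted order, each paired with its row number.
index = sorted((key, pos) for pos, (key, _) in enumerate(rows))
keys = [k for k, _ in index]

def lookup(key):
    """Binary-search the index instead of scanning every row."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return rows[index[i][1]][1]
    return None

assert lookup(3) == "c"
assert lookup(9) is None
```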
Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

**Enjoy data!**

-- The PyTables Developers
ANN: python-blosc 1.2.7 released
= Announcing python-blosc 1.2.7 =

What is new?

Updated to use c-blosc v1.6.1. Although c-blosc 1.6.1 supports AVX2, this is not enabled in python-blosc because we still need to devise a way to detect AVX2 on the underlying platform. At any rate, c-blosc 1.6.1 fixed a bug in the blosclz codec that was important enough that a new release was deemed necessary.

For more info, you can have a look at the release notes at: https://github.com/Blosc/python-blosc/wiki/Release-notes

More docs and examples are available at the documentation site: http://python-blosc.blosc.org

What is it?
===

Blosc (http://www.blosc.org) is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() call. Blosc is the first compressor that is meant not only to reduce the size of large datasets on-disk or in-memory, but also to accelerate object manipulations that are memory-bound (http://www.blosc.org/docs/StarvingCPUs.pdf). See http://www.blosc.org/synthetic-benchmarks.html for some benchmarks on how much speed it can achieve on some datasets.

Blosc works well for compressing numerical arrays that contain data with relatively low entropy, like sparse data, time series, grids with regularly-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for the Blosc compression library.

There is also a handy tool built on Blosc called Bloscpack (https://github.com/Blosc/bloscpack). It features a command line interface that allows you to compress large binary datafiles on-disk. It also comes with a Python API that has built-in support for serializing and deserializing NumPy arrays, both on-disk and in-memory, at speeds that are competitive with the regular pickle/cPickle machinery.
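The compress/decompress round trip that python-blosc provides can be sketched with zlib from the standard library standing in for Blosc (the real library adds shuffling, multithreading and a typesize argument, and is much faster on this kind of data):

```python
import array
import zlib

# "Binary data with relatively low entropy", as described above: a regular
# grid of doubles compresses very well.
values = array.array("d", [i * 0.5 for i in range(10_000)])
raw = values.tobytes()

packed = zlib.compress(raw)       # python-blosc offers a similar compress() call
assert len(packed) < len(raw)     # low-entropy data shrinks substantially

restored = array.array("d")
restored.frombytes(zlib.decompress(packed))
assert restored == values
```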
Installing
==

python-blosc is in the PyPI repository, so installing it is easy:

$ pip install -U blosc # yes, you should omit the python- prefix

Download sources

The sources are managed through GitHub services at: http://github.com/Blosc/python-blosc

Documentation
=

There is a Sphinx-based documentation site at: http://python-blosc.blosc.org/

Mailing list

There is an official mailing list for Blosc at: bl...@googlegroups.com http://groups.google.es/group/blosc

Licenses

Both Blosc and its Python wrapper are distributed under the MIT license. See: https://github.com/Blosc/python-blosc/blob/master/LICENSES for more details.

**Enjoy data!**

-- Francesc Alted
ANN: bcolz 0.7.1 released
LICENSES/ directory.

Share your experience
-

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

**Enjoy Data!**

-- Francesc Alted
[CORRECTION] python-blosc 1.2.4 released (Was: ANN: python-blosc 1.2.7 released)
Indeed, the version just released was 1.2.4, not 1.2.7. Sorry for the typo!

Francesc

On 7/7/14, 8:20 PM, Francesc Alted wrote:

= Announcing python-blosc 1.2.4 =

What is new?

This is a maintenance release, where the included c-blosc sources have been updated to 1.4.0. This adds support for non-Intel architectures, most especially those not supporting unaligned access.

For more info, you can have a look at the release notes at: https://github.com/Blosc/python-blosc/wiki/Release-notes

More docs and examples are available at the documentation site: http://python-blosc.blosc.org

What is it?
===

Blosc (http://www.blosc.org) is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() call. Blosc is the first compressor that is meant not only to reduce the size of large datasets on-disk or in-memory, but also to accelerate object manipulations that are memory-bound (http://www.blosc.org/docs/StarvingCPUs.pdf). See http://www.blosc.org/synthetic-benchmarks.html for some benchmarks on how much speed it can achieve on some datasets.

Blosc works well for compressing numerical arrays that contain data with relatively low entropy, like sparse data, time series, grids with regularly-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for the Blosc compression library.

There is also a handy command line and Python library for Blosc called Bloscpack (https://github.com/Blosc/bloscpack) that allows you to compress large binary datafiles on-disk.
Installing
==

python-blosc is in the PyPI repository, so installing it is easy:

$ pip install -U blosc # yes, you should omit the python- prefix

Download sources

The sources are managed through GitHub services at: http://github.com/Blosc/python-blosc

Documentation
=

There is a Sphinx-based documentation site at: http://python-blosc.blosc.org/

Mailing list

There is an official mailing list for Blosc at: bl...@googlegroups.com http://groups.google.es/group/blosc

Licenses

Both Blosc and its Python wrapper are distributed under the MIT license. See: https://github.com/Blosc/python-blosc/blob/master/LICENSES for more details.

**Enjoy data!**

-- Francesc Alted
ANN: numexpr 2.4 is out
Announcing Numexpr 2.4

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python. It has multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows an extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for some benchmarks of numexpr using MKL: https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use computational engine for projects that don't want to adopt other solutions requiring heavier dependencies.

What's new
==

A new `contains()` function has been added for detecting substrings in strings. Only plain strings (bytes) are supported for now (see ticket #142). Thanks to Marcin Krol.

You can get a glimpse of how `contains()` works in this notebook: http://nbviewer.ipython.org/gist/FrancescAlted/10595974 where it can be seen that it can make substring searches more than 10x faster than regular Python. You can find the source for the notebook here: https://github.com/FrancescAlted/ngrams

Also, there is a new version of setup.py that allows better management of the NumPy dependency during pip installs. Thanks to Aleks Bunin.

Windows-related bugs have been addressed and (hopefully) squashed. Thanks to Christoph Gohlke.

In case you want to know in more detail what has changed in this version, see: https://github.com/pydata/numexpr/wiki/Release-Notes or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted at GitHub at: https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases): http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy data!

-- Francesc Alted
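A minimal sketch of the new `contains()` function announced above (assuming numexpr and NumPy are installed; the array contents are illustrative):

```python
import numpy as np
import numexpr as ne

# contains() works on plain byte strings, the only kind supported for now
# (see ticket #142); the needle is passed in as an ordinary variable.
words = np.array([b"spam", b"ham", b"spamalot"])
needle = b"spam"

mask = ne.evaluate("contains(words, needle)")
assert mask.tolist() == [True, False, True]
```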
ANN: BLZ 0.6.1 has been released
Announcing BLZ 0.6 series
=

What it is
--

BLZ is a chunked container for numerical data. Chunking allows for efficient enlarging/shrinking of the data container. In addition, it can also be compressed to reduce memory/disk needs. The compression process is carried out internally by Blosc, a high-performance compressor that is optimized for binary data.

The main objects in BLZ are `barray` and `btable`. `barray` is meant for storing multidimensional homogeneous datasets efficiently. `barray` objects provide the foundations for building `btable` objects, where each column is made of a single `barray`. Facilities are provided for iterating, filtering and querying `btables` in an efficient way. You can find more info about `barray` and `btable` in the tutorial: http://blz.pydata.org/blz-manual/tutorial.html

BLZ can use numexpr internally so as to accelerate many vector and query operations (although it can use pure NumPy for doing so too), either from memory or from disk. In the future, it is planned to use Numba as the computational kernel and to provide better Blaze (http://blaze.pydata.org) integration.

What's new
--

BLZ has been branched off from the Blaze project (http://blaze.pydata.org). BLZ was meant as a persistent format and library for I/O in Blaze. BLZ in Blaze is based on the previous carray 0.5, and this is why this new version is labeled 0.6.

BLZ supports completely transparent storage on-disk in addition to memory. That means that *everything* that can be done with the in-memory container can be done using the disk as well. The advantage of a disk-based container is that the addressable space is much larger than just your available memory. Also, as BLZ uses a chunked and compressed data layout based on the super-fast Blosc compression library, the data access speed is very good.
The format chosen for the persistence layer is based on the 'bloscpack' library and described in the "Persistent format for BLZ" chapter of the user manual ('docs/source/persistence-format.rst'). More about Bloscpack here: https://github.com/esc/bloscpack

You may want to know more about BLZ in this blog entry: http://continuum.io/blog/blz-format

In this version, support for Blosc 1.3 has been added, meaning that a new `cname` parameter has been added to the `bparams` class, so that you can select your preferred compressor from 'blosclz', 'lz4', 'lz4hc', 'snappy' and 'zlib'.

Also, many bugs have been fixed, providing a much smoother experience.

CAVEAT: The BLZ/bloscpack format is still evolving, so don't rely on forward compatibility of the format, at least until 1.0, when the internal format will be declared frozen.

Resources
-

Visit the main BLZ site repository at: http://github.com/ContinuumIO/blz

Read the online docs at: http://blz.pydata.org/blz-manual/index.html

Home of Blosc compressor: http://www.blosc.org

User's mail list: blaze-...@continuum.io

Enjoy!

Francesc Alted Continuum Analytics, Inc.
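The chunked layout that makes cheap enlarging possible can be illustrated with a small stand-in (zlib instead of Blosc, a fixed chunk size, raw bytes instead of typed arrays; this sketches the idea, not the `barray` API): appending only ever compresses new chunks and never rewrites old ones.

```python
import zlib

class ChunkedArray:
    """Append-only container that compresses data one chunk at a time,
    so enlarging it never touches the chunks already written."""
    CHUNK = 4096  # bytes per chunk (illustrative)

    def __init__(self):
        self.chunks = []   # list of compressed chunks
        self.pending = b""  # tail not yet big enough to fill a chunk

    def append(self, data):
        self.pending += data
        while len(self.pending) >= self.CHUNK:
            self.chunks.append(zlib.compress(self.pending[:self.CHUNK]))
            self.pending = self.pending[self.CHUNK:]

    def tobytes(self):
        return b"".join(zlib.decompress(c) for c in self.chunks) + self.pending

ca = ChunkedArray()
payload = bytes(range(256)) * 64      # 16 KiB of sample data
ca.append(payload)
assert ca.tobytes() == payload        # round trip is lossless
assert len(ca.chunks) == 4            # 16 KiB / 4 KiB chunks
```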
ANN: numexpr 2.3 released
== Announcing Numexpr 2.3 ==

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python. It has multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows an extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for some benchmarks of numexpr using MKL: https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use computational engine for projects that don't want to adopt other solutions requiring heavier dependencies.

What's new
==

The repository has been migrated to https://github.com/pydata/numexpr. All new tickets and PRs should be directed there.

Also, a `conj()` function for computing the conjugate of complex arrays has been added. Thanks to David Menéndez. See PR #125.

Finally, we fixed a DeprecationWarning derived from using ``oa_ndim == 0`` and ``op_axes == NULL`` with `NpyIter_AdvancedNew()` and NumPy 1.8. Thanks to Mark Wiebe for advice on how to fix this properly.

Many thanks to Christoph Gohlke and Ilan Schnell for their help during the testing of this release in all kinds of possible combinations of platforms and MKL.

In case you want to know in more detail what has changed in this version, see: https://github.com/pydata/numexpr/wiki/Release-Notes or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted at GitHub at: https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases): http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy data!
-- Francesc Alted
ANN: python-blosc 1.2.0 released
github.com/ContinuumIO/python-blosc

Documentation
=

There is a Sphinx-based documentation site at: http://blosc.pydata.org/

Mailing list

There is an official mailing list for Blosc at: bl...@googlegroups.com http://groups.google.es/group/blosc

Licenses

Both Blosc and its Python wrapper are distributed under the MIT license. See: https://github.com/ContinuumIO/python-blosc/blob/master/LICENSES for more details.

**Enjoy data!**

-- Francesc Alted Continuum Analytics, Inc.
[ANN] numexpr 2.2 released
== Announcing Numexpr 2.2 ==

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python. It has multi-threaded capabilities, as well as support for Intel's VML library (included in Intel MKL), which allows an extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use computational kernel for projects that don't want to adopt other solutions that require heavier dependencies.

What's new
==

This release is mainly meant to fix a problem with the license of the numexpr/win32/pthread.{c,h} files emulating pthreads on Windows. After permission from the original authors was granted, these files adopt the MIT license and can be redistributed without problems. See issue #109 for details (https://code.google.com/p/numexpr/issues/detail?id=110).

Another important improvement is the algorithm to decide the initial number of threads to be used. This was necessary because, by default, numexpr was using a number of threads equal to the detected number of cores, and this can be just too much for modern systems, where this number can be very high (and counterproductive for performance in many cases). Now, the 'NUMEXPR_NUM_THREADS' environment variable is honored, and in case this is not present, a maximum of *8* threads is set up initially. The new algorithm is fully described in the Users Guide, in the note of the 'General routines' section: https://code.google.com/p/numexpr/wiki/UsersGuide#General_routines. Closes #110.

In case you want to know in more detail what has changed in this version, see: http://code.google.com/p/numexpr/wiki/ReleaseNotes or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted at Google Code at: http://code.google.com/p/numexpr/

You can get the packages from PyPI as well: http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy data!

-- Francesc Alted
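The thread-count initialization policy described above can be paraphrased in a few lines (a sketch of the stated rule, not numexpr's actual source):

```python
import os

def initial_num_threads(detected_cores):
    """Honor NUMEXPR_NUM_THREADS when set; otherwise start with at most
    8 threads, however many cores were detected."""
    env = os.environ.get("NUMEXPR_NUM_THREADS")
    if env is not None:
        return int(env)
    return min(detected_cores, 8)

os.environ.pop("NUMEXPR_NUM_THREADS", None)
assert initial_num_threads(32) == 8      # many-core box is capped at 8
assert initial_num_threads(4) == 4       # small box keeps its core count

os.environ["NUMEXPR_NUM_THREADS"] = "2"
assert initial_num_threads(32) == 2      # the explicit setting wins
os.environ.pop("NUMEXPR_NUM_THREADS")
```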
ANN: python-blosc 1.1 (final) released
=== Announcing python-blosc 1.1 ===

What is it?
===

python-blosc (http://blosc.pydata.org/) is a Python wrapper for the Blosc compression library.

Blosc (http://blosc.org) is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() call. Whether this is achieved or not depends on the data compressibility, the number of cores in the system, and other factors. See a series of benchmarks conducted for many different systems: http://blosc.org/trac/wiki/SyntheticBenchmarks.

Blosc works well for compressing numerical arrays that contain data with relatively low entropy, like sparse data, time series, grids with regularly-spaced values, etc.

There is also a handy command-line tool for Blosc called Bloscpack (https://github.com/esc/bloscpack) that allows you to compress large binary datafiles on-disk. Although the format for Bloscpack has not stabilized yet, it allows you to effectively use Blosc from your favorite shell.

What is new?

- Added new `compress_ptr` and `decompress_ptr` functions that allow compressing and decompressing from/to a data pointer, avoiding an intermediate copy for maximum speed. Be careful, as these are low level calls, and the user must make sure that the pointer data area is safe.

- Since Blosc (the C library) already supports being installed as a standalone library (via cmake), it is also possible to link python-blosc against a system Blosc library.

- The Python calls to Blosc are now thread-safe (another consequence of the recent Blosc library supporting this at the C level).

- Many checks on types and ranges of values have been added. Most of the calls will now complain when passed the wrong values.

- Docstrings are much improved. Also, Sphinx-based docs are available now.

Many thanks to Valentin Hänel for his impressive work for this release.
For more info, you can see the release notes at: https://github.com/FrancescAlted/python-blosc/wiki/Release-notes

More docs and examples are available at the documentation site: http://blosc.pydata.org

Installing
==

python-blosc is in the PyPI repository, so installing it is easy:

$ pip install -U blosc # yes, you should omit the python- prefix

Download sources

The sources are managed through GitHub services at: http://github.com/FrancescAlted/python-blosc

Documentation
=

There is a Sphinx-based documentation site at: http://blosc.pydata.org/

Mailing list

There is an official mailing list for Blosc at: bl...@googlegroups.com http://groups.google.es/group/blosc

Licenses

Both Blosc and its Python wrapper are distributed under the MIT license. See: https://github.com/FrancescAlted/python-blosc/blob/master/LICENSES for more details.

Enjoy!

-- Francesc Alted
ANN: python-blosc 1.1 RC1, a wrapper for the compression library, is available
Announcing python-blosc 1.1 RC1

What is it?
===

python-blosc (http://blosc.pydata.org/) is a Python wrapper for the Blosc compression library.

Blosc (http://blosc.org) is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() call. Whether this is achieved or not depends on the data compressibility, the number of cores in the system, and other factors. See a series of benchmarks conducted for many different systems: http://blosc.org/trac/wiki/SyntheticBenchmarks.

Blosc works well for compressing numerical arrays that contain data with relatively low entropy, like sparse data, time series, grids with regularly-spaced values, etc.

There is also a handy command-line tool for Blosc called Bloscpack (https://github.com/esc/bloscpack) that allows you to compress large binary datafiles on-disk. Although the format for Bloscpack has not stabilized yet, it allows you to effectively use Blosc from your favorite shell.

What is new?

- Added new `compress_ptr` and `decompress_ptr` functions that allow compressing and decompressing from/to a data pointer. These are low level calls and the user must make sure that the pointer data area is safe.

- Since Blosc (the C library) already supports being installed as a standalone library (via cmake), it is also possible to link python-blosc against a system Blosc library.

- The Python calls to Blosc are now thread-safe (another consequence of the recent Blosc library supporting this at the C level).

- Many checks on types and ranges of values have been added. Most of the calls will now complain when passed the wrong values.

- Docstrings are much improved. Also, Sphinx-based docs are available now.

Many thanks to Valentin Hänel for his impressive work for this release.
For more info, you can see the release notes at: https://github.com/FrancescAlted/python-blosc/wiki/Release-notes

More docs and examples are available at the documentation site: http://blosc.pydata.org

Installing
==

python-blosc is in the PyPI repository, so installing it is easy:

$ pip install -U blosc # yes, you should omit the python- prefix

Download sources

The sources are managed through GitHub services at: http://github.com/FrancescAlted/python-blosc

Documentation
=

There is a Sphinx-based documentation site at: http://blosc.pydata.org/

Mailing list

There is an official mailing list for Blosc at: bl...@googlegroups.com http://groups.google.es/group/blosc

Licenses

Both Blosc and its Python wrapper are distributed under the MIT license. See: https://github.com/FrancescAlted/python-blosc/blob/master/LICENSES for more details.

-- Francesc Alted
[ANN] python-blosc 1.0.5 released
= Announcing python-blosc 1.0.5 =

What is it?
===

A Python wrapper for the Blosc compression library.

Blosc (http://blosc.pytables.org) is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() call. Blosc works well for compressing numerical arrays that contain data with relatively low entropy, like sparse data, time series, grids with regularly-spaced values, etc.

What is new?

- Upgraded to latest Blosc 1.1.4.

- Better handling of condition errors, and improved memory releasing in case of errors (thanks to Valentin Haenel and Han Genuit).

- Better handling of types (should compile without warnings now, at least with GCC).

For more info, you can see the release notes at: https://github.com/FrancescAlted/python-blosc/wiki/Release-notes

More docs and examples are available in the Quick User's Guide wiki page: https://github.com/FrancescAlted/python-blosc/wiki/Quick-User's-Guide

Download sources

Go to: http://github.com/FrancescAlted/python-blosc and download the most recent release from there. Blosc is distributed under the MIT license; see LICENSES/BLOSC.txt for details.

Mailing list

There is an official mailing list for Blosc at: bl...@googlegroups.com http://groups.google.es/group/blosc

-- Francesc Alted
[ANN] carray 0.5 released
Announcing carray 0.5
=

What's new
--

carray 0.5 supports completely transparent storage on-disk in addition to memory. That means that everything that can be done with an in-memory container can be done using the disk instead. The advantage of a disk-based container is that your addressable space is much larger than just your available memory. Also, as carray is based on a chunked and compressed data layout built on the super-fast Blosc compression library, plus the different cache levels existing in both modern operating systems and the internal carray machinery, the data access speed is very good.

The format chosen for the persistence layer is based on the 'bloscpack' library (thanks to Valentin Haenel for his inspiration) and described in 'persistence.rst', although not everything has been implemented yet. You may want to contribute by proposing enhancements to it. See: https://github.com/FrancescAlted/carray/wiki/PersistenceProposal

CAVEAT: The bloscpack format is still evolving, so don't rely on forward compatibility of the format, at least until 1.0, when the internal format will be declared frozen.

For more detailed info, see the release notes at: https://github.com/FrancescAlted/carray/wiki/Release-0.5

What it is
--

carray is a chunked container for numerical data. Chunking allows for efficient enlarging/shrinking of the data container. In addition, it can also be compressed to reduce memory/disk needs. The compression process is carried out internally by Blosc, a high-performance compressor that is optimized for binary data.

carray can use numexpr internally so as to accelerate many vector and query operations (although it can use pure NumPy for doing so too). numexpr optimizes the memory usage and uses several cores for doing the computations, so it is blazing fast. Moreover, with the introduction of a carray/ctable disk-based container (in version 0.5), it can be used for seamlessly performing out-of-core computations.
carray comes with an exhaustive test suite and fully supports both 32-bit and 64-bit platforms. Also, it is typically tested on both UNIX and Windows operating systems.

Resources
---------

Visit the main carray site repository at:
http://github.com/FrancescAlted/carray

You can download a source package from:
http://carray.pytables.org/download

Manual:
http://carray.pytables.org/docs/manual

Home of the Blosc compressor:
http://blosc.pytables.org

User's mail list:
car...@googlegroups.com
http://groups.google.com/group/carray

Enjoy!

-- Francesc Alted
ANN: Numexpr 2.0 released
Announcing Numexpr 2.0
======================

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python. It has multi-threaded capabilities, as well as support for Intel's VML library, which allows for squeezing the last drop of performance out of your multi-core processors.

What's new
==========

This version comes with support for the new iterator in NumPy (introduced in NumPy 1.6), allowing for improved performance in practically all scenarios (the exception being very small arrays), and especially for operations involving broadcasting, Fortran-ordered arrays or non-native byte orders.

The carefully crafted mix of the new NumPy iterator and direct access to data buffers turned out to be so powerful and flexible that the internal virtual machine has been completely revamped around this combination.

The drawback is that you will need NumPy >= 1.6 to run numexpr 2.0. However, NumPy 1.6 was released more than 6 months ago now, so we think this is a good time to take advantage of it. Many thanks to Mark Wiebe for such an important contribution!

For some benchmarks on the new virtual machine, see:
http://code.google.com/p/numexpr/wiki/NewVM

Also, Gaëtan de Menten contributed important bug fixes and code cleanups as well as speed enhancements. Francesc Alted contributed some fixes, and added compatibility code with existing applications (PyTables) too.

In case you want to know in more detail what has changed in this version, see:
http://code.google.com/p/numexpr/wiki/ReleaseNotes
or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=========================

The project is hosted at Google Code at:
http://code.google.com/p/numexpr/
You can get the packages from PyPI as well:
http://pypi.python.org/pypi/numexpr

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy!
-- Francesc Alted
ANN: carray 0.4 is out
[Let's hope that the message is complete this time :)]

Announcing carray 0.4
=====================

What's new
----------

The most prominent feature in 0.4 is support for multidimensional carrays. That means that, for example, you can do::

    >>> a = ca.arange(6).reshape((2,3))

Now, you can access any element in any dimension::

    >>> a[:]
    array([[0, 1, 2],
           [3, 4, 5]])
    >>> a[1]
    array([3, 4, 5])
    >>> a[1,::2]
    array([3, 5])
    >>> a[1,1]
    4

Also, all the iterators in carray have received a couple of new parameters that allow limiting or skipping selected elements in queries.

Finally, many performance improvements have been implemented (mainly related to efficient zero-detection code). This allows for improved query times when using iterators. See:
https://github.com/FrancescAlted/carray/wiki/query-compress
for an example of how fast the new iterators can be.

For more detailed info, see the release notes in:
https://github.com/FrancescAlted/carray/wiki/Release-0.4

What it is
----------

carray is a chunked container for numerical data. Chunking allows for efficient enlarging/shrinking of the data container. In addition, the data can be compressed to reduce memory needs. The compression process is carried out internally by Blosc, a high-performance compressor that is optimized for binary data.

carray comes with an exhaustive test suite and fully supports both 32-bit and 64-bit platforms. Also, it is typically tested on both UNIX and Windows operating systems.

Resources
---------

Visit the main carray site repository at:
http://github.com/FrancescAlted/carray

You can download a source package from:
http://carray.pytables.org/download

Manual:
http://carray.pytables.org/docs/manual

Home of the Blosc compressor:
http://blosc.pytables.org

User's mail list:
car...@googlegroups.com
http://groups.google.com/group/carray

Enjoy!

-- Francesc Alted
ANN: numexpr 1.4.2 released
== Announcing Numexpr 1.4.2 ==

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python.

What's new
==========

This is a maintenance release. The most annoying issues have been fixed (including the reduction malfunction introduced in the 1.4 series). Also, several performance enhancements (especially for VML and small-array operations) are included too.

In case you want to know in more detail what has changed in this version, see:
http://code.google.com/p/numexpr/wiki/ReleaseNotes
or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=========================

The project is hosted at Google Code at:
http://code.google.com/p/numexpr/
You can get the packages from PyPI as well:
http://pypi.python.org/pypi/numexpr

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy!

-- Francesc Alted
ANN: carray released
= Announcing carray 0.3 =

What's new
==========

A lot of stuff. The most outstanding feature in this version is the introduction of a `ctable` object. A `ctable` is similar to a structured array in NumPy, but instead of storing the data row-wise, it uses a column-wise arrangement. This allows for much better performance for very wide tables, which is one of the scenarios where a `ctable` makes most sense. Of course, as `ctable` is based on `carray` objects, it inherits all their niceties (like on-the-fly compression and fast iterators).

Also, the `carray` object itself has received many improvements, like new constructors (arange(), fromiter(), zeros(), ones(), fill()), iterators (where(), wheretrue()) and resize methods (resize(), trim()). Most of these also work with the new `ctable`.

Besides, Numexpr is now supported (though it is optional) in order to carry out stunningly fast queries on `ctable` objects. For example, doing a query on a table with one million rows and one thousand columns can be up to 2x faster than using a plain structured array, and up to 20x faster than using SQLite (using the ":memory:" backend and indexing). See 'bench/ctable-query.py' for details.

Finally, binaries for Windows (both 32-bit and 64-bit) are provided.

For more detailed info, see the release notes in:
https://github.com/FrancescAlted/carray/wiki/Release-0.3

What it is
==========

carray is a container for numerical data that can be compressed in-memory. The compression process is carried out internally by Blosc, a high-performance compressor that is optimized for binary data. Having data compressed in-memory can reduce the stress on the memory subsystem. The net result is that carray operations may be faster than using a traditional ndarray object from NumPy. carray also fully supports 64-bit addressing (both on UNIX and Windows).
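The row-wise versus column-wise trade-off behind `ctable` can be shown with a tiny, stdlib-only illustration (this is not ctable's implementation, just the idea): fetching one column from a column-wise layout is a single contiguous access, while a row-wise layout forces you to walk every row:

```python
# Illustration of column-wise vs row-wise storage (the idea behind
# ctable, not its actual implementation).
row_wise = [(i, i * 2.0, str(i)) for i in range(5)]   # one tuple per row
col_wise = {                                          # one sequence per column
    'a': [i for i in range(5)],
    'b': [i * 2.0 for i in range(5)],
    'c': [str(i) for i in range(5)],
}

# Row-wise: reading column 'b' touches every row tuple.
b_from_rows = [row[1] for row in row_wise]
# Column-wise: reading column 'b' is one contiguous access,
# and the other columns are never touched at all.
b_from_cols = col_wise['b']

assert b_from_rows == b_from_cols
```

For a table with a thousand columns, the row-wise query drags all thousand fields through memory per row, while the column-wise one reads only the queried columns — which is why the wide-table speedups quoted above are possible.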
Below, a carray with 1 trillion rows has been created (7.3 TB total), filled with zeros, modified at some positions, and finally summed up::

    >>> %time b = ca.zeros(1e12)
    CPU times: user 54.76 s, sys: 0.03 s, total: 54.79 s
    Wall time: 55.23 s
    >>> %time b[[1, 1e9, 1e10, 1e11, 1e12-1]] = (1,2,3,4,5)
    CPU times: user 2.08 s, sys: 0.00 s, total: 2.08 s
    Wall time: 2.09 s
    >>> b
    carray((1000000000000,), float64)
      nbytes: 7450.58 GB; cbytes: 2.27 GB; ratio: 3275.35
      cparams := cparams(clevel=5, shuffle=True)
    [0.0, 1.0, 0.0, ..., 0.0, 0.0, 5.0]
    >>> %time b.sum()
    CPU times: user 10.08 s, sys: 0.00 s, total: 10.08 s
    Wall time: 10.15 s
    15.0

['%time' is a magic function provided by the IPython shell]

Please note that the example above is provided for demonstration purposes only. Do not try to run this at home unless you have more than 3 GB of RAM available, or you will get into trouble.

Resources
=========

Visit the main carray site repository at:
http://github.com/FrancescAlted/carray

You can download a source package from:
http://carray.pytables.org/download

Manual:
http://carray.pytables.org/manual

Home of the Blosc compressor:
http://blosc.pytables.org

User's mail list:
car...@googlegroups.com
http://groups.google.com/group/carray

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy!

-- Francesc Alted
python-blosc 1.0.3 released
Announcing python-blosc 1.0.3
=============================

A Python wrapper for the Blosc compression library.

What is it?
===========

Blosc (http://blosc.pytables.org) is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call. Blosc works well for compressing numerical arrays that contain data with relatively low entropy, like sparse data, time series, grids with regularly-spaced values, etc. python-blosc is a Python package that wraps it.

What is new?
============

Blosc has been updated to 1.1.3, allowing much improved compression ratios under some circumstances. Also, the number of cores on the Windows platform is detected correctly now (thanks to Han Genuit). Last, but not least, Windows binaries for Python 2.6 and 2.7 are provided (both in 32-bit and 64-bit flavors).

For more info, you can see the release notes in:
https://github.com/FrancescAlted/python-blosc/wiki/Release-notes

Basic Usage
===========

    # Create a binary string made of int (32-bit) elements
    >>> import array
    >>> a = array.array('i', range(10*1000*1000))
    >>> bytes_array = a.tostring()

    # Compress it
    >>> import blosc
    >>> bpacked = blosc.compress(bytes_array, typesize=a.itemsize)
    >>> len(bytes_array) / len(bpacked)
    110  # 110x compression ratio.  Not bad!

    # Compression speed?
    >>> from timeit import timeit
    >>> timeit("blosc.compress(bytes_array, a.itemsize)",
    ...        "import blosc, array; "
    ...        "a = array.array('i', range(10*1000*1000)); "
    ...        "bytes_array = a.tostring()",
    ...        number=10)
    0.040534019470214844
    >>> len(bytes_array)*10 / 0.0405 / (1024*1024*1024)
    9.1982476505232444  # wow, compressing at ~ 9 GB/s.  That's fast!
    # This is actually much faster than a `memcpy` system call
    >>> timeit("ctypes.memmove(b.buffer_info()[0], a.buffer_info()[0], "
    ...        "len(a)*a.itemsize)",
    ...        "import array, ctypes; "
    ...        "a = array.array('i', range(10*1000*1000)); "
    ...        "b = a[::-1]", number=10)
    0.10316681861877441
    >>> len(bytes_array)*10 / 0.1031 / (1024*1024*1024)
    3.6132786600018565  # ~ 3.6 GB/s is memcpy speed

    # Decompress it
    >>> bytes_array2 = blosc.decompress(bpacked)

    # Check whether our data have had a good trip
    >>> bytes_array == bytes_array2
    True  # yup, it seems so

    # Decompression speed?
    >>> timeit("s2 = blosc.decompress(bpacked)",
    ...        "import blosc, array; "
    ...        "a = array.array('i', range(10*1000*1000)); "
    ...        "bytes_array = a.tostring(); "
    ...        "bpacked = blosc.compress(bytes_array, a.itemsize)",
    ...        number=10)
    0.083872079849243164
    >>> len(bytes_array)*10 / 0.0838 / (1024*1024*1024)
    4.4454538167803275  # decompressing at ~ 4.4 GB/s is pretty good too!

[Using a machine with 8 physical cores with hyper-threading]

The above examples use the maximum compression level 9 (the default), and although lower compression levels produce smaller compression ratios, they are also faster (reaching speeds exceeding 11 GB/s).

More examples showing other features (and using NumPy arrays) are available on the python-blosc wiki page:
http://github.com/FrancescAlted/python-blosc/wiki

Documentation
=============

Please refer to docstrings. Start with the main package:

    >>> import blosc
    >>> help(blosc)

and ask for more docstrings in the referenced functions.

Download sources
================

Go to:
http://github.com/FrancescAlted/python-blosc
and download the most recent release from there.

Blosc is distributed under the MIT license; see LICENSES/BLOSC.txt for details.

Mailing list
============

There is an official mailing list for Blosc at:
bl...@googlegroups.com
http://groups.google.es/group/blosc

**Enjoy data!**

-- Francesc Alted
ANN: PyTables 2.2.1 released
=== Announcing PyTables 2.2.1 ===

This is a maintenance release. The upgrade is recommended for all who are running PyTables in production environments.

What's new
==========

Many fixes have been included, as well as a fair bunch of performance improvements. Also, the Blosc compression library has been updated to 1.1.2, in order to prevent locks in some scenarios. Finally, the new evaluation version of PyTables Pro is based on the previous Pro 2.2.

In case you want to know in more detail what has changed in this version, have a look at:
http://www.pytables.org/moin/ReleaseNotes/Release_2.2.1

You can download a source package with generated PDF and HTML docs, as well as binaries for Windows, from:
http://www.pytables.org/download/stable

For an on-line version of the manual, visit:
http://www.pytables.org/docs/manual-2.2.1

What is it?
===========

PyTables is a library for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data, with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and the NumPy package to achieve maximum throughput and convenient use.

Resources
=========

About PyTables: http://www.pytables.org
About the HDF5 library: http://hdfgroup.org/HDF5/
About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Most especially, a lot of kudos go to the HDF5 and NumPy (and numarray!) makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

**Enjoy data!**

-- The PyTables Team
-- Francesc Alted
ANN: python-blosc 1.0.2
Announcing python-blosc 1.0.2
=============================

A Python wrapper for the Blosc compression library.

What is it?
===========

Blosc (http://blosc.pytables.org) is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call. Blosc works well for compressing numerical arrays that contain data with relatively low entropy, like sparse data, time series, grids with regularly-spaced values, etc. python-blosc is a Python package that wraps it.

What is new?
============

Updated to Blosc 1.1.2. Fixes some bugs when dealing with very small buffers (typically smaller than the specified typesizes). Closes #1.

Basic Usage
===========

[Using an IPython shell and a 2-core machine below]

    # Create a binary string made of int (32-bit) elements
    >>> import array
    >>> a = array.array('i', range(10*1000*1000))
    >>> bytes_array = a.tostring()

    # Compress it
    >>> import blosc
    >>> bpacked = blosc.compress(bytes_array, typesize=a.itemsize)
    >>> len(bytes_array) / len(bpacked)
    110  # 110x compression ratio.  Not bad!

    # Compression speed?
    >>> timeit blosc.compress(bytes_array, typesize=a.itemsize)
    100 loops, best of 3: 12.8 ms per loop
    >>> len(bytes_array) / 0.0128 / (1024*1024*1024)
    2.9103830456733704  # wow, compressing at ~ 3 GB/s, that's fast!

    # Decompress it
    >>> bytes_array2 = blosc.decompress(bpacked)

    # Check whether our data have had a good trip
    >>> bytes_array == bytes_array2
    True  # yup, it seems so

    # Decompression speed?
    >>> timeit blosc.decompress(bpacked)
    10 loops, best of 3: 21.3 ms per loop
    >>> len(bytes_array) / 0.0213 / (1024*1024*1024)
    1.7489625814375185  # decompressing at ~ 1.7 GB/s is pretty good too!

More examples showing other features (and using NumPy arrays) are available on the python-blosc wiki page:
http://github.com/FrancescAlted/python-blosc/wiki

Documentation
=============

Please refer to docstrings.
Start with the main package:

    >>> import blosc
    >>> help(blosc)

and ask for more docstrings in the referenced functions.

Download sources
================

Go to:
http://github.com/FrancescAlted/python-blosc
and download the most recent release from there.

Blosc is distributed under the MIT license; see LICENSES/BLOSC.txt for details.

Mailing list
============

There is an official mailing list for Blosc at:
bl...@googlegroups.com
http://groups.google.es/group/blosc

**Enjoy data!**

-- Francesc Alted
ANN: Numexpr 1.4.1 released
== Announcing Numexpr 1.4.1 ==

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python.

What's new
==========

This is a maintenance release. In it, several improvements have been made in order to prevent deadlocks in the new threaded code (fixes #33). Also, the GIL is now released during computations, which should be useful for embedding numexpr in threaded Python apps.

In case you want to know in more detail what has changed in this version, see:
http://code.google.com/p/numexpr/wiki/ReleaseNotes
or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=========================

The project is hosted at Google Code at:
http://code.google.com/p/numexpr/
And you can get the packages from PyPI as well:
http://pypi.python.org/pypi

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy!

-- Francesc Alted
[ANN] python-blosc 1.0.1, a wrapper for the Blosc compression library
Announcing python-blosc 1.0.1
=============================

A Python wrapper for the Blosc compression library.

What is it?
===========

Blosc (http://blosc.pytables.org) is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call. Blosc works well for compressing numerical arrays that contain data with relatively low entropy, like sparse data, time series, grids with regularly-spaced values, etc. This is a Python package that wraps it.

What is new?
============

Everything. This is the first public version of the Python wrapper for Blosc (1.1.1). It supports Python 2.6, 2.7 and 3.1.

The API is very simple and loosely follows that of the zlib module. There are two basic functions, `compress()` and `decompress()`, as well as two additional calls specific to compressing NumPy arrays, namely `pack_array()` and `unpack_array()`. There are also utilities for dynamically changing the number of threads used, or for releasing resources when you are not going to need Blosc for a while.

Basic Usage
===========

    >>> import numpy as np
    >>> a = np.linspace(0, 100, 1e7)
    >>> bytes_array = a.tostring()
    >>> import blosc
    >>> bpacked = blosc.compress(bytes_array, typesize=8)
    >>> bytes_array2 = blosc.decompress(bpacked)
    >>> print(bytes_array == bytes_array2)
    True

More examples are available on the python-blosc wiki page:
http://github.com/FrancescAlted/python-blosc/wiki

Documentation
=============

Please refer to docstrings. Start with the main package:

    >>> import blosc
    >>> help(blosc)

and ask for more docstrings in the referenced functions.

Download sources
================

Go to:
http://github.com/FrancescAlted/python-blosc
and download the most recent release from there.

Blosc is distributed under the MIT license; see LICENSES/BLOSC.txt for details.
Mailing list
============

There is an official mailing list for Blosc at:
bl...@googlegroups.com
http://groups.google.es/group/blosc

**Enjoy data!**

-- Francesc Alted
ANN: Numexpr 1.4 released
Announcing Numexpr 1.4
======================

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python.

What's new
==========

The main improvement in this version is support for multi-threading in pure C. Threading in C provides the best performance on today's multi-core machines. In addition, this avoids the GIL that hampers performance in many Python apps.

Just to whet your appetite, look at this page, where the implementation is briefly described and some benchmarks are shown:
http://code.google.com/p/numexpr/wiki/MultiThreadVM

In case you want to know in more detail what has changed in this version, see:
http://code.google.com/p/numexpr/wiki/ReleaseNotes
or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=========================

The project is hosted at Google Code at:
http://code.google.com/p/numexpr/
And you can get the packages from PyPI as well:
http://pypi.python.org/pypi

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy!

-- Francesc Alted
PyTables 2.2 released: entering the multi-core age
= Announcing PyTables 2.2 (final) =

I'm happy to announce PyTables 2.2 (final). After 18 months of continuous development and testing, this is, by far, the most powerful and well-tested release ever. I hope you like it too.

What's new
==========

The main new features in the 2.2 series are:

* A new compressor called Blosc, designed to read/write data to/from memory at speeds that can be faster than a system `memcpy()` call. With it, many internal PyTables operations that are currently bound by CPU or I/O bandwidth are sped up. Some benchmarks:
  http://blosc.pytables.org/trac/wiki/SyntheticBenchmarks
  And a demonstration of how Blosc can improve PyTables performance:
  http://www.pytables.org/docs/manual/ch05.html#chunksizeFineTune

* Support for HDF5 hard links, soft links and external links (a kind of mounting of external filesystems). A new tutorial about their usage has been added to the 'Tutorials' chapter of the User's Manual. See:
  http://www.pytables.org/docs/manual/ch03.html#LinksTutorial

* A new `tables.Expr` module (based on Numexpr) that allows performing persistent, on-disk computations with many algebraic operations. For a brief look at its performance, see:
  http://pytables.org/moin/ComputingKernel

* Support for 'fancy' indexing (i.e., à la NumPy) in all the data containers in PyTables. Backported from the implementation in the h5py project. Thanks to Andrew Collette for his fine work on this!

* Binaries for both Windows 32-bit and 64-bit are provided now.

As always, a large number of bugs have been addressed and squashed too.

In case you want to know in more detail what has changed in this version, have a look at:
http://www.pytables.org/moin/ReleaseNotes/Release_2.2

You can download a source package with generated PDF and HTML docs, as well as binaries for Windows, from:
http://www.pytables.org/download/stable

For an on-line version of the manual, visit:
http://www.pytables.org/docs/manual-2.2

What is it?
===========

PyTables is a library for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data, with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and the NumPy package to achieve maximum throughput and convenient use.

Resources
=========

About PyTables: http://www.pytables.org
About the HDF5 library: http://hdfgroup.org/HDF5/
About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Most especially, a lot of kudos go to the HDF5 and NumPy (and numarray!) makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

**Enjoy data!**

-- The PyTables Team
-- Francesc Alted
PyTables 2.2rc2 ready to test
=== Announcing PyTables 2.2rc2 ===

PyTables is a library for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data, with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and the NumPy package to achieve maximum throughput and convenient use.

This is the second (and probably last) release candidate for PyTables 2.2, so please test it as much as you can before I declare the beast stable.

The main new features in the 2.2 series are:

* A new compressor called Blosc, designed to read/write data to/from memory at speeds that can be faster than a system `memcpy()` call. With it, many internal PyTables operations that are currently bound by CPU or I/O bandwidth are sped up. Some benchmarks:
  http://blosc.pytables.org/trac/wiki/SyntheticBenchmarks

* A new `tables.Expr` module (based on Numexpr) that allows performing persistent, on-disk computations with many algebraic operations. For a brief look at its performance, see:
  http://pytables.org/moin/ComputingKernel

* Support for HDF5 hard links, soft links and automatic external links (a kind of mounting of external filesystems). A new tutorial about their usage has been added to the 'Tutorials' chapter of the User's Manual.

* Support for 'fancy' indexing (i.e., à la NumPy) in all the data containers in PyTables. Backported from the implementation in the h5py project. Thanks to Andrew Collette for his fine work on this!

As always, a large number of bugs have been addressed and squashed too.
In case you want to know in more detail what has changed in this version, have a look at:
http://www.pytables.org/moin/ReleaseNotes/Release_2.2rc2

You can download a source package with generated PDF and HTML docs, as well as binaries for Windows, from:
http://www.pytables.org/download/preliminary

For an on-line version of the manual, visit:
http://www.pytables.org/docs/manual-2.2rc2

Resources
=========

About PyTables: http://www.pytables.org
About the HDF5 library: http://hdfgroup.org/HDF5/
About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Most especially, a lot of kudos go to the HDF5 and NumPy (and numarray!) makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

**Enjoy data!**

-- The PyTables Team
-- Francesc Alted
ANN: PyTables 2.2b3 released
=== Announcing PyTables 2.2b3 ===

PyTables is a library for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data, with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and the NumPy package to achieve maximum throughput and convenient use.

This is the third, and most probably last, beta version of the 2.2 release. The main new feature in this beta is the addition of Blosc (http://blosc.pytables.org), a high-speed compressor that is meant to work at speeds similar to, or higher than, the memory-cache bandwidth of modern processors. This will allow for very high performance in internal, in-memory PyTables computations while still using compression.

Remember that Blosc is still in *beta* and is not meant for production purposes yet. You have been warned!

In case you want to know in more detail what has changed in this version, have a look at:
http://www.pytables.org/moin/ReleaseNotes/Release_2.2b3

You can download a source package with generated PDF and HTML docs, as well as binaries for Windows, from:
http://www.pytables.org/download/preliminary

For an on-line version of the manual, visit:
http://www.pytables.org/docs/manual-2.2b3

Resources
=========

About PyTables: http://www.pytables.org
About the HDF5 library: http://hdfgroup.org/HDF5/
About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Most especially, a lot of kudos go to the HDF5 and NumPy (and numarray!) makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

**Enjoy data!**

-- Francesc Alted
Python-es mailing list changes home
=== Python-es mailing list changes home ===

Due to technical problems with the site that usually ran the Python-es mailing list (the Python list for the Spanish-speaking community), we are setting up a new one under the python.org umbrella. Hence, the new list will become (the old one was ). Please feel free to subscribe to the new list at:
http://mail.python.org/mailman/listinfo/python-es

Thanks!

=== La lista de distribución Python-es cambia de lugar ===

Debido a problemas técnicos con el sitio que normalmente albergaba la lista de Python-es (Lista de Python para la comunidad hispano-hablante), estamos configurando una nueva en el sitio python.org. Así que la nueva lista será (en sustitución de la antigua ). Por favor, si lo deseas, date de alta en la nueva lista en:
http://mail.python.org/mailman/listinfo/python-es

¡Gracias!

Chema Cortes, Oswaldo Hernández y Francesc Alted
ANN: PyTables 2.2b2 released
=== Announcing PyTables 2.2b2 ===

PyTables is a library for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data, with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and the NumPy package to achieve maximum throughput and convenient use.

This is the second beta version of the 2.2 release. The main addition is support for links. All HDF5 link kinds are supported: hard, soft and external. Hard and soft links are similar to hard and symbolic links in regular UNIX filesystems, while external links are more like mounting external filesystems (in this case, HDF5 files) on top of existing ones. This allows for a considerable degree of flexibility when defining your object tree. See the new tutorial at:
http://www.pytables.org/docs/manual-2.2b2/ch03.html#LinksTutorial

Also, some other new features (like complete control of the HDF5 chunk cache parameters and native compound types in attributes), bug fixes and a couple of (small) API changes made it into this version. In case you want to know in more detail what has changed, have a look at:
http://www.pytables.org/moin/ReleaseNotes/Release_2.2b2

You can download a source package with generated PDF and HTML docs, as well as binaries for Windows, from:
http://www.pytables.org/download/preliminary

For an on-line version of the manual, visit:
http://www.pytables.org/docs/manual-2.2b2

Resources
=========

About PyTables: http://www.pytables.org
About the HDF5 library: http://hdfgroup.org/HDF5/
About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Most especially, a lot of kudos go to the HDF5 and NumPy (and numarray!) makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.
**Enjoy data!** -- The PyTables Team -- Francesc Alted -- http://mail.python.org/mailman/listinfo/python-announce-list Support the Python Software Foundation: http://www.python.org/psf/donations/
ANN: Numexpr 1.3.1 released
== Announcing Numexpr 1.3.1 ==

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python.

This is a maintenance release. In it, support for the `uint32` type has been added (it is internally upcast to `int64`), as well as a new `abs()` function (thanks to Pauli Virtanen for the patch). Also, a little tweaking in the treatment of unaligned arrays on Intel architectures allowed for up to 2x speedups in computations involving unaligned arrays. For example, for multiplying two arrays (see the included ``unaligned-simple.py`` benchmark), figures before the tweaking were:

NumPy aligned:      0.63 s
NumPy unaligned:    1.66 s
Numexpr aligned:    0.65 s
Numexpr unaligned:  1.09 s

while now they are:

NumPy aligned:      0.63 s
NumPy unaligned:    1.65 s
Numexpr aligned:    0.65 s
Numexpr unaligned:  0.57 s  <-- almost 2x faster than before

You can also see how the unaligned case can be even faster than the aligned one. The explanation is that the 'aligned' array was actually a strided one (a column of a structured array), and the total working data size was a bit larger in that case.

In case you want to know in more detail what has changed in this version, see:

http://code.google.com/p/numexpr/wiki/ReleaseNotes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=========================

The project is hosted at Google Code:

http://code.google.com/p/numexpr/

You can get the packages from PyPI as well:

http://pypi.python.org/pypi

How does it work?
=================

See:

http://code.google.com/p/numexpr/wiki/Overview

for a detailed description by the original author (David M. Cooke).

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy!

-- Francesc Alted
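The "unaligned" arrays above are arrays whose elements do not start on their natural memory boundaries. A minimal sketch of how such an array arises in practice (NumPy only; the field names here are illustrative, not from the benchmark script):

```python
import numpy as np

# A packed structured dtype with a 1-byte field followed by a float64
# field: the float64 column starts at byte offset 1, so its elements
# do not sit on 8-byte boundaries.
rec = np.zeros(1000, dtype=[('tag', np.int8), ('value', np.float64)])
col = rec['value']          # a strided, unaligned view of the records

print(rec.dtype.itemsize)   # 9 bytes per record (1 + 8)
print(col.strides)          # (9,) -> elements spaced 9 bytes apart
print(col.flags['ALIGNED']) # typically False on Intel architectures

# Operating on the unaligned column still works; it is just slower,
# which is what the figures above measure.
out = col * col
```

This is exactly the "aligned array that is actually a strided one" situation mentioned above: the column view is contiguous in neither the aligned nor the packed sense.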
ANN: Numexpr 1.3 released
Announcing Numexpr 1.3

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python.

In this release, and due to popular demand, support for single precision floating point types has been added. This allows for both improved performance and optimal memory usage for single precision computations. Of course, support for single precision in combination with Intel's VML is there too :) However, caveat emptor: the casting rules for floating point types differ slightly from those of NumPy. See the ``Casting rules`` section at:

http://code.google.com/p/numexpr/wiki/Overview

or the README.txt file for more info on this issue.

In case you want to know in more detail what has changed in this version, see:

http://code.google.com/p/numexpr/wiki/ReleaseNotes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=========================

The project is hosted at Google Code:

http://code.google.com/p/numexpr/

You can get the packages from PyPI as well:

http://pypi.python.org/pypi

How does it work?
=================

See:

http://code.google.com/p/numexpr/wiki/Overview

for a detailed description by the original author (David M. Cooke).

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy!

-- Francesc Alted
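A minimal usage sketch of the single-precision support (this assumes numexpr is installed and importable; the array names are illustrative):

```python
import numpy as np
import numexpr as ne

a = np.arange(100_000, dtype=np.float32)
b = np.arange(100_000, dtype=np.float32)

# Both operands are float32, so the computation stays in single
# precision, halving the memory traffic relative to float64.
r = ne.evaluate('a + b')
print(r.dtype)  # float32

# Mixing in a float64 operand upcasts the result, as in NumPy.
c = b.astype(np.float64)
print(ne.evaluate('a + c').dtype)  # float64
```

Note the caveat from the announcement: the casting rules are only mostly NumPy-like, so check the ``Casting rules`` section before relying on a particular result dtype in mixed-precision expressions.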
[ANN] PyTables 2.1.1 released
=== Announcing PyTables 2.1.1 ===

PyTables is a library for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data, with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and the NumPy package to achieve maximum throughput and convenient use.

This is a maintenance release, so you should not expect API changes. Instead, a handful of bugs have been fixed, like `File` not being subclassable, incorrectly retrieved default values for data types, a memory leak, and more. Besides, some enhancements have been implemented, like improved Unicode support for filenames, better handling of Unicode attributes, and the possibility to create very large data types exceeding 64 KB in size (with some limitations). Last but not least, this is the first PyTables version fully tested against Python 2.6. It is worth noting that the binaries for Windows and Python 2.6 now ship with the newest HDF5 1.8.2 libraries (instead of the traditional HDF5 1.6.x).

In case you want to know in more detail what has changed in this version, have a look at:

http://www.pytables.org/moin/ReleaseNotes/Release_2.1.1

You can download a source package with generated PDF and HTML docs, as well as binaries for Windows, from:

http://www.pytables.org/download/stable

For an on-line version of the manual, visit:

http://www.pytables.org/docs/manual-2.1.1

You may want to fetch an evaluation version of PyTables Pro from:

http://www.pytables.org/download/evaluation

Resources
=========

About PyTables: http://www.pytables.org
About the HDF5 library: http://www.hdfgroup.org/HDF5/
About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Most specially, a lot of kudos go to the HDF5 and NumPy (and numarray!) makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

**Enjoy data!**

-- The PyTables Team
-- Francesc Alted
ANN: Numexpr 1.2 released
Announcing Numexpr 1.2

Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python.

The main feature added in this version is support for the Intel VML library (many thanks to Gregor Thalhammer for his nice work on this!). In addition, when VML support is on, several processors can be used in parallel (see the new `set_vml_num_threads()` function). When VML support is on, the computation of transcendental functions (like trigonometric, exponential, logarithmic, hyperbolic, power...) can be accelerated considerably. Typical speed-ups when using a single core on contiguous arrays are around 3x, with peaks of 7.5x (for the pow() function). When using two cores, the speed-ups are around 4x and 14x, respectively.

In case you want to know in more detail what has changed in this version, have a look at the release notes:

http://code.google.com/p/numexpr/wiki/ReleaseNotes

Where can I find Numexpr?
=========================

The project is hosted at Google Code:

http://code.google.com/p/numexpr/

You can get the packages from PyPI as well:

http://pypi.python.org/pypi

How does it work?
=================

See:

http://code.google.com/p/numexpr/wiki/Overview

for a detailed description of the package.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy!

-- Francesc Alted
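A short sketch of what the VML-accelerated path looks like from user code (this assumes numexpr is installed; `set_vml_num_threads()` only has an effect when numexpr was built against Intel's VML/MKL, and is a no-op otherwise):

```python
import numpy as np
import numexpr as ne

# Ask VML to use two threads; harmless when VML support is not
# compiled in, so it is safe to call unconditionally.
ne.set_vml_num_threads(2)

a = np.linspace(0.0, 1.0, 1_000_000)

# Transcendental functions such as sin/cos/exp are exactly the ones
# VML accelerates; the expression itself is ordinary numexpr syntax.
r = ne.evaluate('sin(a) ** 2 + cos(a) ** 2')
print(np.allclose(r, 1.0))  # True
```

The same expression runs with or without VML; the library only changes which kernel evaluates the transcendental functions, which is where the quoted 3x-14x speed-ups come from.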
ANN: PyTables 2.1 (final) released
=== Announcing PyTables 2.1 ===

PyTables is a library for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data, with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and the NumPy package to achieve maximum throughput and convenient use.

PyTables 2.1 introduces important improvements, like much faster node opening, creation and navigation, a file-based way to fine-tune the different PyTables parameters (fully documented now in a new appendix of the manual) and support for multidimensional atoms in EArray/CArray objects.

Regarding the Pro edition, four different kinds of indexes are supported so that the user can choose the best for her needs. Also, due to the introduction of the concept of chunkmaps in OPSI, the responsiveness of complex queries with low selectivity has improved quite a lot. And last but not least, it is now possible to sort tables by a specific field with no practical limit in size (tables up to 2**48 rows).

Also, a lot of work has gone into reworking the "Optimization tips" chapter of the manual, where many benchmarks have been redone using newer software and machines, and a few new sections have been added. In particular, see the new "Fine-tuning the chunksize" section, where you will find an in-depth introduction to the subject of chunking, and the "Indexing and Solid State Disks (SSD)" section, where the advantages of using low-latency SSD disks are analysed in the context of indexing.

In case you want to know in more detail what has changed in this version, have a look at ``RELEASE_NOTES.txt`` in the tarball.
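The chunksize discussion in the manual comes down to simple arithmetic: a chunk should be large enough to amortize per-chunk I/O overhead but small enough that several chunks fit in the HDF5 chunk cache. A hypothetical helper (not PyTables' actual sizing algorithm) that picks a chunk length for a 1-D dataset given a target chunk size in bytes:

```python
# Hypothetical sketch of the chunk-sizing arithmetic; PyTables computes
# its own defaults, this just illustrates the trade-off discussed in
# the "Fine-tuning the chunksize" chapter.
def chunk_len(itemsize: int, target_bytes: int = 64 * 1024) -> int:
    """Number of elements per chunk so that a chunk is ~target_bytes."""
    return max(1, target_bytes // itemsize)

# float64 items with the default 64 KB target -> 8192 elements/chunk
print(chunk_len(8))           # 8192
# 1-byte items with a 1 MB target -> 1048576 elements/chunk
print(chunk_len(1, 1 << 20))  # 1048576
```

Larger chunks mean fewer, bigger reads (good for sequential scans); smaller chunks waste less I/O when queries touch only a few rows, which is the balance the manual's benchmarks explore.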
Find the HTML version of this document at:

http://www.pytables.org/moin/ReleaseNotes/Release_2.1

You can download a source package of version 2.1 with generated PDF and HTML docs, as well as binaries for Windows, from:

http://www.pytables.org/download/stable

For an on-line version of the manual, visit:

http://www.pytables.org/docs/manual-2.1

Finally, you can get an evaluation version of PyTables Pro at:

http://www.pytables.org/download/evaluation

Resources
=========

Go to the PyTables web site for more details: http://www.pytables.org
About the HDF5 library: http://hdfgroup.org/HDF5/
About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Many thanks also to SourceForge, which has helped to make and distribute this package! And last, but not least, thanks a lot to the HDF5 and NumPy (and numarray!) makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

**Enjoy data!**

-- The PyTables Team
-- Francesc Alted
ANN: PyTables 2.1rc2 ready for testing
=== Announcing PyTables 2.1rc2 ===

PyTables is a library for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data, with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and the NumPy package to achieve maximum throughput and convenient use.

This is the second release candidate for 2.1, and I have decided to release it because many bugs have been fixed and some enhancements have been added since 2.1rc1. For details, see the ``RELEASE_NOTES.txt`` at:

http://www.pytables.org/moin/ReleaseNotes/Release_2.1rc2

PyTables 2.1 introduces important improvements, like much faster node opening, creation and navigation, a file-based way to fine-tune the different PyTables parameters (fully documented now in a new appendix of the UG) and support for multidimensional atoms in EArray/CArray objects.

Regarding the Pro edition, four different kinds of indexes are supported so that the user can choose the best for her needs. Also, due to the introduction of the concept of chunkmaps in OPSI, the responsiveness of complex queries with low selectivity has improved quite a lot. And last but not least, it is now possible to sort tables by a specific field with no practical limit in size (tables up to 2**48 rows).
You can download a source package of version 2.1rc2 with generated PDF and HTML docs, as well as binaries for Windows, from:

http://www.pytables.org/download/preliminary

Finally, and for the first time, an evaluation version of PyTables Pro has been made available at:

http://www.pytables.org/download/evaluation

Please read the evaluation license for the terms of use of this version:

http://www.pytables.org/moin/PyTablesProEvaluationLicense

For an on-line version of the manual, visit:

http://www.pytables.org/docs/manual-2.1rc2

Resources
=========

Go to the PyTables web site for more details: http://www.pytables.org
About the HDF5 library: http://hdfgroup.org/HDF5/
About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Many thanks also to SourceForge, which has helped to make and distribute this package! And last, but not least, thanks a lot to the HDF5 and NumPy (and numarray!) makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

**Enjoy data!**

-- The PyTables Team
-- Francesc Alted
ANN: PyTables 2.1rc1 ready for testing
Announcing PyTables 2.1rc1

PyTables is a library for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data, with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and the NumPy package to achieve maximum throughput and convenient use.

PyTables 2.1rc1 adds many new features and addresses a handful of bugs. This is a release candidate, so, in addition to the tarball, binaries for Windows are provided too. Also, the API has been frozen, and you should only expect bug fixes and documentation improvements for 2.1 final (due for release in a couple of weeks).

This version introduces important improvements, like much faster node opening, creation and navigation, a file-based way to fine-tune the different PyTables parameters (fully documented now in a new appendix of the UG) and support for multidimensional atoms in EArray/CArray objects.

Regarding the Pro edition, three different kinds of indexes have been added so that the user can choose the best for her needs. Also, due to the introduction of the concept of chunkmaps in OPSI, the responsiveness of complex queries with low selectivity has improved quite a lot. And last but not least, it is now possible to completely sort tables by a specific field, with no practical limit in size (up to 2**48 rows, that is, around 281 trillion rows). More info at:

http://www.pytables.org/moin/PyTablesPro#WhatisnewinforthcomingPyTablesPro2.1

In case you want to know in more detail what has changed in this version, have a look at ``RELEASE_NOTES.txt`` in the tarball.
Find the HTML version of this document at:

http://www.pytables.org/moin/ReleaseNotes/Release_2.1rc1

You can download a source package of version 2.1rc1 with generated PDF and HTML docs, as well as binaries for Windows, from:

http://www.pytables.org/download/preliminary

Finally, and for the first time, an evaluation version of PyTables Pro has been made available at:

http://www.pytables.org/download/evaluation

Please read the evaluation license for the terms of use of this version:

http://www.pytables.org/moin/PyTablesProEvaluationLicense

For an on-line version of the manual, visit:

http://www.pytables.org/docs/manual-2.1rc1

Resources
=========

Go to the PyTables web site for more details: http://www.pytables.org
About the HDF5 library: http://hdfgroup.org/HDF5/
About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Many thanks also to SourceForge, which has helped to make and distribute this package! And last, but not least, thanks a lot to the HDF5 and NumPy (and numarray!) makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

**Enjoy data!**

-- The PyTables Team
-- Francesc Alted
ANN: PyTables 2.0.4 available
=== Announcing PyTables 2.0.4 ===

PyTables is a library for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data, with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and the NumPy package to achieve maximum throughput and convenient use.

After some months without new versions (I have been busy for a while doing things not related to PyTables, unfortunately), I'm happy to announce the availability of PyTables 2.0.4. It fixes some important issues, and it is now possible to use table selections in threaded environments. Also, ``EArray.truncate(0)`` can be used to completely empty existing EArrays (only enabled if you have a recent version, i.e. >= 1.8.0, of the HDF5 library installed). Besides, compatibility with native HDF5 files has been improved too. Finally, the usage of recent versions of NumPy (1.1) and HDF5 (1.8.1) has been tested and, fortunately, they work just fine.

In case you want to know in more detail what has changed in this version, have a look at ``RELEASE_NOTES.txt``. Find the HTML version of this document at:

http://www.pytables.org/moin/ReleaseNotes/Release_2.0.4

You can download a source package of version 2.0.4 with generated PDF and HTML docs, as well as binaries for Windows, from:

http://www.pytables.org/download/stable/

For an on-line version of the manual, visit:

http://www.pytables.org/docs/manual-2.0.4

*Important note for PyTables Pro users*: due to lack of resources, I'll not be delivering a MacOSX binary version of Pro for the time being (it is pretty easy to compile, though). However, I'll continue offering the all-in-one binary for Windows (32-bit).

Migration Notes for PyTables 1.x users
======================================

If you are a user of PyTables 1.x, it is probably worth having a look at the ``MIGRATING_TO_2.x.txt`` file, where you will find directions on how to migrate your existing PyTables 1.x apps to the 2.x versions.
You can find an HTML version of this document at:

http://www.pytables.org/moin/ReleaseNotes/Migrating_To_2.x

Resources
=========

Go to the PyTables web site for more details: http://www.pytables.org
About the HDF5 library: http://hdfgroup.org/HDF5/
About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Many thanks also to SourceForge, which has helped to make and distribute this package! And last, but not least, thanks a lot to the HDF5 and NumPy (and numarray!) makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let me know of any bugs, suggestions, gripes, kudos, etc. you may have.

**Enjoy your data!**

-- Francesc Alted