Hi,
I recently evaluated the performance of various Python Avro
implementations. Besides the official Python implementations and fastavro,
there is a fourth implementation called pyavroc [1]. pyavroc seems to be
even faster than fastavro at parsing, but it is built on the Avro C
library via Python bindings rather than being pure Python. I am not sure
whether this is desired, but it could be a good option to develop
fastavro so that Avro C can be integrated into it to improve
performance (I am also not sure whether optimizing the code for Cython
might bring performance to a similar level). In addition, pyavroc does
not seem to offer much API compatibility, so I am not sure what the
focus should be: API compatibility or performance.
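One common way to let a package use a compiled backend when it is installed and fall back to pure Python otherwise is to pick the backend at import time. The sketch below only shows that selection step; the helper name `first_available` is hypothetical, and since the backends do not share an API (pyavroc in particular, as noted above), a real integration would still need a thin adapter layer on top.

```python
import importlib
from types import ModuleType
from typing import Iterable


def first_available(candidates: Iterable[str]) -> ModuleType:
    """Import and return the first module in `candidates` that imports cleanly.

    Candidates are tried in order, so the fastest backend should come first.
    A module that is installed but fails to load (e.g. a missing C library)
    also raises ImportError and is skipped.
    """
    tried = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            tried.append(name)
    raise ImportError("none of these backends could be imported: " + ", ".join(tried))


# Hypothetical preference order: C-backed first, pure Python last.
# backend = first_available(["pyavroc", "fastavro", "avro"])
```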
In terms of parsing performance, I found the following (time normalized
against the official Python avro 1.7.7, so lower is faster):
avro (Python 1.7.7): 1
fastavro:            0.2717 (I am not sure if I used Cython correctly)
pyavroc:             0.0285 (using only functions built into Python,
                     i.e. no numpy or anything similar)
The results were fairly stable across different tests and files.
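The mail does not say how the timings were taken, so the following is only a minimal sketch of one way to produce numbers normalized like the ones above: time each reader on the same file (best of several runs), then divide by the baseline's time. The function names `benchmark`, `normalize` and `speedup` are hypothetical.

```python
import timeit
from typing import Callable, Dict


def benchmark(readers: Dict[str, Callable[[], object]], repeat: int = 5) -> Dict[str, float]:
    """Return the best-of-`repeat` wall-clock time (seconds) for one call of each reader."""
    return {
        name: min(timeit.repeat(fn, number=1, repeat=repeat))
        for name, fn in readers.items()
    }


def normalize(timings: Dict[str, float], baseline: str) -> Dict[str, float]:
    """Divide every timing by the baseline's timing, so the baseline becomes 1.0."""
    base = timings[baseline]
    return {name: t / base for name, t in timings.items()}


def speedup(normalized: float) -> float:
    """Convert a normalized time into a speedup factor (0.25 -> 4x faster)."""
    return 1.0 / normalized
```

Each reader callable would, for example, open the same file and iterate over every record using `fastavro.reader(fo)` or the official `avro.datafile.DataFileReader(fo, avro.io.DatumReader())`. Applied to the figures above, `speedup(0.2717)` is about 3.7 and `speedup(0.0285)` is about 35, i.e. fastavro parsed roughly 3.7x and pyavroc roughly 35x faster than the baseline in these tests.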
Cheers
[1] https://github.com/Byhiras/pyavroc
On 29.10.2015 at 14:23, Sean Busbey wrote:
Sounds great to me.
On Wed, Oct 28, 2015 at 1:14 PM, Ryan Blue wrote:
Hi everyone,
Right now, we have two Python implementations: py and py3. There is also
fastavro [1], which is popular because it is fast and more Pythonic. It also
works with Python 2.7, Python 3.x, and PyPy, and can be sped up with Cython.
I had a recent e-mail exchange with Miki Tebeka, the creator and maintainer
of fastavro, about the current Python Avro implementations, and he's
interested in working with the Apache community to merge the existing
implementations into one. I'm really excited about it, since this is a great
opportunity to grow the Avro community and consolidate the Python
implementations.
I'd like to start a discussion in this thread about next steps. I think
the best way forward is to bring fastavro in, and then build
compatibility with the current APIs where needed, so that we can
deprecate the existing py and py3 projects.
Does that sound reasonable?
rb
[1]: https://github.com/tebeka/fastavro
--
Ryan Blue
Software Engineer
Cloudera, Inc.