Hi,

I recently evaluated the performance of various Python Avro implementations. Besides the official Python implementations (py and py3) and fastavro, there is a fourth implementation called pyavroc [1]. pyavroc seems to be even faster than fastavro in terms of parsing performance, but it uses the Avro C library with Python bindings rather than pure Python. I am not sure whether this is desired, but it might be a good option to develop fastavro so that the Avro C library can be integrated into the code to improve performance (I am also not sure whether optimizing the code for Cython could bring the performance to a similar level). In addition, pyavroc does not seem to have much API compatibility, so I am not sure what the focus should be: API compatibility or performance.

In terms of parsing performance I found the following (parse time normalized against the standard Python avro 1.7.7, so lower is faster):
avro_python:  1
fastavro:     0.2717  (I am not sure whether I used Cython correctly)
pyavroc:      0.0285  (using only built-in Python functions, i.e. no numpy or similar)

The results were more or less stable across various tests and files.
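
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how normalized ratios like the ones above can be computed. It is not the script used for the numbers in this mail. The names normalized_timings, parse_slow and parse_fast are hypothetical, and the two stand-in functions do no Avro parsing at all; each would have to be replaced by a function that reads the same Avro file through the library under test.

```python
import timeit


def normalized_timings(parsers, baseline, repeat=5, number=1):
    """Return each parser's best wall-clock time divided by the baseline's.

    parsers  -- dict mapping a label to a zero-argument callable
    baseline -- label of the entry that should come out as 1.0
    """
    best = {
        name: min(timeit.repeat(func, repeat=repeat, number=number))
        for name, func in parsers.items()
    }
    return {name: t / best[baseline] for name, t in best.items()}


# Hypothetical stand-ins (assumption): swap in real functions that parse
# the same Avro file with avro, fastavro or pyavroc respectively.
def parse_slow():
    return sum(i * i for i in range(200000))


def parse_fast():
    return sum(range(200000))


if __name__ == "__main__":
    ratios = normalized_timings(
        {"avro_python": parse_slow, "candidate": parse_fast},
        baseline="avro_python",
    )
    for name, ratio in ratios.items():
        print("%s: %.4f" % (name, ratio))
```

Taking the minimum over several repeats keeps one-off disturbances (disk cache, other processes) from distorting the ratio.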


Cheers

[1] https://github.com/Byhiras/pyavroc

On 29.10.2015 at 14:23, Sean Busbey wrote:
sounds great to me.

On Wed, Oct 28, 2015 at 1:14 PM, Ryan Blue wrote:
Hi everyone,

Right now, we have two python implementations: py and py3. And there is also
fastavro [1], which is popular because it is fast and more pythonic. It also
works with python 2.7, python 3.x, pypy, and can be sped up by cython.

I had a recent e-mail exchange with Miki Tebeka, the creator and maintainer
of fastavro, about the current python Avro implementations and he's
interested in working with the Apache community to merge the existing
implementations into one. I'm really excited about it, since this is a great
opportunity to grow the Avro community and consolidate the python
implementations.

I'd like to start a discussion from this thread about next steps. I think
the best way forward is to bring fastavro in, and then work on building
compatibility with the current APIs where we need to so that we can
deprecate the existing py and py3 projects.

Does that sound reasonable?

rb


[1]: https://github.com/tebeka/fastavro

--
Ryan Blue
Software Engineer
Cloudera, Inc.


