Hi all,

Thank you for the feedback. Looking at the responses, it seems like there
is a consensus to move forward with fastavro as the default implementation
on Python 3.

There are two questions left, however:
- Should fastavro also become the default implementation on Python 2?
This is a trade-off between having a consistent API across Python versions
and keeping the current behavior on Python 2.

- Should we keep the avro-python3 dependency?
With the proposed solution, we could remove the avro-python3 dependency,
but it might have to be re-added if we want to support Avro again on Python
3 in a future version.
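
To make the two questions concrete, here is a minimal sketch of what each choice could look like in code. Both `default_use_fastavro` and `load_first_available` are hypothetical helpers written for illustration only; they are not existing Beam functions, and the actual change may well look different.

```python
import importlib
import sys


def default_use_fastavro(version_info=sys.version_info, py2_default=False):
    """Hypothetical helper: choose the default value of use_fastavro.

    Python 3 always defaults to fastavro; Python 2 follows py2_default,
    which is the open question (False keeps the current behavior).
    """
    if version_info[0] >= 3:
        return True
    return py2_default


def load_first_available(module_names):
    """Hypothetical helper: import the first installed module in the list.

    Returns a (name, module) pair, or raises ImportError if none is found.
    """
    for name in module_names:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue  # Not installed; try the next candidate.
    raise ImportError("none of %s is installed" % (list(module_names),))
```

If the avro-python3 dependency were kept as an optional fallback, the lookup could be called as `load_first_available(["fastavro", "avro"])`; dropping the dependency would reduce the list to `["fastavro"]`.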

Kind regards,
Robbe

Robbe Sneyders


On Thu, 28 Mar 2019 at 18:28, Ahmet Altay <[email protected]> wrote:

> Hi Ismaël,
>
> It is great to hear that Avro is planning to make a release soon.
>
> To answer your concerns, fastavro has a set of tests using regular avro
> files[1] and it also has a large set of users (with 675470 package
> downloads). This is in addition to it being a py2 & py3 compatible package
> and offering ~7x performance improvements [2]. Another data point: we have
> been testing fastavro for a while behind an experimental flag and have not
> seen any compatibility issues.
>
> pyavro-rs sounds promising; however, I could not find a released version of
> it on PyPI. The source code does not appear to be maintained either: the
> last commit was on Jul 2, 2018 (for comparison, the last change to fastavro
> was on Mar 19, 2019).
>
> I think given the state of things, it makes sense to switch to fastavro as
> the default implementation to unblock Python 3 changes. When Avro offers a
> similar level of performance we could switch back without any visible user
> impact.
>
> Ahmet
>
> [1] https://github.com/fastavro/fastavro/tree/master/tests
> [2] https://pypi.org/project/fastavro/
>
> On Thu, Mar 28, 2019 at 7:53 AM Ismaël Mejía <[email protected]> wrote:
>
>> Hello,
>>
>> The problem of switching implementations is the risk of losing
>> interoperability, and this is more important than performance. Does
>> fastavro have tests that guarantee that it is fully compatible with
>> Avro’s Java version? (given that it is the de-facto implementation
>> used everywhere).
>>
>> If performance is a more important criterion, maybe it is worth taking a
>> look at pyavro-rs [1]; you can see its performance in the great
>> talk from last year [2].
>>
>> I have been involved actively in the Avro community in the last months
>> and I am now a committer there. Also Dan Kulp who has done multiple
>> contributions in Beam is now a PMC member too. We are at this point
>> working hard to get the next release of Avro out, actually the branch
>> cut of Avro 1.9.0 is happening this week, and we plan to improve the
>> release cadence. Please understand that the issue with Avro is that it
>> is a really specific and ‘old’ project (~10 years), so part of the
>> active community moved to other areas because it is stable, but we are
>> still there working on it and we are eager to improve it for everyone’s
>> needs (and of course Beam’s needs).
>>
>> I know that Python 3’s Avro implementation is still lacking and could
>> be improved (views expressed here are clearly valid), but maybe this
>> is a chance to contribute there too. Remember Apache projects are a
>> family and we have a history of cross-collaboration with other
>> communities, e.g. Flink and Calcite, so why not give Avro a chance
>> too?
>>
>> Regards,
>> Ismaël
>>
>> [1] https://github.com/flavray/pyavro-rs
>> [2]
>> https://ep2018.europython.eu/media/conference/slides/how-to-write-rust-instead-of-c-and-get-away-with-it-yes-its-a-python-talk.pdf
>>
>> On Wed, Mar 27, 2019 at 11:42 PM Chamikara Jayalath
>> <[email protected]> wrote:
>> >
>> > +1 for making use_fastavro the default for Python3. I don't see any
>> significant drawbacks in doing this from Beam's point of view. One concern
>> is whether avro and fastavro can safely co-exist in the same environment so
>> that Beam continues to work for users who already have the avro library
>> installed.
>> >
>> > Note that there are two use_fastavro flags (confusingly enough).
>> > (1) for avro file source [1]
>> > (2) an experiment flag [2] with the same name that makes the Dataflow
>> runner use the fastavro library for reading/writing intermediate files and for
>> reading Avro files exported by BigQuery.
>> >
>> > I can help with the latter.
>> >
>> > [1]
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/avroio.py#L81
>> > [2]
>> https://lists.apache.org/thread.html/94bd362a3a041654e6ef9003fb3fa797e25274fdb4766065481a0796@%3Cuser.beam.apache.org%3E
>> >
>> > Thanks,
>> > Cham
>> >
>> > On Wed, Mar 27, 2019 at 3:27 PM Valentyn Tymofieiev <
>> [email protected]> wrote:
>> >>
>> >> Thanks, Robbe and Frederik, for raising this.
>> >>
>> >> Over the course of making Beam Python 3 compatible, this is at least
>> the second time [1] we have had to deal with an error in the avro-python3
>> package. The release cadence of Apache Avro (1 release a year)
>> >> is concerning to me [2]. Even if we have a new release with Python 3
>> fixes soon, as Beam users start using Beam more actively on Python 3, we may
>> encounter more issues in avro-python3. If this happens, Beam will have to
>> monkey-patch its way around the avro-python3 issues, because waiting for
>> the next Avro release may not be practical.
>> >>
>> >> So, I agree that it is a good time to start transitioning off the
>> avro/avro-python3 dependency, given that fastavro is known to be a faster
>> alternative [3] and is released monthly [4].
>> >>
>> >> There are a couple of ways to make this transition, depending on how
>> careful we want to be. We should:
>> >>
>> >> 1. Remove the dependency on avro in the current codepath whenever
>> fastavro is used, as you propose.
>> >> 2. Remove the Beam dependency on avro-python3 now, OR, if we want to be
>> safer, make use_fastavro=True the default on Python 3, but keep the
>> dependency on avro-python3 and keep that codepath, even though it may not
>> work right now on Py3 but might work after the next Avro release.
>> >> 3. Make use_fastavro=True the default on Python 2.
>> >> 4. Remove the Beam dependency on avro and avro-python3 after several
>> releases.
>> >>
>> >> Adding +Chamikara Jayalath and +Udi Meiri, who have been working on
>> Beam IOs and may have some thoughts here. Do you think that it is safe to
>> make use_fastavro=True the default for both Py2 and Py3 now? If we make
>> use_fastavro the default on Py3, do you think there is a benefit to
>> still keeping the Avro codepath on Py3, or can we remove it?
>> >>
>> >> Thanks,
>> >> Valentyn
>> >>
>> >> [1] https://github.com/apache/avro/pull/436
>> >> [2] https://avro.apache.org/releases.html
>> >> [3]
>> https://medium.com/@abrarsheikh/benchmarking-avro-and-fastavro-using-pytest-benchmark-tox-and-matplotlib-bd7a83964453
>> >> [4] https://pypi.org/project/fastavro/#history
>> >>
>> >> On Wed, Mar 27, 2019 at 10:49 AM Robbe Sneyders <[email protected]>
>> wrote:
>> >>>
>> >>> Hi all,
>> >>>
>> >>> We're looking at fixing avroio on Python 3, which still fails due to
>> a non-picklable schema class in Avro [1]. This is fixed when using the
>> latest Avro master, but the last release dates back to May 2017.
>> >>>
>> >>> Fastavro does not have the same problem, but is currently also
>> failing due to a dependency of avroio on Avro for schema parsing.
>> >>>
>> >>> We would therefore propose to (temporarily?) deprecate Avro on Python
>> 3, and implement a pure fastavro solution instead. +Frederik Bode already
>> submitted a PR for this [2].
>> >>>
>> >>> Use of fastavro is currently activated with the `use_fastavro` flag,
>> which defaults to False. Since this flag would not make sense anymore on
>> Python 3, we would like to switch the default value to True. The
>> documentation already mentions that this will probably become the default
>> in the long term, but this change would also impact Python 2. Is this a
>> problem?
>> >>>
>> >>> Also, looking at the performance gain of fastavro, is there any
>> reason to not deprecate Avro in favor of fastavro on Python 3 indefinitely?
>> >>>
>> >>> [1] https://issues.apache.org/jira/browse/BEAM-6522#comment-16784499
>> >>> [2] https://github.com/apache/beam/pull/8130
>> >>>
>> >>> Kind regards,
>> >>> Robbe
>>
>
