+1 for making use_fastavro the default for Python 3. I don't see any
significant drawbacks in doing this from Beam's point of view. One concern
is whether avro and fastavro can safely co-exist in the same environment, so
that Beam continues to work for users who already have the avro library
installed.
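
A minimal sketch of what the proposed default could look like, assuming a
tri-state flag where None means "use the default for this interpreter". The
helpers `resolve_use_fastavro` and `installed` are hypothetical, not Beam's
API. On the co-existence question: avro and avro-python3 both install the
top-level package `avro`, while fastavro installs `fastavro`, so the two do
not clash on import. This alone does not rule out other dependency conflicts.

```python
import importlib.util
import sys


def installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def resolve_use_fastavro(use_fastavro=None):
    """Hypothetical helper: pick a default when the caller passes None.

    Assumption: the default becomes True on Python 3 and stays False on
    Python 2, as proposed in this thread. An explicit True/False always wins.
    """
    if use_fastavro is not None:
        return bool(use_fastavro)
    return sys.version_info[0] >= 3


if __name__ == "__main__":
    # avro / avro-python3 provide the package "avro"; fastavro provides
    # "fastavro", so both can be installed side by side without a name clash.
    for name in ("avro", "fastavro"):
        print(name, "installed:", installed(name))
    print("use_fastavro default:", resolve_use_fastavro())
```

Passing use_fastavro=False explicitly would keep the avro codepath for users
who rely on it.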

Note that there are two use_fastavro flags (confusingly enough).
(1) one for the Avro file source [1]
(2) an experiment flag [2] with the same name that makes the Dataflow runner
use the fastavro library for reading/writing intermediate files and for
reading Avro files exported by BigQuery.
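
To illustrate that the two flags are independent, here is a sketch that uses
a hypothetical `AvroSourceSketch` class in place of the real Avro source and
a plain argparse parser in place of Beam's pipeline options. Only the flag
name `--experiments` mirrors Beam; the bucket path is a placeholder, and the
sketch only handles repeated flags, not comma-separated values.

```python
import argparse


def parse_experiments(argv):
    """Collect pipeline-level experiments (flag (2)).

    Sketch only: the flag name mirrors Beam's repeatable --experiments
    option, but this parser is not Beam's.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--experiments", action="append", default=None)
    known, _unknown = parser.parse_known_args(argv)
    return known.experiments or []


class AvroSourceSketch:
    """Hypothetical stand-in for an Avro file source (flag (1)).

    The keyword argument only affects this one source and is unrelated
    to the pipeline-level experiment parsed above.
    """

    def __init__(self, file_pattern, use_fastavro=False):
        self.file_pattern = file_pattern
        self.use_fastavro = use_fastavro


argv = ["--runner=DataflowRunner", "--experiments=use_fastavro"]
source = AvroSourceSketch("gs://bucket/*.avro")  # flag (1) left at its default
print("source uses fastavro:", source.use_fastavro)
print("runner experiment set:", "use_fastavro" in parse_experiments(argv))
```

Setting the experiment leaves the source-level flag untouched, and vice
versa, which is why the shared name is confusing.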

I can help with the latter.

[1] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/avroio.py#L81
[2] https://lists.apache.org/thread.html/94bd362a3a041654e6ef9003fb3fa797e25274fdb4766065481a0796@%3Cuser.beam.apache.org%3E

Thanks,
Cham

On Wed, Mar 27, 2019 at 3:27 PM Valentyn Tymofieiev <[email protected]>
wrote:

> Thanks, Robbe and Frederik, for raising this.
>
> Over the course of making Beam Python 3 compatible, this is at least the
> second time [1] we have had to deal with an error in the avro-python3
> package. The release cadence of Apache Avro (one release a year) is
> concerning to me [2]. Even if we get a new release with Python 3 fixes
> soon, as users start using Beam more actively on Python 3, we may
> encounter more issues in avro-python3. If this happens, Beam will have to
> monkey-patch its way around the avro-python3 issues, because waiting for
> the next Avro release may not be practical.
>
> So, I agree that it is a good time to start transitioning off the
> avro/avro-python3 dependency, given that fastavro is known to be a faster
> alternative [3] and is released monthly [4].
>
> There are a couple of ways to make this transition, depending on how
> careful we want to be. We should:
>
> 1. Remove the dependency on avro in the current codepath whenever fastavro
> is used, as you propose.
> 2. Remove the Beam dependency on avro-python3 now, OR, if we want to be
> safer, make use_fastavro=True the default option on Python 3 but keep the
> dependency on avro-python3 and keep that codepath, even though it may not
> work on Py3 right now; it might work after the next Avro release.
> 3. Make use_fastavro=True the default option on Python 2.
> 4. Remove the Beam dependency on avro and avro-python3 after several
> releases.
>
> Adding +Chamikara Jayalath <[email protected]> and +Udi Meiri
> <[email protected]>, who have been working on Beam IOs and may have some
> thoughts here. Do you think it is safe to make use_fastavro=True the
> default option for both Py2 and Py3 now? If we make use_fastavro the
> default on Py3, is there still a benefit to keeping the Avro codepath on
> Py3, or can we remove it?
>
> Thanks,
> Valentyn
>
> [1] https://github.com/apache/avro/pull/436
> [2] https://avro.apache.org/releases.html
> [3] https://medium.com/@abrarsheikh/benchmarking-avro-and-fastavro-using-pytest-benchmark-tox-and-matplotlib-bd7a83964453
> [4] https://pypi.org/project/fastavro/#history
>
> On Wed, Mar 27, 2019 at 10:49 AM Robbe Sneyders <[email protected]>
> wrote:
>
>> Hi all,
>>
>> We're looking at fixing avroio on Python 3, which still fails due to a
>> non-picklable schema class in Avro [1]. This is fixed when using the latest
>> Avro master, but the last release dates back to May 2017.
>>
>> Fastavro does not have the same problem, but the fastavro codepath also
>> fails at the moment, because avroio depends on Avro for schema parsing.
>>
>> We would therefore propose to (temporarily?) deprecate Avro on Python 3
>> and implement a pure fastavro solution instead. +Frederik Bode
>> <[email protected]> has already submitted a PR for this [2].
>>
>> Use of fastavro is currently activated with the `use_fastavro` flag,
>> which defaults to False. Since this flag would no longer make sense on
>> Python 3, we would like to switch its default value to True. The
>> documentation already mentions that this will probably become the default
>> in the long term, but this change would also affect Python 2. Is this a
>> problem?
>>
>> Also, looking at the performance gain of fastavro, is there any reason to
>> not deprecate Avro in favor of fastavro on Python 3 indefinitely?
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-6522#comment-16784499
>> [2] https://github.com/apache/beam/pull/8130
>>
>> Kind regards,
>> Robbe
>>
>
