Thanks, Robbe and Frederik, for raising this.

Over the course of making Beam Python 3 compatible, this is at least the
second time [1] we have had to deal with an error in the avro-python3
package. The release cadence of Apache Avro (1 release a year)
concerns me [2]. Even if we get a new release with Python 3 fixes
soon, we may encounter more issues in avro-python3 as Beam users start
using Beam more actively on Python 3. If that happens, Beam will have to
monkey-patch its way around those issues, because waiting for the
next Avro release may not be practical.

So, I agree that it is a good time to start transitioning off the
avro/avro-python3 dependency, given that fastavro is known to be a faster
alternative [3] and is released monthly [4].

There are a couple of ways to make this transition, depending on how careful
we want to be. We should:

1. Remove the dependency on avro in the current codepath whenever fastavro
is used, as you propose.
2. Remove the Beam dependency on avro-python3 now, OR, if we want to be
safer, make use_fastavro=True the default on Python 3 but keep the
dependency on avro-python3 and its codepath. That codepath may not
work right now on Py3, but it might work after the next Avro release.
3. Make use_fastavro=True the default on Python 2.
4. Remove the Beam dependency on avro and avro-python3 after several releases.
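
To make the staging concrete, here is a minimal sketch of how the default
could flip per Python version while an explicit user setting still wins.
The function and stage names are hypothetical and are not existing Beam
code; only the use_fastavro flag itself comes from this thread.

```python
import sys

# Hypothetical stage numbers, loosely mirroring the steps listed above.
STAGE_CURRENT = 0      # today: use_fastavro defaults to False everywhere
STAGE_PY3_DEFAULT = 2  # step 2: default becomes True on Python 3 only
STAGE_PY2_DEFAULT = 3  # step 3: default becomes True on Python 2 as well


def default_use_fastavro(stage, py_major=None):
    """Return the default value of use_fastavro at a given stage."""
    if py_major is None:
        py_major = sys.version_info[0]
    if py_major >= 3:
        return stage >= STAGE_PY3_DEFAULT
    return stage >= STAGE_PY2_DEFAULT


def resolve_use_fastavro(user_value=None, stage=STAGE_CURRENT, py_major=None):
    """An explicit True/False from the user overrides the staged default."""
    if user_value is not None:
        return user_value
    return default_use_fastavro(stage, py_major)
```

With this shape, a pipeline that passes use_fastavro=False explicitly keeps
the old codepath at every stage until the dependency is actually removed in
step 4.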

Adding +Chamikara Jayalath <chamik...@google.com> and +Udi Meiri
<eh...@google.com>, who have been working on Beam IOs and may have some
thoughts here. Do you think it is safe to make use_fastavro=True the
default for both Py2 and Py3 now? If we make use_fastavro=True the default
on Py3, do you think there is a benefit to keeping the Avro codepath on
Py3, or can we remove it?

Thanks,
Valentyn

[1] https://github.com/apache/avro/pull/436
[2] https://avro.apache.org/releases.html
[3]
https://medium.com/@abrarsheikh/benchmarking-avro-and-fastavro-using-pytest-benchmark-tox-and-matplotlib-bd7a83964453
[4] https://pypi.org/project/fastavro/#history

On Wed, Mar 27, 2019 at 10:49 AM Robbe Sneyders <robbe.sneyd...@ml6.eu>
wrote:

> Hi all,
>
> We're looking at fixing avroio on Python 3, which still fails due to a
> non-picklable schema class in Avro [1]. This is fixed when using the latest
> Avro master, but the last release dates back to May 2017.
>
> Fastavro does not have the same problem, but is currently also failing due
> to a dependency of avroio on Avro for schema parsing.
>
> We would therefore propose to (temporarily?) deprecate Avro on Python 3,
> and implement a pure fastavro solution instead. +Frederik Bode
> <frederik.b...@ml6.eu>  already submitted a PR for this [2].
>
> Use of fastavro is currently activated with the `use_fastavro` flag, which
> defaults to False. Since this flag would not make sense anymore on Python
> 3, we would like to switch the default value to True. The documentation
> already mentions that this will probably become the default in the long
> term, but this change would also impact Python 2. Is this a problem?
>
> Also, looking at the performance gain of fastavro, is there any reason to
> not deprecate Avro in favor of fastavro on Python 3 indefinitely?
>
> [1] https://issues.apache.org/jira/browse/BEAM-6522#comment-16784499
> [2] https://github.com/apache/beam/pull/8130
>
> Kind regards,
> Robbe
>
