Thanks, Robbe and Frederik, for raising this. Over the course of making Beam Python 3 compatible, this is at least the second time [1] we have had to deal with an error in the avro-python3 package. The release cadence of Apache Avro (one release a year) is concerning to me [2]. Even if we get a new release with Python 3 fixes soon, we may encounter more issues in avro-python3 as Beam users start to use Beam more actively on Python 3. If this happens, Beam will have to monkey-patch its way around the avro-python3 issues, because waiting for the next Avro release may not be practical.
So, I agree that it is a good time to start transitioning off of the avro/avro-python3 dependency, given that fastavro is known to be a faster alternative [3] and is released monthly [4]. There are a couple of ways to make this transition, depending on how careful we want to be. We should:

1. Remove the dependency on avro in the current codepath whenever fastavro is used, as you propose.
2. Remove the Beam dependency on avro-python3 now, OR, if we want to be safer, make use_fastavro=True the default option on Python 3 but keep the dependency on avro-python3 and keep that codepath. It may not work on Py3 right now, but it might work after the next Avro release.
3. Make use_fastavro=True the default option on Python 2.
4. Remove the Beam dependency on avro and avro-python3 after several releases.

Adding +Chamikara Jayalath and +Udi Meiri, who have been working on Beam IOs and may have some thoughts here. Do you think that it is safe to make use_fastavro=True the default option for both Py2 and Py3 now? If we make use_fastavro the default option on Py3, do you think there is a benefit to still keeping the Avro codepath on Py3, or can we remove it?

Thanks,
Valentyn

[1] https://github.com/apache/avro/pull/436
[2] https://avro.apache.org/releases.html
[3] https://medium.com/@abrarsheikh/benchmarking-avro-and-fastavro-using-pytest-benchmark-tox-and-matplotlib-bd7a83964453
[4] https://pypi.org/project/fastavro/#history

On Wed, Mar 27, 2019 at 10:49 AM Robbe Sneyders wrote:

> Hi all,
>
> We're looking at fixing avroio on Python 3, which still fails due to a
> non-picklable schema class in Avro [1]. This is fixed when using the latest
> Avro master, but the last release dates back to May 2017.
>
> Fastavro does not have the same problem, but is currently also failing due
> to a dependency of avroio on Avro for schema parsing.
>
> We would therefore propose to (temporarily?) deprecate Avro on Python 3,
> and implement a pure fastavro solution instead. +Frederik Bode
> already submitted a PR for this [2].
>
> Use of fastavro is currently activated with the `use_fastavro` flag, which
> defaults to False. Since this flag would not make sense anymore on Python
> 3, we would like to switch the default value to True. The documentation
> already mentions that this will probably become the default on the long
> term, but this change would also impact Python 2. Is this a problem?
>
> Also, looking at the performance gain of fastavro, is there any reason to
> not deprecate Avro in favor of fastavro on Python 3 indefinitely?
>
> [1] https://issues.apache.org/jira/browse/BEAM-6522#comment-16784499
> [2] https://github.com/apache/beam/pull/8130
>
> Kind regards,
> Robbe
>
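For illustration only, here is a minimal sketch of the safer variant of option 2: fastavro becomes the default on Python 3, Python 2 keeps the avro codepath, and an explicit use_fastavro=False on Python 3 is still honored. The helper name `resolve_use_fastavro`, the use of `None` to mean "flag not set", and the DeprecationWarning are hypothetical choices made for this sketch; this is not Beam's actual avroio code.

```python
import sys
import warnings


def resolve_use_fastavro(use_fastavro=None, python_major=None):
    """Return the effective use_fastavro setting (hypothetical helper).

    use_fastavro=None means the caller did not set the flag explicitly.
    python_major defaults to the running interpreter's major version.
    """
    if python_major is None:
        python_major = sys.version_info[0]
    if use_fastavro is None:
        # Unset flag: fastavro by default on Python 3, avro on Python 2.
        return python_major >= 3
    if not use_fastavro and python_major >= 3:
        # Assumption: keep the avro-python3 codepath reachable on Python 3,
        # but warn, since it may not work until the next Avro release.
        warnings.warn(
            "use_fastavro=False relies on avro-python3, which may not work "
            "on Python 3 until the next Avro release.",
            DeprecationWarning)
    # Explicit user choice is respected as-is.
    return use_fastavro
```

Using `None` as the default lets the helper distinguish "not set" from an explicit False, which is what keeping the avro codepath available on Python 3 requires. Option 3 (fastavro by default on Python 2 as well) would reduce the unset case to a plain `return True`.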
