Hi Claire,

Thanks for bringing this up.
Since Avro has incompatibilities between these releases (which is natural, since the second number of an Avro version is effectively the major one), we can only state compatibility with a version if we actually test against it. So I would vote for your second proposal, or even both.

Which Avro versions do you think we should support? (I think we need to support the latest one, plus whichever earlier major ones we consider required.) I am not sure whether we would need to release separate modules for the different Avro versions, or whether those are only required for testing. In the former case, it will be quite obvious which Avro versions we support, since the version will be part of the package naming.

If you want to invest effort in this, I am happy to help with reviewing.

Cheers,
Gabor

Claire McGinty <[email protected]> wrote (on Mon, Aug 26, 2024, 19:01):

> Hi all,
>
> I wanted to start a thread discussing Avro cross-version support in
> parquet-java. The parquet-avro module has been on Avro 1.11 since the 1.13
> release, but since then we've made fixes and added feature support for Avro
> 1.8 APIs (ex1 <https://github.com/apache/parquet-java/pull/2957>,
> ex2 <https://github.com/apache/parquet-java/pull/2993>).
>
> Most of the Avro APIs referenced by parquet-avro are
> cross-version-compatible, with a few exceptions:
>
> - Evolution of the Schema constructor APIs
> - New logical types (i.e., local timestamp and UUID)
> - Renamed logical type conversion helpers
> - Generated code for datetime types using Java Time vs. Joda Time for
>   setters/getters
>
> Some of these are hard to catch when Parquet is compiled and tested with
> Avro 1.11 only. Additionally, as a user who currently relies mostly on
> Avro 1.8, I'm not sure how much longer Parquet will continue to support it.
>
> I have two proposals to build confidence and clarity around parquet-avro:
>
> - Codifying in the parquet-avro documentation
>   <https://github.com/apache/parquet-java/blob/master/parquet-avro/README.md>
>   which Avro versions are officially supported and which are
>   deprecated/explicitly not supported
> - Adding some kind of automated testing with all supported Avro versions.
>   This is a bit tricky because, as I mentioned, the generated
>   SpecificRecord classes use incompatible logical type APIs across Avro
>   versions, so we'd have to find a way to invoke avro-compiler/load the
>   Avro core library for different versions... this would probably require
>   a multi-module setup.
>
> I'd love to know what the Parquet community thinks about these ideas.
> Additionally, I'm interested to learn more about which Avro versions other
> Parquet users rely on. Seems like there's a lot of variance across the
> data ecosystem--Spark keeps up to date with the latest Avro version,
> Hadoop has Avro 1.9 pinned, and Apache Beam used to be tightly coupled
> with 1.8, but has recently refactored to be version-agnostic.
>
> Best,
> Claire
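As an illustration of the "renamed logical type conversion helpers" point and of Beam's version-agnostic approach, one way code can cope with the rename is to probe the classpath reflectively instead of referencing a conversion class directly. This is only a minimal sketch of that technique, assuming the Avro 1.9+ class name `TimeConversions$TimestampMillisConversion` and the 1.8-era name `TimeConversions$TimestampConversion`; it is not actual parquet-java or Beam code:

```java
// Sketch: detect which timestamp conversion helper is on the classpath,
// so calling code can stay agnostic to the Avro 1.8 -> 1.9 rename.
// The class names below are assumptions, not verified against every release.
public class AvroVersionProbe {

    static String detectTimestampConversion() {
        String[] candidates = {
            // Avro 1.9+ (java.time based, renamed helper)
            "org.apache.avro.data.TimeConversions$TimestampMillisConversion",
            // Avro 1.8 (Joda-Time based helper)
            "org.apache.avro.data.TimeConversions$TimestampConversion",
        };
        for (String name : candidates) {
            try {
                Class.forName(name);
                return name; // first candidate present on the classpath
            } catch (ClassNotFoundException e) {
                // not this version; try the next candidate
            }
        }
        return "none"; // Avro is not on the classpath at all
    }

    public static void main(String[] args) {
        System.out.println(detectTimestampConversion());
    }
}
```

Run without Avro on the classpath, the probe falls through both candidates and reports "none"; with an Avro jar present, it would report whichever helper name that release ships, which the caller can then instantiate reflectively.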
