Hey Claire,

Thanks for raising this.

1.8.x -> 1.9.x is the most problematic upgrade because it breaks some public
APIs. We had Jackson objects in the public API, and those broke when we
switched from codehaus to fasterxml.

Ideally, I would love to drop Avro 1.8 support (May 2017), but if it is still
widely used, then we can check what it takes to bring back support.

For the testing, I was hoping that we could leverage a profile, similar to
what we do with Hadoop 2
<https://github.com/apache/parquet-java/blob/312a15f53a011d1dc4863df196c0169bdf6db629/pom.xml#L638-L643>.

Both proposals are great, and I am happy to help!

Kind regards,
Fokko

On Tue, 27 Aug 2024 at 10:04, Gábor Szádovszky <[email protected]> wrote:

> Hi Claire,
>
> Thanks for bringing this up.
>
> Since Avro has incompatibilities between these releases (which is natural
> since the second number of the Avro versions is considered to be the major
> one), we can only state compatibility with a version if we actually test
> with it. So, I would vote for your second proposal, or even both.
>
> Which Avro versions do you think we should support? (I think we need to
> support the latest one, and all the major ones below it that we think are
> required.)
>
> I am not sure if we need separate modules to be actually released for the
> different Avro versions or if this is only required for testing. In the
> first case, it'll be quite obvious which Avro version we support since
> it'll be part of the package naming.
>
> If you want to invest effort in this, I am happy to help with reviewing.
>
> Cheers,
> Gabor
>
> On Mon, 26 Aug 2024 at 19:01, Claire McGinty <[email protected]> wrote:
>
> > Hi all,
> >
> > I wanted to start a thread discussing Avro cross-version support in
> > parquet-java. The parquet-avro module has been on Avro 1.11 since the
> > 1.13 release, but since then we've made fixes and added feature support
> > for Avro 1.8 APIs (ex1 <https://github.com/apache/parquet-java/pull/2957>,
> > ex2 <https://github.com/apache/parquet-java/pull/2993>).
> >
> > Mostly the Avro APIs referenced by parquet-avro are
> > cross-version-compatible, with a few exceptions:
> >
> > - Evolution of Schema constructor APIs
> > - New logical types (i.e., local timestamp and UUID)
> > - Renamed logical type conversion helpers
> > - Generated code for datetime types using Java Time vs Joda Time for
> >   setters/getters
> >
> > Some of these are hard to catch when Parquet is compiled and tested with
> > Avro 1.11 only. Additionally, as a user who mostly relies on Avro 1.8
> > currently, I'm not sure how much longer Parquet will continue to support
> > it.
> >
> > I have two proposals to build confidence and clarity around parquet-avro:
> >
> > - Codifying in the parquet-avro documentation
> >   <https://github.com/apache/parquet-java/blob/master/parquet-avro/README.md>
> >   which Avro versions are officially supported and which are
> >   deprecated/explicitly not supported
> > - Adding some kind of automated testing with all supported Avro
> >   versions. This is a bit tricky because, as I mentioned, the generated
> >   SpecificRecord classes use incompatible logical type APIs across Avro
> >   versions, so we'd have to find a way to invoke avro-compiler/load the
> >   Avro core library for different versions... this would probably require
> >   a multi-module setup.
> >
> > I'd love to know what the Parquet community thinks about these ideas.
> > Additionally, I'm interested to learn more about what Avro versions other
> > Parquet users rely on. Seems like there's a lot of variance across the
> > data ecosystem--Spark keeps up-to-date with latest Avro version, Hadoop
> > has Avro 1.9 pinned, and Apache Beam used to be tightly coupled with 1.8,
> > but has recently refactored to be version-agnostic.
> >
> > Best,
> > Claire
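
For illustration only: one common way to cope with APIs that exist in some
Avro versions but not others (such as the newer logical types mentioned in
the thread) is to probe for them reflectively at runtime instead of linking
against them at compile time. The sketch below is not taken from parquet-avro.
The class name AvroFeatureProbe and its methods are hypothetical, and it
assumes that LogicalTypes.localTimestampMillis() is present only on the newer
Avro versions discussed here. It compiles and runs without Avro on the
classpath, in which case the probe simply reports false.

```java
/**
 * Hypothetical helper (not part of parquet-avro): checks at runtime whether
 * an API is available, so the calling code can compile against one Avro
 * version and still run against another.
 */
final class AvroFeatureProbe {
    private AvroFeatureProbe() {}

    /**
     * Returns true if the class can be loaded and declares a public
     * no-argument method with the given name; false otherwise.
     */
    static boolean hasNoArgMethod(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            cls.getMethod(methodName); // throws if the method is absent
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            // Class or method is missing in the version on the classpath.
            return false;
        }
    }

    /**
     * Assumption: the local-timestamp logical type is exposed through
     * org.apache.avro.LogicalTypes.localTimestampMillis() on newer Avro
     * versions and is absent on older ones.
     */
    static boolean supportsLocalTimestamp() {
        return hasNoArgMethod("org.apache.avro.LogicalTypes", "localTimestampMillis");
    }
}
```

A caller could guard version-specific code paths with
AvroFeatureProbe.supportsLocalTimestamp(). A probe like this only covers
runtime lookups. It does not help with the generated SpecificRecord classes,
which is why the thread discusses a profile or multi-module setup for testing.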
