Hey Claire,

Thanks for raising this.

The 1.8.x -> 1.9.x upgrade is the most problematic one because it breaks some
public APIs. We had Jackson objects in the public API, and those broke when we
switched from codehaus Jackson to fasterxml Jackson. Ideally, I would love to
drop Avro 1.8 support (its last release dates from May 2017), but if it is
still widely used, then we can check what it takes to bring back support.
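
To make that concrete, here is a minimal sketch of the constructor change that
pulled Jackson into public signatures (the class, field, and schema below are
purely illustrative); it compiles against Avro 1.9+ but not against 1.8:

    import org.apache.avro.JsonProperties;
    import org.apache.avro.Schema;
    import org.apache.avro.SchemaBuilder;

    public class FieldDefaultExample {
      public static void main(String[] args) {
        // Avro 1.9+: default values are passed as plain Objects
        // (JsonProperties.NULL_VALUE stands in for an explicit null default).
        Schema nullableString =
            SchemaBuilder.unionOf().nullType().and().stringType().endUnion();
        Schema.Field field = new Schema.Field(
            "col", nullableString, "a nullable column", JsonProperties.NULL_VALUE);

        // Avro 1.8.x has no Object-typed overload; the equivalent constructor
        // takes an org.codehaus.jackson.JsonNode (e.g. NullNode.getInstance()),
        // which is how the codehaus Jackson types ended up in public signatures.
        System.out.println(field);
      }
    }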

For the testing, I was hoping that we could leverage a Maven profile, similar
to what we do with Hadoop 2
<https://github.com/apache/parquet-java/blob/312a15f53a011d1dc4863df196c0169bdf6db629/pom.xml#L638-L643>.

Both proposals are great, and I'm happy to help!

Kind regards,
Fokko



On Tue, 27 Aug 2024 at 10:04, Gábor Szádovszky <[email protected]> wrote:

> Hi Claire,
>
> Thanks for bringing this up.
>
> Since Avro has incompatibilities between these releases (which is natural,
> since the second number of an Avro version is effectively the major one),
> we can only state compatibility with a version if we actually test with it.
> So, I would vote for your second proposal, or even both.
>
> Which Avro versions do you think we should support? (I think we need to
> support the latest one, plus whichever older major versions we think are
> required.)
>
> I am not sure whether we need separate modules to actually be released for
> the different Avro versions, or whether this is only required for testing.
> In the first case, it will be quite obvious which Avro versions we support,
> since they will be part of the package naming.
>
> If you want to invest effort in this, I am happy to help with reviewing.
>
> Cheers,
> Gabor
>
> On Mon, 26 Aug 2024 at 19:01, Claire McGinty <[email protected]> wrote:
>
> > Hi all,
> >
> > I wanted to start a thread discussing Avro cross-version support in
> > parquet-java. The parquet-avro module has been on Avro 1.11 since the
> > 1.13 release, but since then we've made fixes and added feature support
> > for Avro 1.8 APIs (ex1 <https://github.com/apache/parquet-java/pull/2957>,
> > ex2 <https://github.com/apache/parquet-java/pull/2993>).
> >
> > Most of the Avro APIs referenced by parquet-avro are cross-version
> > compatible, with a few exceptions:
> >
> >    - Evolution of Schema constructor APIs
> >    - New logical types (i.e., local timestamp and UUID)
> >    - Renamed logical type conversion helpers (illustrated below)
> >    - Generated code for datetime types using Java Time vs Joda Time for
> >    setters/getters
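> >
> > As a minimal illustration of the renamed conversion helpers (the class
> > name below is just for the example), the following compiles against
> > Avro 1.9+ but not against Avro 1.8:
> >
> >    import org.apache.avro.data.TimeConversions;
> >    import org.apache.avro.generic.GenericData;
> >
> >    public class AvroTimeConversionSetup {
> >      public static void main(String[] args) {
> >        GenericData model = new GenericData();
> >        // Avro 1.9+ name, backed by java.time.Instant; in Avro 1.8 this
> >        // class does not exist and the Joda-Time-backed equivalent is
> >        // TimeConversions.TimestampConversion.
> >        model.addLogicalTypeConversion(
> >            new TimeConversions.TimestampMillisConversion());
> >      }
> >    }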
> >
> > Some of these are hard to catch when Parquet is compiled and tested with
> > Avro 1.11 only. Additionally, as a user who mostly relies on Avro 1.8
> > currently, I'm not sure how much longer Parquet will continue to support
> > it.
> >
> > I have two proposals to build confidence and clarity around parquet-avro:
> >
> >    - Codifying in the parquet-avro documentation
> >    <https://github.com/apache/parquet-java/blob/master/parquet-avro/README.md>
> >    which Avro versions are officially supported and which are
> >    deprecated/explicitly not supported
> >    - Adding some kind of automated testing with all supported Avro
> >    versions. This is a bit tricky because, as I mentioned, the generated
> >    SpecificRecord classes use incompatible logical type APIs across Avro
> >    versions, so we'd have to find a way to invoke avro-compiler/load the
> >    Avro core library for different versions... this would probably
> >    require a multi-module setup.
> >
> > I'd love to know what the Parquet community thinks about these ideas.
> > Additionally, I'm interested to learn more about which Avro versions other
> > Parquet users rely on. It seems like there's a lot of variance across the
> > data ecosystem: Spark keeps up to date with the latest Avro version,
> > Hadoop has Avro 1.9 pinned, and Apache Beam used to be tightly coupled
> > with 1.8 but has recently been refactored to be version-agnostic.
> >
> > Best,
> > Claire
> >
>
