Thanks so much for all these pointers, Alexey. Having that context really helps!

Skimming through the past conversations, this one key consideration hasn’t 
changed and seems still critical:
AvroCoder is the de facto standard for encoding complex user types (with 
SchemaCoder still being experimental).
Consequently, any backwards incompatible change in that respect would massively 
impact users.

Having that in mind, here’s my 2 cents on the options mentioned:


  1.  Bump Avro in core:
     *   This will seamlessly work for users that just need a coder for complex 
types, but don’t use Avro themselves.
     *   Anyone using Avro APIs from user code might get into trouble.
  2.  Support different Avro versions & making the dependency provided
     *   In terms of user impact, making Avro provided will be very 
troublesome. Unless Avro remains available on the classpath through different 
ways, this would break things in a really bad way. Worst of all, it might not 
even break at compile time but at runtime only.
     *   I don’t think this option requires Avro being a provided dependency. 
If the default is kept as is, anyone can (safely) bump Avro to a higher 
(supported) version on their classpath. In that case it would be good to warn 
users about CVEs in logs if they remain on an old version of Avro. But I’m not 
sure how feasible (reliable) it is to detect the Avro minor version given that 
things might be repackaged into fat jars.
  3.  Extract Avro as an extension
     *   I agree with Brian, I don’t think there’s a need to keep a 
shaded/vendored Avro in core… but
     *   It’s pretty hard to get this done without causing trouble for users. 
Making this a seamless migration would require core to depend on the new 
extension module and to maintain all the old public user facing APIs.
     *   Duplicating packages between core and the new extension module is 
going to cause issues with Java 11. That pretty much means existing public user 
facing APIs have to remain as a (deprecated) façade in core using the code in 
the new extension.
     *   The above makes the effort almost look worthless in the short term, 
but it would certainly help to have a well-defined scope that can be tested for 
compatibility with multiple different Avro versions.

To finally make progress on this topic, I’d suggest starting off with 
supporting multiple different Avro versions in core (2) but to keep the default 
Avro version as is.
At the same time, extracting Avro as extension in a backwards compatible way 
seems to be the right path forward long term together with a well communicated 
deprecation plan.

Best,
Moritz


From: Brian Hulette <[email protected]>
Date: Thursday, 12. May 2022 at 23:21
To: dev <[email protected]>
Subject: Re: [DISCUSS] Next steps for update of Avro dependency in Beam
Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2 
(still have an issue with CVEs)" Do we actually need to keep avro in core for 
any reason? I thought we only had it in core for AvroCoder, schema support, and
ZjQcmQRYFpfptBannerStart
This Message Is From an External Sender
This message came from outside your organization.
Exercise caution when opening attachments or clicking any links.
ZjQcmQRYFpfptBannerEnd
Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2 
(still have an issue with CVEs)"

Do we actually need to keep avro in core for any reason? I thought we only had 
it in core for AvroCoder, schema support, and IOs, which I think are all 
reasonable to separate out into an extension (this would be comparable to the 
protobuf extension). To confirm I just grepped for files in core that import 
avro:

❯ grep -liIrn 'import org\.apache\.avro' sdks/java/core/src/main
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroByteBuddyUtils.java
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ConvertHelpers.java
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroRecordSchema.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/DynamicAvroDestinations.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/SerializableAvroCodecFactory.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSink.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSource.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSchemaIOProvider.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
sdks/java/core/src/main/java/org/apache/beam/sdk/io/ConstantAvroDestination.java
sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroGenericCoder.java
sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroCoder.java

Brian

On Thu, May 12, 2022 at 2:08 PM Robert Bradshaw 
<[email protected]<mailto:[email protected]>> wrote:
Keeping avro in our public (core) API and as an internal dependency
seems to be a recurring pain point, I would be all for pulling it out
(basically option 3) and subsequently updating our internal version
(hopefully no backwards compatibility issues here) and letting the
extension live with a variety of versions (insofar as this is
feasible).

On Thu, May 12, 2022 at 10:29 AM Alexey Romanenko
<[email protected]<mailto:[email protected]>> wrote:
>
> Hi everyone,
>
> Sorry in advance for a long email.
> TL;DR: Let’s discuss the next steps to update Avro dependency in Beam.
>
> I’d like to come back to this old and quite sensitive topic here which is 
> Apache Avro version update in Beam. Along the time, we already had several 
> discussions on this (for example [1]) but without any concrete resolutions in 
> the end, iirc.
>
> As we all know, Beam still depends on quite old Avro version 1.8.2 and there 
> were some attempts to bump it to more recent ones. One of the main reasons to 
> bump an Avro version, imho, is that Avro 1.8.2 dependency brings several CVEs 
> [2], but the latest Avro 1.11.0 brings only one [3]
>
> In the same time, this update with introduce some incompatible changes that 
> Avro has between versions and this may affect Beam users and potentially it 
> may affect transitive dependencies while using Beam with other project that 
> use Avro as well:
> - Avro completely moved to java.time.* instead of org.joda.time.*. So, we 
> need to adjust date/time conversions from/to Beam schema accordingly since 
> Beam schema still uses joda.time. It will require users to regenerate already 
> generated Java code with avro-compiler (if any) otherwise it won’t compile;
> - Some minor changes in Avro dependencies and user API;
> - Something else?
>
> I know that here, on the list, we have people from Avro community that are 
> much more experienced in this than me - so, please correct me if I say 
> something wrong or not 100% correct.
>
>
> In Beam, we also performed several attempts to update Avro - for example, 
> [4], [5], [6] and others.
>
> To make such update easier in the future, we also discussed to move Avro 
> dependency out of core Beam [7] and there were an attempt to do that [8] by 
> finally this PR was closed with a resolution that it’s not actually needed 
> and we may just want to test Beam with different Avro versions [9]
>
> The latest work on this was a PR to support several versions of Avro in Beam 
> (1.8.x and 1.9.x) [10] which still introduces some breaking changes for 
> users, iirc.
>
> So, seems that we are a bit stuck on this topic, though, imho, we need to 
> decide how move forward mostly because of CVEs in old Avro versions and 
> future Avro updates in Beam.
>
> The potential options (as I can see them):
>
> 1) Bump Avro dependency to the latest one (1.11.0) or the possible more 
> recent one
> - Pros:
> - latest/recent Avro dependency;
> - potentially easy to update in the future;
> - Cons:
> - breaking change for users;
> - potentially issues with other projects that use Avro (like Apache Spark 
> e.g.).
>
> 2) Support different Avro versions in Beam, make Avro dependency provided
> - Pros:
> - user decides which versions to use;
> - easy to update in the future;
> - Cons:
> - breaking change for users;
> - not fact that it’s possible to implement in reality;
> - more tests to test Beam with different Avro versions
>
> 3) Extract Avro as an extension, like we do for other formats, and update to 
> latest Avro version, but keep and shade Avro for “core” needs as v.1.8.2 
> (still have an issue with CVEs)
>
> 4) Anything else?
>
>
> Please, share your thoughts on this and correct me if I stated something 
> wrong. The goal of this discussion is finally to move forward with Avro 
> update topic.
>
> —
> Alexey
>
>
> [1] 
> https://lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r<https://urldefense.com/v3/__https:/lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw9kMOM1iA$>
> [2] 
> https://mvnrepository.com/artifact/org.apache.avro/avro/1.8.2<https://urldefense.com/v3/__https:/mvnrepository.com/artifact/org.apache.avro/avro/1.8.2__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw-JQRaI8A$>
> [3] 
> https://mvnrepository.com/artifact/org.apache.avro/avro/1.11.0<https://urldefense.com/v3/__https:/mvnrepository.com/artifact/org.apache.avro/avro/1.11.0__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw-P2VwtLA$>
> [4] 
> https://github.com/apache/beam/pull/9779<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/9779__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw-C6txcQA$>
> [5] 
> https://github.com/apache/beam/pull/17372<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/17372__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw8mE0kFWg$>
> [6] 
> https://github.com/apache/beam/pull/17246<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/17246__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw-6l8e4lQ$>
> [7] 
> https://lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on<https://urldefense.com/v3/__https:/lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw9iPVZxRg$>
> [8] 
> https://github.com/apache/beam/pull/12748<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/12748__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw9YWZV1hQ$>
> [9] 
> https://lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg<https://urldefense.com/v3/__https:/lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw9T8IrnEg$>
> [10] 
> https://github.com/apache/beam/pull/16271<https://urldefense.com/v3/__https:/github.com/apache/beam/pull/16271__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw81M4oMhQ$>
>
>
>
>
>
>

As a recipient of an email from Talend, your contact personal data will be on 
our systems. Please see our privacy notice. <https://www.talend.com/privacy/>


Reply via email to