What if we introduced a core-lite package without Avro? (We could take
inventory and see whether there are other dependencies we could/should
make optional as well.) The existing core module would remain the same,
but the new package would give users a way to use other Avro versions
with Beam.
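
As a rough sketch of what this could look like for a user (Gradle
Kotlin DSL; the core-lite and avro-extension artifact names and all
version numbers here are hypothetical):

    dependencies {
        // Hypothetical slimmed-down core that no longer drags in Avro.
        implementation("org.apache.beam:beam-sdks-java-core-lite:2.25.0")
        // Hypothetical opt-in Avro extension...
        implementation("org.apache.beam:beam-sdks-java-extensions-avro:2.25.0")
        // ...against whichever Avro version the user actually wants.
        implementation("org.apache.avro:avro:1.9.2")
    }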

On Fri, Sep 11, 2020 at 10:28 AM Ismaël Mejía <[email protected]> wrote:

> Getting Avro out of core is a good idea in principle, but it has
> consequences for users.
>
> The cleanest solution is to move the packages as done in the PR, but
> this will essentially break every existing user, so we should measure
> the impact on users and agree on whether it is worth breaking SDK core
> backwards compatibility (or wait until Beam 3 to do this). Users tend
> to be really frustrated by this kind of breakage, in particular when
> they don’t see a concrete benefit [1]. I thought for a moment that a
> solution could be to keep the same packages in the extension, but that
> won’t work: we would end up with split packages/modules, which the
> Java 11 module system does not allow.
>
> There are two other ‘issues’ about the change:
>
> We MUST guarantee that such an upgrade does not break users of the
> Spark runner. Spark leaks Avro 1.8.x by default in its recent versions
> (2.4.x / 3.x.x), so Beam code that uses a different version of Avro
> should at least be source compatible across Avro versions; otherwise
> it will break users of that runner. In concrete terms this means we
> should stay on Avro 1.8.2 by default and have the other modules only
> provide the upgraded versions of the Avro dependencies; it will be up
> to users to provide the compatible versions they want to use.
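>
> For a Spark runner user, the pin could look something like this (a
> sketch in Gradle Kotlin DSL; the version numbers are illustrative):
>
>     configurations.all {
>         resolutionStrategy {
>             // Keep Avro on the version Spark itself leaks onto the
>             // classpath, so Beam and Spark resolve the same classes.
>             force("org.apache.avro:avro:1.8.2")
>         }
>     }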
>
> And for the concrete case of Confluent Schema Registry in KafkaIO,
> this implies that we need to find a way to keep the Avro dependency
> aligned between core and KafkaIO; otherwise users can run into missing
> classes/methods, in particular if their code depends on an unaligned
> version of Avro.
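>
> A user-side way to keep these aligned (again a sketch in Gradle Kotlin
> DSL; the artifact versions are illustrative, and the Confluent serde
> comes from Confluent’s own Maven repository):
>
>     dependencies {
>         implementation("org.apache.beam:beam-sdks-java-io-kafka:2.24.0")
>         implementation("io.confluent:kafka-avro-serializer:5.3.2")
>         constraints {
>             // A single Avro version for Beam core, KafkaIO and the
>             // Confluent serde, so no classes/methods go missing.
>             implementation("org.apache.avro:avro:1.8.2")
>         }
>     }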
>
> I have to say I have mixed feelings. I am essentially pro removal from
> core because it should never have been there in the first place, but I
> am afraid of the impact compared to the potential gains.
>
> [1]
> https://medium.com/@steve.yegge/dear-google-cloud-your-deprecation-policy-is-killing-you-ee7525dc05dc
>
> On Fri, Sep 11, 2020 at 7:05 PM Kenneth Knowles <[email protected]> wrote:
> >
> > Top-post: I'm generally in favor of moving Avro out of core
> > specifically because it is something where different users (and dep
> > chains) want different versions. The pain caused by having it in core
> > has come up to me a lot. I don't think backwards-compatibility
> > absolutism helps our users in this case. I do think gradual migration
> > to ease pain is important.
> >
> > On Fri, Sep 11, 2020 at 9:30 AM Robert Bradshaw <[email protected]>
> > wrote:
> >>
> >> On Thu, Sep 10, 2020 at 2:48 PM Brian Hulette <[email protected]>
> >> wrote:
> >>>
> >>>
> >>> On Tue, Sep 8, 2020 at 9:18 AM Robert Bradshaw <[email protected]>
> >>> wrote:
> >>>>
> >>>> IIRC Dataflow (and perhaps others) implicitly depends on Avro to
> >>>> write out intermediate files (e.g. for non-shuffle fusion breaks).
> >>>> Would this break if we just removed it?
> >>>
> >>>
> >>> I think Dataflow would just need to declare a dependency on the new
> >>> extension.
> >>
> >>
> >> I'm not sure this would solve the underlying problem (it just pushes
> >> it onto users and makes it more obscure). Maybe my reasoning is
> >> incorrect, but from what I see:
> >>
> >> * Many Beam modules (e.g. dataflow, spark, file-based-io, sql,
> >>   kafka, parquet, ...) depend on Avro.
> >> * Using Avro 1.9 with the above modules doesn't work.
> >
> >
> > I suggest taking these case by case.
> >
> >  - Dataflow: implementation detail, probably not a major problem (we
> >    can just upgrade the pre-portability worker, while for portability
> >    it is a non-issue)
> >  - Spark: probably need to use whatever version of Avro works for
> >    each version of Spark (portability mitigates)
> >  - SQL: happy to upgrade the lib version; it just needs to be able to
> >    read the data, and the Avro version is not user-facing
> >  - IOs: I'm guessing that we have a diamond dep getting resolved by
> >    clobbering. A quick glance suggests Parquet is on Avro 1.10.0;
> >    Kafka's Avro serde is a separate thing distributed by Confluent,
> >    with its Avro version obfuscated by use of parent poms and
> >    properties, but their examples use Avro 1.9.1.
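> >
> > One way to check which Avro version actually wins in such a diamond
> > is Gradle's dependencyInsight task (the :app: project path here is
> > just a placeholder for the user's project):
> >
> >     ./gradlew :app:dependencyInsight --dependency avro \
> >         --configuration runtimeClasspath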
> >
> >> Doesn't this mean that, even if we remove Avro from Beam core, a
> >> user that uses Beam + Avro 1.9 will have issues with any of the
> >> above (fairly fundamental) modules?
> >>
> >>> We could mitigate this by first adding the new extension module and
> >>> deprecating the core Beam counterpart for a release (or multiple
> >>> releases).
> >>
> >>
> >> +1 to Reuven's concerns here.
> >
> >
> > Agree we should add the module and ship it for at least one release,
> > probably a few, because users tend to hop a few releases. We have
> > some precedent for breaking changes with dropping Python/Flink
> > versions, after asking users on user@ and polling on Twitter, etc.
> >
> > Kenn
>
