Hi everyone,

PR https://github.com/apache/beam/pull/12748 now passes all the checks and could potentially be merged (I'm not advocating for that, just pointing it out). I've rebased it on the latest master as of today. I've also left a comment in the PR summarizing the high-level changes for ALL the modules, and I encourage all interested parties to skim through it and raise any concerns they might have.
Also note that while I'm pretty good at refactoring, Java isn't my strongest language, so please keep that in mind as you review the code changes. That said, my main goal is to get Beam to play nicely with the new Confluent schema libraries, which add support for Protobuf and JSON schemas. However, the Confluent libraries depend on Avro 1.9, while Beam is on Avro 1.8. Upgrading Beam to Avro 1.9 has proven difficult (see https://github.com/apache/beam/pull/9779), which is why Avro should be taken out of core.

If you have any concerns, or any particular tests I should run, please let me know. Thank you!

On Fri, Sep 11, 2020 at 5:48 AM Brian Hulette <bhule...@google.com> wrote:

> On Tue, Sep 8, 2020 at 9:18 AM Robert Bradshaw <rober...@google.com> wrote:
>
>> IIRC Dataflow (and perhaps others) implicitly depend on Avro to write
>> out intermediate files (e.g. for non-shuffle Fusion breaks). Would
>> this break if we just removed it?
>
> I think Dataflow would just need to declare a dependency on the new
> extension.
>
>> On Thu, Sep 3, 2020 at 10:51 PM Reuven Lax <re...@google.com> wrote:
>>
>> > As for 2, maybe it's time to remove @Experimental from SchemaCoder?
>
> Probably worth a separate thread about dropping `@Experimental` on
> SchemaCoder. I'd be ok with that, the only breaking change I have in mind
> is that I think we should deprecate and remove the DATETIME primitive type,
> replacing it with a logical type.
>
>> > 1 is tricky though. Changes like this have caused a lot of trouble for
>> > users in the past, and I think some users still have unpleasant memories of
>> > being told "you just have to change some package names and imports."
>
> We could mitigate this by first adding the new extension module and
> deprecating the core Beam counterpart for a release (or multiple releases).
>
>> > On Thu, Sep 3, 2020 at 6:18 PM Brian Hulette <bhule...@google.com> wrote:
>> >
>> >> Hi everyone,
>> >>
>> >> The fact that core Beam has a dependency on Avro has led to a lot of
>> >> headaches when users (or runners) are using a different version. zeidoo [1]
>> >> was generous enough to put up a WIP PR [2] that moves everything that
>> >> depends on Avro (primarily AvroCoder and the Avro SchemaProvider I believe)
>> >> out of core Beam and into a separate extensions module. This way we could
>> >> have multiple extensions for different versions of Avro in the future.
>> >>
>> >> As I understand it, the downsides to making this change are:
>> >> 1) It's a breaking change, users with AvroCoder in their pipeline will
>> >> need to change their build dependencies and import statements.
>> >> 2) AvroCoder is the only (non-experimental) coder in core Beam that
>> >> can encode complex user types. So new users will need to dabble with the
>> >> Experimental SchemaCoder or add a second dependency to build a pipeline
>> >> with their own types.
>> >>
>> >> I think these costs are outweighed by the benefit of removing the
>> >> dependency in core Beam, but I wanted to reach out to the community to see
>> >> if there are any objections.
>> >>
>> >> Brian
>> >>
>> >> [1] github.com/zeidoo
>> >> [2] https://github.com/apache/beam/pull/12748
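Robert's question about runners that implicitly rely on Avro (e.g. for writing intermediate files at non-shuffle fusion breaks) comes down to what happens once the Avro classes are no longer pulled in by core: code that touches them would only fail at runtime, typically with a `NoClassDefFoundError`. One way a runner or pipeline could fail fast with a clearer message is a reflective presence check, sketched below in plain Java. This is an illustration under assumptions, not something proposed in the thread or taken from PR 12748: `AvroClasspathCheck`, `isClassPresent` and `requireAvro` are hypothetical names, and `org.apache.avro.Schema` is used only as a representative class from the Avro jar.

```java
/**
 * Hypothetical fail-fast check: verify that Avro is on the classpath before running
 * code (e.g. a runner's intermediate-file writer) that would otherwise fail later
 * with NoClassDefFoundError.
 */
public final class AvroClasspathCheck {

  // Representative Avro class (assumption: any class from the Avro jar would do).
  static final String AVRO_MARKER_CLASS = "org.apache.avro.Schema";

  private AvroClasspathCheck() {}

  /** Returns true if the named class can be found, without initializing it. */
  static boolean isClassPresent(String className) {
    try {
      // initialize=false: only look the class up, don't run its static initializers.
      Class.forName(className, false, AvroClasspathCheck.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }

  /** Throws an exception with an actionable message when Avro is missing. */
  static void requireAvro() {
    if (!isClassPresent(AVRO_MARKER_CLASS)) {
      throw new IllegalStateException(
          "Avro classes not found on the classpath. Add a dependency on the Avro"
              + " extension module (or on Avro itself) to this pipeline or runner.");
    }
  }

  public static void main(String[] args) {
    System.out.println("String present: " + isClassPresent("java.lang.String"));
    System.out.println("Avro present: " + isClassPresent(AVRO_MARKER_CLASS));
    try {
      requireAvro();
      System.out.println("Avro found; Avro-based code paths can be used.");
    } catch (IllegalStateException e) {
      System.out.println("Fail fast: " + e.getMessage());
    }
  }
}
```

Run on its own, without an Avro jar on the classpath, the sketch reports Avro as absent and prints the fail-fast message. A runner would typically perform a check like this once at startup rather than per element, so a missing dependency on the new extension surfaces immediately instead of partway through a job.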