> The concern here is that Avro 1.9 is not backwards compatible with Avro 1.8, 
> so the future world would not be a simple "bring your own avro" but 
> might require separate dataflow-with-avro-1.8 and dataflow-with-avro-1.9 
> targets, which certainly isn't scalable. (Or am I mistaken here? Maybe we 
> could solve this with vendoring?)

Thinking a bit about it, this looks similar to what I mentioned for the Spark
runner, except that we cannot control those targets; that is why I
talked about source code compatibility.
Avro is really hard to shade correctly because of the way its code
generation works; otherwise shading could have been the best solution.
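Where shading is not an option, one hypothetical mitigation is for a module to fail fast when the Avro on the classpath is not the line it was built against. Below is a minimal Java sketch under that assumption. `AvroVersionGuard` and its methods are illustrative names only, not part of Beam or Avro. It compares just major.minor because the 1.8 to 1.9 break happened on a minor-version bump. Reading the runtime version via `Package.getImplementationVersion()` is an assumption: it returns null if the jar manifest lacks an `Implementation-Version` entry, and this sketch then treats the version as incompatible.

```java
/**
 * Hypothetical fail-fast guard (not a Beam or Avro API): checks that the Avro
 * found at runtime is on the same major.minor line the module was built against.
 * Only major.minor is compared because Avro 1.8 -> 1.9 broke compatibility.
 */
final class AvroVersionGuard {

  private AvroVersionGuard() {}

  /** Returns "major.minor" from a version such as "1.9.2", or null if unparseable. */
  static String majorMinor(String version) {
    if (version == null) {
      return null;
    }
    String[] parts = version.split("\\.");
    if (parts.length < 2) {
      return null;
    }
    return parts[0] + "." + parts[1];
  }

  /** True when both versions share the same major.minor line. */
  static boolean isCompatible(String builtAgainst, String found) {
    String expected = majorMinor(builtAgainst);
    return expected != null && expected.equals(majorMinor(found));
  }

  /**
   * Reads the Implementation-Version from the manifest of the jar that provides
   * the given class's package. Assumption: may return null if the manifest lacks it.
   */
  static String versionOf(Class<?> clazz) {
    Package pkg = clazz.getPackage();
    return pkg == null ? null : pkg.getImplementationVersion();
  }

  /** Throws if the runtime version is not on the expected major.minor line (or is unknown). */
  static void check(String builtAgainst, String found) {
    if (!isCompatible(builtAgainst, found)) {
      throw new IllegalStateException(
          "Built against Avro " + builtAgainst + " but found " + found + " at runtime");
    }
  }

  public static void main(String[] args) {
    System.out.println(isCompatible("1.8.2", "1.8.1")); // true
    System.out.println(isCompatible("1.8.2", "1.9.2")); // false
  }
}
```

In a real module one might call `AvroVersionGuard.check("1.8.2", AvroVersionGuard.versionOf(org.apache.avro.Schema.class))` at pipeline construction time; the "1.8.2" here is an illustrative value, not a statement of what Beam ships. This only surfaces the conflict earlier; it does not solve the "bring your own avro" problem.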

On Fri, Sep 11, 2020 at 7:28 PM Robert Bradshaw <rober...@google.com> wrote:
>
> On Fri, Sep 11, 2020 at 10:05 AM Kenneth Knowles <k...@apache.org> wrote:
>>
>> Top-post: I'm generally in favor of moving Avro out of core specifically 
>> because it is something where different users (and dep chains) want 
>> different versions. The pain caused by having it in core has come up a lot 
>> in my experience. I don't think backwards-compatibility absolutism helps our users in 
>> this case. I do think gradual migration to ease pain is important.
>
>
> Agree. Backwards compatibility is not the absolute goal; whatever is best for 
> existing and new users is what we should go for. That being said, this whole 
> issue is caused by one of our dependencies not being backwards compatible 
> itself...
>
>>
>> On Fri, Sep 11, 2020 at 9:30 AM Robert Bradshaw <rober...@google.com> wrote:
>>>
>>> On Thu, Sep 10, 2020 at 2:48 PM Brian Hulette <bhule...@google.com> wrote:
>>>>
>>>>
>>>> On Tue, Sep 8, 2020 at 9:18 AM Robert Bradshaw <rober...@google.com> wrote:
>>>>>
>>>>> IIRC Dataflow (and perhaps others) implicitly depends on Avro to write
>>>>> out intermediate files (e.g. for non-shuffle fusion breaks). Would
>>>>> this break if we just removed it?
>>>>
>>>>
>>>> I think Dataflow would just need to declare a dependency on the new 
>>>> extension.
>>>
>>>
>>> I'm not sure this would solve the underlying problem (it just pushes it 
>>> onto users and makes it more obscure). Maybe my reasoning is incorrect, but 
>>> from what I see:
>>>
>>> * Many Beam modules (e.g. dataflow, spark, file-based-io, sql, kafka, 
>>> parquet, ...) depend on Avro.
>>> * Using Avro 1.9 with the above modules doesn't work.
>>
>>
>> I suggest taking these on case-by-case.
>>
>>  - Dataflow: implementation detail, probably not a major problem (we can 
>> just upgrade the pre-portability worker, while for portability it is a 
>> non-issue)
>>  - Spark: probably need to use whatever version of Avro works for each 
>> version of Spark (portability mitigates)
>>  - SQL: happy to upgrade lib version, just needs to be able to read the 
>> data, Avro version not user-facing
>>  - IOs: I'm guessing that we have a diamond dependency getting resolved by 
>> clobbering. At a quick glance, Parquet seems to be on Avro 1.10.0. Kafka's 
>> Avro serde is a separate thing distributed by Confluent, with the Avro version 
>> obscured by the use of parent POMs and properties, but their examples use Avro 
>> 1.9.1.
>
>
> The concern here is that Avro 1.9 is not backwards compatible with Avro 1.8, 
> so the future world would not be a simple "bring your own avro" but 
> might require separate dataflow-with-avro-1.8 and dataflow-with-avro-1.9 
> targets, which certainly isn't scalable. (Or am I mistaken here? Maybe we 
> could solve this with vendoring?)
>
>>> Doesn't this mean that, even if we remove avro from Beam core, a user that 
>>> uses Beam + Avro 1.9 will have issues with any of the above (fairly 
>>> fundamental) modules?
>>>
>>>>  We could mitigate this by first adding the new extension module and 
>>>> deprecating the core Beam counterpart for a release (or multiple releases).
>>>
>>>
>>> +1 to Reuven's concerns here.
>>
>>
>> Agree we should add the module and release it for at least one release, 
>> probably a few because users tend to hop a few releases. We have some 
>> precedent for breaking changes: dropping Python/Flink versions after 
>> asking users on user@, polling on Twitter, etc.
>>
>> Kenn
