Re: [DISCUSS] Migration guide on upgrading Kafka to 3.1 in Spark 3.3

Jungtaek Lim Fri, 18 Mar 2022 06:11:17 -0700

As always, I hope the discussion stays focused on the topic. Let's avoid
getting side-tracked. Please treat this mail thread as the full context, and
feel free to ask me if you need more information before weighing in.


Thanks for the input in your previous mail, by the way!

On Fri, Mar 18, 2022 at 9:41 PM, Gabor Somogyi <gabor.g.somo...@gmail.com> wrote:

> I've just read the related PR, and it seems the situation is not as black
> and white as I had presumed from a purely technical point of view...
>
> On Fri, 18 Mar 2022, 12:44 Gabor Somogyi, <gabor.g.somo...@gmail.com>
> wrote:
>
>> Hi Jungtaek,
>>
>> I've taken a deeper look at the issue and here are my findings.
>> As far as I can see, there are basically 2 ways, with some minor
>> variations:
>> * We care
>> * We don't care
>>
>> I'm pretty sure users are clever enough, but expecting all users to
>> track Kafka KIPs one by one would be a hard requirement.
>> So I would vote for "We care"; the only question
>> is how.
>>
>> Unless we have a specific reason for option 3, I wouldn't override default
>> configs. The reason is simple:
>> Kafka has its strategic direction, and going against it without a good
>> reason is rarely a good idea (maybe we have one, but then it should be
>> spelled out).
>>
>> I think that when a Kafka version upgrade happens, we engineers check
>> whether all the changes in the new version
>> are backward compatible, so option 2 makes sense to me. Honestly, I
>> have coffee with some of the Kafka devs,
>> so I've never read through all the KIPs between releases, because
>> they've told me what's important to check :)
>>
>> Seems like my Kafka Spark compatibility gist is out-of-date so maybe I
>> need to invest some time to resurrect it:
>> https://gist.github.com/gaborgsomogyi/3476c32d69ff2087ed5d7d031653c7a9
>>
>> Hope my thoughts are helpful!
>>
>> BR,
>> G
>>
>>
>> On Fri, Mar 18, 2022 at 11:15 AM Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> CORRECTION: in option 2, we enumerate KIPs which may bring
>>> incompatibility with older brokers (not all KIPs).
>>>
>>> On Fri, Mar 18, 2022 at 7:12 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
>>>> Hi dev,
>>>>
>>>> I would like to start a discussion about how the migration guide
>>>> should cover the Kafka upgrade to 3.1 (from 2.8.1) in the upcoming Spark
>>>> 3.3.
>>>>
>>>> We haven't paid much attention to upgrades of the Kafka dependency, since
>>>> we have believed that a new Kafka client version should have no
>>>> compatibility issues with older brokers. But under semantic versioning,
>>>> a major version upgrade rings an alarm for me.
>>>>
>>>> I haven't gone through all the changes between the versions, but I
>>>> found one KIP (KIP-679
>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-679%3A+Producer+will+enable+the+strongest+delivery+guarantee+by+default>)
>>>> which may not work with older brokers in a specific setup. (This is
>>>> described in the "Compatibility, Deprecation, and Migration Plan" section
>>>> of the KIP.)
>>>>
>>>> This may not be a problem for users who upgrade the client and the
>>>> broker together, but that is unlikely to be the case for end users of
>>>> Spark. Computation engines are relatively easy to upgrade. Storage systems
>>>> aren't. End users tend to think of the components as independent.
>>>>
>>>> I looked through the notable changes in the Kafka doc, and it does
>>>> mention this KIP, but it just says the default config has changed and
>>>> doesn't mention the impact. There is a link to the
>>>> KIP, which means everyone needs to read through the KIP wiki page for
>>>> details.
>>>>
>>>> Given this context, what would be the best way to notify end users
>>>> of the major version upgrade of Kafka? I can imagine several options,
>>>> including...
>>>>
>>>> 1. Explicitly mention in the migration guide that Spark 3.3 upgrades Kafka
>>>> to 3.1, with a link to the notable changes in the Kafka doc.
>>>> 2. Do 1 & spend more effort to read through all KIPs, check the
>>>> "Compatibility, Deprecation, and Migration Plan" section of each, and
>>>> enumerate all KIPs (or even summarize them) in the migration guide.
>>>> 3. Do 2 & actively override the default configs to be compatible with
>>>> older versions if a change of default configs in Kafka 3.0 is
>>>> backward incompatible. End users would have to set these configs
>>>> explicitly to get the new Kafka defaults back.
>>>> 4. Do not care. End users can find the upgrade in the release notes,
>>>> and we expect end users to actively check the notable changes (& KIPs) in
>>>> the Kafka doc.
>>>> 5. Options not described above...
>>>>
>>>> Please take a look and share your thoughts on this.
>>>>
>>>> Thanks,
>>>> Jungtaek Lim (HeartSaVioR)
>>>>
>>>> P.S. This probably applies to all non-bugfix version upgrades of
>>>> dependencies. We may still want to be pragmatic, e.g. pass-through
>>>> for minor versions, though.
>>>>
>>>
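
For readers who want something concrete, option 3 in the quoted proposal
(override a changed Kafka default unless the user sets it explicitly) could
look roughly like the sketch below. This is only an illustration, not what
Spark does: the helper name and the constant are hypothetical. The two keys
are the producer settings whose defaults KIP-679 changed (enable.idempotence
from false to true, acks from 1 to all), written with the "kafka." prefix that
the Spark Kafka source/sink uses to pass options through to the Kafka client.

```python
# Hypothetical sketch of option 3; neither name below exists in Spark or Kafka.

# Producer defaults as they were before KIP-679 took effect in Kafka 3.0.
LEGACY_PRODUCER_DEFAULTS = {
    "kafka.enable.idempotence": "false",
    "kafka.acks": "1",
}


def with_legacy_defaults(user_options):
    """Return a new dict in which the legacy defaults fill in only the keys
    the user did not set; any explicit user value wins."""
    merged = dict(LEGACY_PRODUCER_DEFAULTS)
    merged.update(user_options)
    return merged


# A user who explicitly asks for acks=all keeps it, while
# enable.idempotence falls back to the pre-3.0 default.
opts = with_legacy_defaults({
    "kafka.bootstrap.servers": "broker:9092",  # placeholder address
    "kafka.acks": "all",
})
```

With this shape, a user who wants the new Kafka 3.x behaviour has to opt in
by setting both keys explicitly, which is the trade-off option 3 describes.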
