Hi dev,

I would like to initiate the discussion about how to deal with the
migration guide on upgrading Kafka to 3.1 (from 2.8.1) in upcoming Spark
3.3.

We didn't care much about the upgrade of Kafka dependency since our belief
on Kafka client has been that the new Kafka client version should have no
compatibility issues with older brokers. Based on semantic versioning,
upgrading major versions rings an alarm for me.

I haven't gone through changes that happened between versions, but found
one KIP (KIP-679
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-679%3A+Producer+will+enable+the+strongest+delivery+guarantee+by+default>)
which may not work with older brokers with specific setup. (It's described
in the "Compatibility, Deprecation, and Migration Plan" section of the KIP).

This may not be problematic for the users who upgrade both client and
broker altogether, but end users of Spark may be unlikely the case.
Computation engines are relatively easier to upgrade. Storage systems
aren't. End users would think the components are independent.

I looked through the notable changes in the Kafka doc, and it does mention
this KIP, but it just says the default config has changed and doesn't
mention about the impacts. There is a link to KIP, that said, everyone
needs to read through the KIP wiki page for details.

Based on the context, what would be the best way to notice end users for
the major version upgrade of Kafka? I can imagine several options
including...

1. Explicitly mention that Spark 3.3 upgrades Kafka to 3.1 with linking the
noticeable changes in the Kafka doc in the migration guide.
2. Do 1 & spend more effort to read through all KIPs and check
"Compatibility, Deprecation, and Migration Plan" section, and enumerate all
KIPs (or even summarize) in the migration guide.
3. Do 2 & actively override the default configs to be compatible with older
versions if the change of the default configs in Kafka 3.0 is backward
incompatible. End users should set these configs explicitly to override
them back.
4. Do not care. End users can indicate the upgrade in the release note, and
we expect end users to actively check the notable changes (& KIPs) from
Kafka doc.
5. Options not described above...

Please take a look and provide your voice on this.

Thanks,
Jungtaek Lim (HeartSaVioR)

ps. Probably this would be applied to all non-bugfix versions of dependency
upgrades. We may still want to be pragmatic, e.g. pass-through for minor
versions, though.

Reply via email to