sjvanrossum commented on code in PR #34201:
URL: https://github.com/apache/beam/pull/34201#discussion_r2095512482
##########
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java:
##########
@@ -173,19 +172,6 @@ public boolean advance() throws IOException {
elementsReadBySplit.inc();
ConsumerRecord<byte[], byte[]> rawRecord = pState.recordIter.next();
- long expected = pState.nextOffset;
- long offset = rawRecord.offset();
-
- if (offset < expected) { // -- (a)
- // this can happen when compression is enabled in Kafka (seems to be
fixed in 0.10)
Review Comment:
If I recall correctly, the Beam module Gradle plugin forces dependency
versions, but the dependency scope for Kafka clients is set to `provided` so it
doesn't apply for the `beam-sdks-java-io-kafka` artifact and the project build
file tests against these versions of the Kafka client library:
```
def kafkaVersions = [
'201': "2.0.1",
'231': "2.3.1",
'241': "2.4.1",
'251': "2.5.1",
'282': "2.8.2",
'312': "3.1.2",
'390': "3.9.0",
]
```
I don't think that's how we should test backwards compatibility since
there's built-in support for version ranges in both POM and Gradle metadata. In
any case, we're miles ahead of the version mentioned in this comment and I've
gone back to check on 0.11.0+ and found that the component responsible for
tracking this (from the fetch collector to the poll loop) hasn't changed much
during that time. I cross-referenced a few other Kafka source integrations
(Flink, Spark, Venice) and only Spark (pinned to client version 0.10.x) still
has this type of safeguard in place
([source](https://github.com/apache/spark/blob/v3.5.5/connector/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala#L131)).
We can also deprecate all those older client library versions at some point
since every release <4.0.0 still supports communication with brokers >=0.8.0.
Client library versions >=4.0.0 bump that up to broker versions >=2.1.0 so if
there's reason to build against multiple client library versions it should
simply be 3.x.y and 4.x.y going forward. :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]