Hello,
I'm testing KafkaIO with Google Cloud Dataflow and getting warnings
when working with compacted logs. The relevant check in the code is:
https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L1158
    // sanity check
    if (offset != expected) {
      LOG.warn("{}: gap in offsets for {} at {}. {} records missing.",
          this, pState.topicPartition, expected, offset - expected);
    }
From what I understand, this can happen when log compaction is enabled,
because Kafka removes an older record once a newer record with the same
key exists, which leaves gaps in the offsets. In that case, shouldn't
this be an info log, and/or a warning only when log compaction is
disabled for the topic?
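To make the suggestion concrete, here is a minimal sketch of the behaviour
I have in mind. It is not the actual KafkaIO code: the class name, the
Level enum, and the compacted flag are all made up for illustration, and
how the reader would find out that a topic uses cleanup.policy=compact is
left out entirely.

```java
// Hypothetical sketch, not part of Beam: picks a log level for an offset gap
// depending on whether the topic is known to be compacted.
class OffsetGapSketch {

  enum Level { NONE, INFO, WARN }

  /** Number of offsets skipped between the expected and the actual offset. */
  static long missingRecords(long expected, long offset) {
    return offset - expected;
  }

  /**
   * Assumed policy: no message when offsets are contiguous, INFO for a gap
   * on a compacted topic (gaps are expected there), WARN otherwise.
   */
  static Level levelForGap(long expected, long offset, boolean compacted) {
    if (offset == expected) {
      return Level.NONE;
    }
    return compacted ? Level.INFO : Level.WARN;
  }

  public static void main(String[] args) {
    // Offsets that might remain in a partition after compaction removed
    // the records at offsets 1, 2, 5 and 6 (made-up example data).
    long[] offsets = {0, 3, 4, 7};
    boolean compacted = true; // assumption: known from the topic config

    long expected = 0;
    for (long offset : offsets) {
      Level level = levelForGap(expected, offset, compacted);
      if (level != Level.NONE) {
        System.out.printf("%s: gap in offsets at %d. %d records missing.%n",
            level, expected, missingRecords(expected, offset));
      }
      expected = offset + 1; // next offset we expect to see
    }
  }
}
```

With compacted set to false the same gaps would be reported as WARN, which
matches the current behaviour for topics that are not compacted.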
I'm still debugging because the pipeline also stops reading on compacted
logs. I'm not sure whether this is related, or whether it could be an
issue with my Kafka test installation, but as far as I understand the
gaps are expected behaviour with log compaction enabled.
Thanks,
Elmar