Hi Elmar,

You are right, we should not log this at all when the gaps are expected, as you pointed out. I don't think the client can check whether compaction is enabled for a topic through the Consumer API.
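(For completeness, and only as a rough, untested sketch on my part: topic configs can be read through the separate AdminClient API that ships with Kafka 0.11+, so a pipeline author could in principle look up cleanup.policy themselves. That is outside the Consumer API the KafkaIO reader uses, and the broker address and topic name below are just placeholders.)

  // Rough sketch: look up a topic's cleanup.policy via Kafka's AdminClient (0.11+).
  import java.util.Collections;
  import java.util.Map;
  import java.util.Properties;
  import org.apache.kafka.clients.admin.AdminClient;
  import org.apache.kafka.clients.admin.AdminClientConfig;
  import org.apache.kafka.clients.admin.Config;
  import org.apache.kafka.clients.admin.ConfigEntry;
  import org.apache.kafka.common.config.ConfigResource;
  import org.apache.kafka.common.config.TopicConfig;

  static boolean isCompacted(String bootstrapServers, String topicName) throws Exception {
    Properties props = new Properties();
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    try (AdminClient admin = AdminClient.create(props)) {
      ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, topicName);
      Map<ConfigResource, Config> configs =
          admin.describeConfigs(Collections.singleton(topic)).all().get();
      // cleanup.policy is "delete", "compact", or "compact,delete"
      ConfigEntry policy = configs.get(topic).get(TopicConfig.CLEANUP_POLICY_CONFIG);
      return policy != null && policy.value().contains("compact");
    }
  }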
I think we should remove the log; the user can't really act on it other than reporting it. I will send a PR. As a temporary workaround you can disable logging for a particular class on the worker with the --workerLogLevelOverrides option <https://cloud.google.com/dataflow/pipelines/logging> (see the example at the end of this mail). But this would also suppress the rest of the logging in the reader.

Raghu

On Wed, Jun 28, 2017 at 4:12 AM, Elmar Weber <i...@elmarweber.org> wrote:

> Hello,
>
> I'm testing the KafkaIO with Google Cloud Dataflow and getting warnings
> when working with compacted logs. In the code there is the relevant check:
>
> https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L1158
>
> // sanity check
> if (offset != expected) {
>   LOG.warn("{}: gap in offsets for {} at {}. {} records missing.",
>       this, pState.topicPartition, expected, offset - expected);
> }
>
> From what I understand, this can happen when log compaction is enabled,
> because the relevant entry can get cleaned up by Kafka in favour of a newer one.
> In this case, shouldn't this be an info log and/or warn only when log
> compaction is disabled for the topic?
>
> I'm still debugging some other things, because the pipeline also stops reading on
> compacted logs. I'm not sure if this is related or could also be an issue with
> my Kafka test installation, but as far as I understand the gaps are
> expected behaviour with log compaction enabled.
>
> Thanks,
> Elmar
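Example for the --workerLogLevelOverrides flag mentioned above. The flag takes a JSON map from logger (class or package) name to level; the logger name below is my assumption based on where the warning is logged (the reader is an inner class of KafkaIO), so please double-check it before relying on it:

  --workerLogLevelOverrides={"org.apache.beam.sdk.io.kafka.KafkaIO":"ERROR"}

This raises the threshold to ERROR for that one logger on the workers, while the default worker log level stays in effect for everything else.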