Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22042#discussion_r212033844

    --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchReader.scala ---
    @@ -337,6 +338,7 @@ private[kafka010] case class KafkaMicroBatchInputPartitionReader(
             val record = consumer.get(nextOffset, rangeToRead.untilOffset, pollTimeoutMs, failOnDataLoss)
             if (record != null) {
               nextRow = converter.toUnsafeRow(record)
    +          nextOffset = record.offset + 1
    --- End diff --

    We should update `nextOffset` to `record.offset + 1` rather than `nextOffset + 1`. Otherwise, it may return duplicated records when `failOnDataLoss` is `false`. I will submit another PR to push this fix to 2.3, as it's a correctness issue. In addition, we should change `nextOffset` in the `next` method, as the `get` method is designed to be called multiple times.
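The duplicate-record scenario the comment describes can be sketched with a toy model (hypothetical names; this is not the real `KafkaDataConsumer` API). When `failOnDataLoss` is `false`, `get` may skip missing offsets and return a record whose offset is ahead of the one requested, so advancing by `nextOffset + 1` instead of `record.offset + 1` re-reads that record:

```scala
// Toy model of the micro-batch read loop (hypothetical; not the real
// KafkaDataConsumer API). Offsets 0, 1, 3 exist; offset 2 was lost, as can
// happen with failOnDataLoss = false (e.g. records aged out or compacted).

final case class Record(offset: Long, value: String)

val log: Map[Long, Record] =
  Seq(0L, 1L, 3L).map(o => o -> Record(o, s"v$o")).toMap

// Mimics get(offset, untilOffset, ...): when the requested offset is
// missing, skip forward to the next available record instead of failing.
def get(from: Long, until: Long): Option[Record] =
  (from until until).find(log.contains).map(log)

def read(until: Long, advance: (Long, Record) => Long): Seq[Record] = {
  var nextOffset = 0L
  val out = scala.collection.mutable.ArrayBuffer.empty[Record]
  while (nextOffset < until) {
    get(nextOffset, until) match {
      case Some(r) =>
        out += r
        nextOffset = advance(nextOffset, r)
      case None =>
        nextOffset = until // nothing left in the range
    }
  }
  out.toSeq
}

// Buggy advance: ignores the offset actually returned. After get(2, 4)
// returns record 3, nextOffset becomes 3 and record 3 is read again.
val buggy = read(4L, (next, _) => next + 1)

// Fixed advance, as in the review: jump past the record that was returned.
val fixed = read(4L, (_, r) => r.offset + 1)

println(buggy.map(_.offset)) // record 3 appears twice
println(fixed.map(_.offset)) // each offset at most once
```

Under this model the buggy loop yields offsets `0, 1, 3, 3` while the fixed loop yields `0, 1, 3`, which is the duplication the review is guarding against.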