[ https://issues.apache.org/jira/browse/SPARK-34187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-34187: ------------------------------------ Assignee: Apache Spark (was: L. C. Hsieh) > Use available offset range obtained during polling when checking offset > validation > ---------------------------------------------------------------------------------- > > Key: SPARK-34187 > URL: https://issues.apache.org/jira/browse/SPARK-34187 > Project: Spark > Issue Type: Bug > Components: Structured Streaming > Affects Versions: 2.4.7, 3.0.1, 3.1.0 > Reporter: L. C. Hsieh > Assignee: Apache Spark > Priority: Major > > We support non-consecutive offsets for Kafka since 2.4.0. In `fetchRecord`, > we do offset validation by checking if the offset is in available offset > range. But currently we obtain latest available offset range to do the check. > It looks not correct as the available offset range could be changed during > the batch, so the available offset range is different than the one when we > polling the records from Kafka. > It is possible that an offset is valid when polling, but at the time we do > the above check, it is out of latest available offset range. We will wrongly > consider it as data loss case and fail the query or drop the record. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org