[ https://issues.apache.org/jira/browse/SPARK-34187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-34187:
---------------------------------
    Fix Version/s: 3.1.1

> Use available offset range obtained during polling when checking offset validation
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-34187
>                 URL: https://issues.apache.org/jira/browse/SPARK-34187
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.7, 3.0.1, 3.1.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>              Labels: correctness
>             Fix For: 2.4.8, 3.0.2, 3.1.1, 3.1.2
>
>
> We have supported non-consecutive offsets for Kafka since 2.4.0. In `fetchRecord`,
> we validate an offset by checking whether it is in the available offset range.
> Currently, however, we obtain the latest available offset range to do the check.
> This looks incorrect: the available offset range can change during the batch,
> so the range used for the check may differ from the one that was available
> when we polled the records from Kafka.
> It is possible that an offset is valid when polling but, by the time we do
> the above check, falls outside the latest available offset range. We then wrongly
> treat it as a data loss case and fail the query or drop the record.
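The race described in the issue can be illustrated with a small standalone sketch. It is not Spark's code: `OffsetRange`, `PolledData`, and `looks_like_data_loss` are hypothetical names chosen for illustration, and the half-open `[earliest, latest)` range is an assumption. The sketch only shows why validating against the range captured at poll time avoids a false data loss report.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class OffsetRange:
    """Half-open range [earliest, latest) of offsets available in a partition."""
    earliest: int
    latest: int

    def contains(self, offset: int) -> bool:
        return self.earliest <= offset < self.latest


@dataclass(frozen=True)
class PolledData:
    """Hypothetical result of one poll: record offsets plus the range seen then."""
    record_offsets: Tuple[int, ...]  # may be non-consecutive
    range_at_poll: OffsetRange


def looks_like_data_loss(offset: int, available: OffsetRange) -> bool:
    """An offset outside the available range is treated as a data loss case."""
    return not available.contains(offset)


# At poll time the partition holds offsets [0, 100); records 3, 5, 9 are returned.
polled = PolledData(record_offsets=(3, 5, 9), range_at_poll=OffsetRange(0, 100))

# Later in the same batch, old records are aged out, so the range has moved.
latest_range = OffsetRange(50, 150)

# Checking against the latest range misreports a record we already hold.
flagged_with_latest = looks_like_data_loss(5, latest_range)

# Checking against the range captured during polling does not.
flagged_with_poll_range = looks_like_data_loss(5, polled.range_at_poll)

print(flagged_with_latest, flagged_with_poll_range)
```

Keeping the range alongside the polled records, rather than querying it again at validation time, means both values describe the same moment.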