[jira] [Updated] (SPARK-34187) Use available offset range obtained during polling when checking offset validation
[ https://issues.apache.org/jira/browse/SPARK-34187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-34187: - Fix Version/s: (was: 3.1.2) > Use available offset range obtained during polling when checking offset > validation > -- > > Key: SPARK-34187 > URL: https://issues.apache.org/jira/browse/SPARK-34187 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.7, 3.0.1, 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: correctness > Fix For: 2.4.8, 3.0.2, 3.1.1 > > > We support non-consecutive offsets for Kafka since 2.4.0. In `fetchRecord`, > we do offset validation by checking if the offset is in available offset > range. But currently we obtain latest available offset range to do the check. > It looks not correct as the available offset range could be changed during > the batch, so the available offset range is different than the one when we > polling the records from Kafka. > It is possible that an offset is valid when polling, but at the time we do > the above check, it is out of latest available offset range. We will wrongly > consider it as data loss case and fail the query or drop the record. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34187) Use available offset range obtained during polling when checking offset validation
[ https://issues.apache.org/jira/browse/SPARK-34187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-34187: - Fix Version/s: 3.1.1 > Use available offset range obtained during polling when checking offset > validation > -- > > Key: SPARK-34187 > URL: https://issues.apache.org/jira/browse/SPARK-34187 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.7, 3.0.1, 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: correctness > Fix For: 2.4.8, 3.0.2, 3.1.1, 3.1.2 > > > We support non-consecutive offsets for Kafka since 2.4.0. In `fetchRecord`, > we do offset validation by checking if the offset is in available offset > range. But currently we obtain latest available offset range to do the check. > It looks not correct as the available offset range could be changed during > the batch, so the available offset range is different than the one when we > polling the records from Kafka. > It is possible that an offset is valid when polling, but at the time we do > the above check, it is out of latest available offset range. We will wrongly > consider it as data loss case and fail the query or drop the record. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34187) Use available offset range obtained during polling when checking offset validation
[ https://issues.apache.org/jira/browse/SPARK-34187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-34187: -- Fix Version/s: 2.4.8 > Use available offset range obtained during polling when checking offset > validation > -- > > Key: SPARK-34187 > URL: https://issues.apache.org/jira/browse/SPARK-34187 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.7, 3.0.1, 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: correctness > Fix For: 2.4.8, 3.0.2, 3.1.2 > > > We support non-consecutive offsets for Kafka since 2.4.0. In `fetchRecord`, > we do offset validation by checking if the offset is in available offset > range. But currently we obtain latest available offset range to do the check. > It looks not correct as the available offset range could be changed during > the batch, so the available offset range is different than the one when we > polling the records from Kafka. > It is possible that an offset is valid when polling, but at the time we do > the above check, it is out of latest available offset range. We will wrongly > consider it as data loss case and fail the query or drop the record. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34187) Use available offset range obtained during polling when checking offset validation
[ https://issues.apache.org/jira/browse/SPARK-34187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-34187: -- Fix Version/s: (was: 3.1.1) 3.1.2 3.0.2 > Use available offset range obtained during polling when checking offset > validation > -- > > Key: SPARK-34187 > URL: https://issues.apache.org/jira/browse/SPARK-34187 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.7, 3.0.1, 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: correctness > Fix For: 3.0.2, 3.1.2 > > > We support non-consecutive offsets for Kafka since 2.4.0. In `fetchRecord`, > we do offset validation by checking if the offset is in available offset > range. But currently we obtain latest available offset range to do the check. > It looks not correct as the available offset range could be changed during > the batch, so the available offset range is different than the one when we > polling the records from Kafka. > It is possible that an offset is valid when polling, but at the time we do > the above check, it is out of latest available offset range. We will wrongly > consider it as data loss case and fail the query or drop the record. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34187) Use available offset range obtained during polling when checking offset validation
[ https://issues.apache.org/jira/browse/SPARK-34187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-34187: -- Fix Version/s: (was: 3.2.0) > Use available offset range obtained during polling when checking offset > validation > -- > > Key: SPARK-34187 > URL: https://issues.apache.org/jira/browse/SPARK-34187 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.7, 3.0.1, 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: correctness > Fix For: 3.1.1 > > > We support non-consecutive offsets for Kafka since 2.4.0. In `fetchRecord`, > we do offset validation by checking if the offset is in available offset > range. But currently we obtain latest available offset range to do the check. > It looks not correct as the available offset range could be changed during > the batch, so the available offset range is different than the one when we > polling the records from Kafka. > It is possible that an offset is valid when polling, but at the time we do > the above check, it is out of latest available offset range. We will wrongly > consider it as data loss case and fail the query or drop the record. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34187) Use available offset range obtained during polling when checking offset validation
[ https://issues.apache.org/jira/browse/SPARK-34187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-34187: -- Affects Version/s: 3.2.0 > Use available offset range obtained during polling when checking offset > validation > -- > > Key: SPARK-34187 > URL: https://issues.apache.org/jira/browse/SPARK-34187 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.7, 3.0.1, 3.1.0, 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: correctness > Fix For: 3.2.0, 3.1.1 > > > We support non-consecutive offsets for Kafka since 2.4.0. In `fetchRecord`, > we do offset validation by checking if the offset is in available offset > range. But currently we obtain latest available offset range to do the check. > It looks not correct as the available offset range could be changed during > the batch, so the available offset range is different than the one when we > polling the records from Kafka. > It is possible that an offset is valid when polling, but at the time we do > the above check, it is out of latest available offset range. We will wrongly > consider it as data loss case and fail the query or drop the record. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34187) Use available offset range obtained during polling when checking offset validation
[ https://issues.apache.org/jira/browse/SPARK-34187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-34187: -- Affects Version/s: (was: 3.2.0) > Use available offset range obtained during polling when checking offset > validation > -- > > Key: SPARK-34187 > URL: https://issues.apache.org/jira/browse/SPARK-34187 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.7, 3.0.1, 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: correctness > Fix For: 3.2.0, 3.1.1 > > > We support non-consecutive offsets for Kafka since 2.4.0. In `fetchRecord`, > we do offset validation by checking if the offset is in available offset > range. But currently we obtain latest available offset range to do the check. > It looks not correct as the available offset range could be changed during > the batch, so the available offset range is different than the one when we > polling the records from Kafka. > It is possible that an offset is valid when polling, but at the time we do > the above check, it is out of latest available offset range. We will wrongly > consider it as data loss case and fail the query or drop the record. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34187) Use available offset range obtained during polling when checking offset validation
[ https://issues.apache.org/jira/browse/SPARK-34187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh updated SPARK-34187: Labels: correctness (was: ) > Use available offset range obtained during polling when checking offset > validation > -- > > Key: SPARK-34187 > URL: https://issues.apache.org/jira/browse/SPARK-34187 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.7, 3.0.1, 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: correctness > > We support non-consecutive offsets for Kafka since 2.4.0. In `fetchRecord`, > we do offset validation by checking if the offset is in available offset > range. But currently we obtain latest available offset range to do the check. > It looks not correct as the available offset range could be changed during > the batch, so the available offset range is different than the one when we > polling the records from Kafka. > It is possible that an offset is valid when polling, but at the time we do > the above check, it is out of latest available offset range. We will wrongly > consider it as data loss case and fail the query or drop the record. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34187) Use available offset range obtained during polling when checking offset validation
[ https://issues.apache.org/jira/browse/SPARK-34187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh updated SPARK-34187: Summary: Use available offset range obtained during polling when checking offset validation (was: Use available offset range obtainted during polling when checking offset validation) > Use available offset range obtained during polling when checking offset > validation > -- > > Key: SPARK-34187 > URL: https://issues.apache.org/jira/browse/SPARK-34187 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.7, 3.0.1, 3.1.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > We support non-consecutive offsets for Kafka since 2.4.0. In `fetchRecord`, > we do offset validation by checking if the offset is in available offset > range. But currently we obtain latest available offset range to do the check. > It looks not correct as the available offset range could be changed during > the batch, so the available offset range is different than the one when we > polling the records from Kafka. > It is possible that an offset is valid when polling, but at the time we do > the above check, it is out of latest available offset range. We will wrongly > consider it as data loss case and fail the query or drop the record. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org