Yuchen Liu created SPARK-53560:
----------------------------------
Summary: Crash looping when retrying uncommitted batch in Kafka
source and AvailableNow trigger
Key: SPARK-53560
URL: https://issues.apache.org/jira/browse/SPARK-53560
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 3.4.0
Reporter: Yuchen Liu
There is a crash loop when using the Kafka source with Trigger.AvailableNow. It
is triggered deterministically when the query fails or terminates in the middle
of a run, the user changes the Kafka topic partitions, and the query is then
restarted. On restart, the topic partitions recorded in the uncommitted batch
differ from the latest topic partitions, but Trigger.AvailableNow still
requires the two sets to match.
This PR solves the issue by no longer fetching the latest topic partitions in
`prepareForTriggerAvailableNow`. Instead, it fetches them in `latestOffset`,
which Spark does not call when retrying the uncommitted batch.
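
The difference between the eager and the deferred fetch can be illustrated
with a small toy model. This is only a sketch, not Spark's implementation:
`ToyKafkaStream`, `PartitionMismatch`, `replay_batch`, `restart_query` and the
`eager_fetch` flag are hypothetical names, and the partition check in
`replay_batch` is an assumed stand-in for the requirement described above.

```python
class PartitionMismatch(Exception):
    """Raised when the snapshot and the retried batch disagree on partitions."""


class ToyKafkaStream:
    """Toy stand-in for a Kafka micro-batch source (hypothetical, not Spark's class)."""

    def __init__(self, fetch_latest, eager_fetch):
        self._fetch_latest = fetch_latest  # callable returning {partition: offset}
        self._eager_fetch = eager_fetch    # True models the behavior before the fix
        self._snapshot = None              # target offsets for the AvailableNow run

    def prepare_for_trigger_available_now(self):
        # Before the fix (as modeled here): snapshot the latest partitions up front.
        if self._eager_fetch:
            self._snapshot = dict(self._fetch_latest())

    def latest_offset(self):
        # After the fix (as modeled here): snapshot lazily, on the first call.
        if self._snapshot is None:
            self._snapshot = dict(self._fetch_latest())
        return self._snapshot

    def replay_batch(self, end_offsets):
        # Assumed check: an existing snapshot must cover the same partitions
        # as the uncommitted batch that is being retried.
        if self._snapshot is not None and set(self._snapshot) != set(end_offsets):
            raise PartitionMismatch(
                f"snapshot {sorted(self._snapshot)} != batch {sorted(end_offsets)}"
            )
        return end_offsets


def restart_query(stream, uncommitted_end_offsets):
    """Model a restart: prepare, retry the uncommitted batch, then plan new data."""
    stream.prepare_for_trigger_available_now()
    stream.replay_batch(uncommitted_end_offsets)  # the retry does not call latest_offset
    return stream.latest_offset()                 # only new batches ask for the latest
```

In this model, restarting with `eager_fetch=True` after a partition was added
raises `PartitionMismatch` on every attempt, which mirrors the crash loop. With
`eager_fetch=False` the retried batch completes first and the new partition is
picked up only when `latest_offset` is called for the next batch.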