[ https://issues.apache.org/jira/browse/SPARK-41387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim resolved SPARK-41387.
----------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 38911
[https://github.com/apache/spark/pull/38911]

> Add assertion on end offset range for Kafka data source with Trigger.AvailableNow
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-41387
>                 URL: https://issues.apache.org/jira/browse/SPARK-41387
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.4.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Minor
>             Fix For: 3.4.0
>
>
> Although Trigger.AvailableNow provides many benefits, we have found one
> caveat: it is very sensitive to the offset range.
> Trigger.AvailableNow stops the query when the start offset and the end
> offset are the same, that is, when the data source produces no data. Given
> the semantics of Trigger.AvailableNow, a data source implementation is
> expected to retrieve the final offset at the start of the query and
> gradually advance the offset range until it reaches the final offset.
> Any bug that breaks this makes the query run indefinitely, so all data
> source implementations supporting Trigger.AvailableNow are encouraged to
> have an assertion that catches such a case in advance.
> Among the built-in data sources, the Kafka data source is the only one that
> supports Trigger.AvailableNow but has no assertion on the offset range. We'd
> like to add such an assertion to the Kafka data source for
> Trigger.AvailableNow.
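
The offset-range contract described in the issue can be illustrated with a small, dependency-free Python sketch. This is not the code from pull request 38911 and does not use any Spark or Kafka API. The function names (`assert_end_offsets_within_final`, `plan_batches`), the per-partition dictionaries, and the `max_per_batch` limit are all hypothetical, chosen only to show the idea: capture the final offsets once, advance the end offsets toward them batch by batch, and fail fast if an end offset ever goes past the captured final offset.

```python
from typing import Dict, List, Tuple

Offsets = Dict[int, int]  # hypothetical model: partition id -> offset


def assert_end_offsets_within_final(end: Offsets, final: Offsets) -> None:
    """Fail fast if a batch's end offsets violate the captured final offsets."""
    for partition, end_offset in end.items():
        if partition not in final:
            raise AssertionError(
                f"partition {partition} was not captured at query start")
        if end_offset > final[partition]:
            raise AssertionError(
                f"partition {partition}: end offset {end_offset} exceeds "
                f"final offset {final[partition]}")


def plan_batches(start: Offsets, final: Offsets,
                 max_per_batch: int) -> List[Tuple[Offsets, Offsets]]:
    """Simulate an AvailableNow-style run over offsets captured up front.

    Each batch advances every partition by at most max_per_batch, capped at
    the final offset. The run stops once start and end offsets are equal.
    """
    batches: List[Tuple[Offsets, Offsets]] = []
    current = dict(start)
    while True:
        end = {p: min(current[p] + max_per_batch, final[p]) for p in current}
        # The assertion guards against a buggy source that overshoots the
        # final offsets, which would keep the query from ever terminating.
        assert_end_offsets_within_final(end, final)
        if end == current:
            break  # start == end: no more data, so the query stops
        batches.append((current, end))
        current = end
    return batches
```

For example, `plan_batches({0: 0, 1: 5}, {0: 10, 1: 7}, 4)` plans batches whose last end offsets equal the captured final offsets, while `assert_end_offsets_within_final({0: 11}, {0: 10})` raises an `AssertionError` instead of letting the run continue.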