[ 
https://issues.apache.org/jira/browse/SPARK-41387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-41387:
------------------------------------

    Assignee: Jungtaek Lim

> Add assertion on end offset range for Kafka data source with 
> Trigger.AvailableNow
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-41387
>                 URL: https://issues.apache.org/jira/browse/SPARK-41387
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.4.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Minor
>
> Although Trigger.AvailableNow provides many benefits, we have found one
> caveat: it is very sensitive to the offset range.
> Trigger.AvailableNow stops the query when the start offset and end offset
> are the same, that is, when the data source produces no data. Given the
> semantics of Trigger.AvailableNow, a data source implementation is expected
> to retrieve the final offset at the start of the query and gradually advance
> the offset range until it reaches the final offset.
> Any bug that breaks this causes the query to run indefinitely, so all data
> source implementations supporting Trigger.AvailableNow are encouraged to add
> an assertion that catches such a case in advance.
> Among the built-in data sources, the Kafka data source is the only one that
> supports Trigger.AvailableNow but has no assertion on the offset range. We
> would like to add such an assertion to the Kafka data source for
> Trigger.AvailableNow.
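
The kind of check described above can be sketched roughly as follows. This is an illustration only, not the Kafka data source's actual code: the function name `check_available_now_range`, the `Offsets` alias, and the "topic-partition" string keys are hypothetical, and treating a partition that is missing from the prefetched final offsets as an error is an assumption made for this sketch.

```python
from typing import Dict

# Hypothetical representation: "topic-partition" -> offset.
Offsets = Dict[str, int]


def check_available_now_range(start: Offsets, end: Offsets, final: Offsets) -> None:
    """Fail fast if a batch's end offsets could keep the query from terminating.

    `final` stands for the offsets captured once at query start; every batch's
    end offsets are expected to stay between `start` and `final`.
    """
    for partition, end_offset in end.items():
        if partition not in final:
            # Assumption for this sketch: partitions unknown at query start are rejected.
            raise AssertionError(f"{partition} is not in the prefetched final offsets")
        start_offset = start.get(partition, 0)
        if end_offset < start_offset:
            raise AssertionError(
                f"{partition}: end offset {end_offset} is behind start offset {start_offset}"
            )
        if end_offset > final[partition]:
            raise AssertionError(
                f"{partition}: end offset {end_offset} exceeds final offset {final[partition]}"
            )


def is_finished(start: Offsets, end: Offsets) -> bool:
    # The query stops when a batch makes no progress (start equals end).
    return start == end
```

With a check like this, an end offset that overshoots the prefetched final offset raises immediately instead of letting the start and end offsets never converge.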



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
