HeartSaVioR opened a new pull request, #42940:
URL: https://github.com/apache/spark/pull/42940

   ### What changes were proposed in this pull request?
   
   This PR proposes to change the behavior when a user runs a streaming query with Trigger.AvailableNow and the query has any source that does not support Trigger.AvailableNow. Instead of using the wrapper implementation, this PR proposes to fall back to executing a single batch (a.k.a. Trigger.Once).
   
   This PR introduces a new flag, `spark.sql.streaming.triggerAvailableNowWrapper.enabled`, which lets advanced users retain the previous (wrapper-based) behavior. The flag is marked as internal, since it is really only meant for the few users who are concerned about the behavioral change.
   
   Minor details are as follows:
   
   * This PR does not use Trigger.Once, hence users won't see the deprecation warning for Trigger.Once.
   * This PR logs a warning naming the source(s) that don't support Trigger.AvailableNow, so that users can identify which source(s) prevent them from enjoying the benefits of Trigger.AvailableNow.
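   The decision described above can be sketched roughly as follows. This is a minimal Python model under stated assumptions, not Spark's actual implementation: `Source`, `supports_available_now`, `resolve_available_now` and the returned mode strings are hypothetical names made up for illustration. Only the config name `spark.sql.streaming.triggerAvailableNowWrapper.enabled` comes from this PR; it is modeled here by the `wrapper_enabled` argument, assumed to default to false since the wrapper is no longer used by default.

```python
import logging
from dataclasses import dataclass
from typing import List

logger = logging.getLogger("available_now_sketch")


@dataclass
class Source:
    # Hypothetical stand-in for a streaming source. `supports_available_now`
    # models whether the source natively supports Trigger.AvailableNow.
    name: str
    supports_available_now: bool


def resolve_available_now(sources: List[Source], wrapper_enabled: bool = False) -> str:
    """Illustrative model of how an AvailableNow query is executed.

    `wrapper_enabled` stands in for the internal flag
    spark.sql.streaming.triggerAvailableNowWrapper.enabled (assumed default: False).
    """
    unsupported = [s.name for s in sources if not s.supports_available_now]
    if not unsupported:
        # Every source supports AvailableNow natively: run as requested.
        return "available_now"
    if wrapper_enabled:
        # Previous behavior, retained only behind the internal flag.
        return "available_now_with_wrapper"
    # New default: name the offending sources, then run a single batch
    # instead of failing the query.
    logger.warning(
        "Source(s) %s do not support Trigger.AvailableNow; "
        "falling back to single batch execution.",
        ", ".join(unsupported),
    )
    return "single_batch"


if __name__ == "__main__":
    mixed = [Source("source_a", True), Source("source_b", False)]
    print(resolve_available_now(mixed))                        # single_batch
    print(resolve_available_now(mixed, wrapper_enabled=True))  # available_now_with_wrapper
```

   With the flag left at its (assumed) default, a query that mixes a supporting and a non-supporting source runs a single batch and logs a warning naming the non-supporting source, rather than failing.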
   
   ### Why are the changes needed?
   
   We have observed a data duplication issue with a 3rd-party data source when it's used with Trigger.AvailableNow. The source didn't support Trigger.AvailableNow, and unfortunately it also did not play well with the wrapper implementation.
   
   We care more about possible correctness issues than about broader coverage of Trigger.AvailableNow, hence we want to stop using the wrapper implementation by default. We also care about not breaking existing queries, so we fall back to single batch execution rather than failing the query.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, this introduces a behavioral change for streaming queries with Trigger.AvailableNow that contain any source not supporting Trigger.AvailableNow.
   
   ### How was this patch tested?
   
   Modified unit tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

