[ https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690944#comment-17690944 ]

Dongjoon Hyun commented on SPARK-42485:
---------------------------------------

Thank you for the removal of Target Version. The remaining item from my 
comments is to change `Affects Versions` from 3.3.2 to 3.5.0.

> SPIP: Shutting down Spark Structured Streaming when the streaming process 
> has completed its current processing
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-42485
>                 URL: https://issues.apache.org/jira/browse/SPARK-42485
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>    Affects Versions: 3.3.2
>            Reporter: Mich Talebzadeh
>            Priority: Major
>              Labels: SPIP
>
> Spark Structured Streaming is a very useful tool for dealing with an Event 
> Driven Architecture. In an Event Driven Architecture, there is generally a 
> main loop that listens for events and then triggers a call-back function 
> when one of those events is detected. In a streaming application, the 
> application waits to receive source messages at a set interval, or whenever 
> they arrive, and reacts accordingly.
> There are occasions when you may want to stop the Spark program gracefully, 
> meaning that the Spark application handles the last streaming message 
> completely and then terminates. This is different from invoking an interrupt 
> such as CTRL-C.
> Of course, one can terminate the process with one of the following calls:
>  # query.awaitTermination() # Waits for the termination of this query, 
> either by stop() or by an error
>  # query.awaitTermination(timeoutMs) # Returns true if this query has 
> terminated within the timeout in milliseconds.
> So the first call above blocks until the query is stopped with stop() or 
> fails with an error. The second counts down the timeout and returns once the 
> timeout in milliseconds is reached, as sketched below.
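> A minimal PySpark sketch of the two blocking calls (the rate source and the 
> 60-second timeout are illustrative assumptions, not part of this proposal):
> {code:python}
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.appName("await-termination-demo").getOrCreate()
>
> # Any streaming source behaves the same; the built-in rate source keeps
> # the example self-contained.
> df = spark.readStream.format("rate").option("rowsPerSecond", 1).load()
>
> query = df.writeStream.format("console").start()
>
> # Option 1: block until the query is stopped with stop() or fails with an error.
> # query.awaitTermination()
>
> # Option 2: block for at most 60 seconds. Note the Python API takes seconds,
> # whereas the Scala signature awaitTermination(timeoutMs) takes milliseconds.
> terminated = query.awaitTermination(60)
> print(f"terminated within timeout: {terminated}")
> {code}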
> The issue is that one needs to predict how long the streaming job needs to 
> run. Moreover, any interrupt at the terminal or OS level (kill process) may 
> terminate the processing without proper completion of the streaming process.
> I have devised a method that allows one to terminate the Spark application 
> internally after the last received message has been fully processed. Within, 
> say, 2 seconds of the confirmation of shutdown, the process invokes a 
> graceful shutdown; a sketch follows the proposal summary below.
> This new feature proposes a solution that finishes the work for the message 
> currently being processed, waits for it to complete, and shuts down the 
> streaming process for a given topic without loss of data or orphaned 
> transactions.
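> A hedged sketch of how such an internal graceful shutdown could look. The 
> marker file, its path, and the 2-second polling cadence are illustrative 
> assumptions; the SPIP's actual mechanism may differ:
> {code:python}
> import os
> import time
>
> from pyspark.sql import SparkSession
>
> MARKER = "/tmp/stop_streaming_demo"  # hypothetical shutdown flag
>
> spark = SparkSession.builder.appName("graceful-shutdown-sketch").getOrCreate()
> df = spark.readStream.format("rate").option("rowsPerSecond", 1).load()
>
> query = (df.writeStream
>            .format("console")
>            .trigger(processingTime="2 seconds")
>            .start())
>
> while query.isActive:
>     if os.path.exists(MARKER):
>         # Let any in-flight micro-batch finish before stopping, so the last
>         # message is handled completely and no orphaned transaction remains.
>         while query.status["isTriggerActive"]:
>             time.sleep(0.5)
>         query.stop()
>     time.sleep(2)  # matches the ~2-second confirmation window described above
> {code}
> One design note: polling query.status narrows, but does not fully eliminate, 
> the window in which stop() could still interrupt a newly started batch; a 
> production mechanism would need tighter coordination, e.g. inside 
> foreachBatch.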


