[jira] [Created] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-18 Thread Mich Talebzadeh (Jira)
Mich Talebzadeh created SPARK-42485:
---

 Summary: SPIP: Shutting down spark structured streaming when the 
streaming process completed current process
 Key: SPARK-42485
 URL: https://issues.apache.org/jira/browse/SPARK-42485
 Project: Spark
  Issue Type: New Feature
  Components: Structured Streaming
Affects Versions: 3.3.2
Reporter: Mich Talebzadeh


How to shut down the work being done for the message currently being processed:
wait for it to complete, then shut down the streaming process for a given topic.






[jira] [Updated] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-18 Thread Mich Talebzadeh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mich Talebzadeh updated SPARK-42485:

Shepherd: Dongjoon Hyun
Target Version/s: 3.3.2
  Labels: SPIP  (was: )

> SPIP: Shutting down spark structured streaming when the streaming process 
> completed current process
> ---
>
> Key: SPARK-42485
> URL: https://issues.apache.org/jira/browse/SPARK-42485
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.3.2
>Reporter: Mich Talebzadeh
>Priority: Major
>  Labels: SPIP
>
> How to shut down the work being done for the message currently being
> processed: wait for it to complete, then shut down the streaming process for a
> given topic.






[jira] [Created] (SPARK-42486) Upgrade ZooKeeper from 3.6.3 to 3.6.4

2023-02-18 Thread Jira
Bjørn Jørgensen created SPARK-42486:
---

 Summary: Upgrade ZooKeeper from 3.6.3 to 3.6.4
 Key: SPARK-42486
 URL: https://issues.apache.org/jira/browse/SPARK-42486
 Project: Spark
  Issue Type: Dependency upgrade
  Components: Build
Affects Versions: 3.5.0
Reporter: Bjørn Jørgensen


[ZooKeeper 3.6.3 is EoL since 30th December, 
2022|https://zookeeper.apache.org/releases.html]


[Release notes|https://zookeeper.apache.org/doc/r3.6.4/releasenotes.html]







[jira] [Assigned] (SPARK-42486) Upgrade ZooKeeper from 3.6.3 to 3.6.4

2023-02-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42486:


Assignee: (was: Apache Spark)

> Upgrade ZooKeeper from 3.6.3 to 3.6.4
> -
>
> Key: SPARK-42486
> URL: https://issues.apache.org/jira/browse/SPARK-42486
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> [ZooKeeper 3.6.3 is EoL since 30th December, 
> 2022|https://zookeeper.apache.org/releases.html]
> [Release notes|https://zookeeper.apache.org/doc/r3.6.4/releasenotes.html]






[jira] [Commented] (SPARK-42486) Upgrade ZooKeeper from 3.6.3 to 3.6.4

2023-02-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690800#comment-17690800
 ] 

Apache Spark commented on SPARK-42486:
--

User 'bjornjorgensen' has created a pull request for this issue:
https://github.com/apache/spark/pull/40079

> Upgrade ZooKeeper from 3.6.3 to 3.6.4
> -
>
> Key: SPARK-42486
> URL: https://issues.apache.org/jira/browse/SPARK-42486
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> [ZooKeeper 3.6.3 is EoL since 30th December, 
> 2022|https://zookeeper.apache.org/releases.html]
> [Release notes|https://zookeeper.apache.org/doc/r3.6.4/releasenotes.html]






[jira] [Assigned] (SPARK-42486) Upgrade ZooKeeper from 3.6.3 to 3.6.4

2023-02-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42486:


Assignee: Apache Spark

> Upgrade ZooKeeper from 3.6.3 to 3.6.4
> -
>
> Key: SPARK-42486
> URL: https://issues.apache.org/jira/browse/SPARK-42486
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Bjørn Jørgensen
>Assignee: Apache Spark
>Priority: Major
>
> [ZooKeeper 3.6.3 is EoL since 30th December, 
> 2022|https://zookeeper.apache.org/releases.html]
> [Release notes|https://zookeeper.apache.org/doc/r3.6.4/releasenotes.html]






[jira] [Commented] (SPARK-42486) Upgrade ZooKeeper from 3.6.3 to 3.6.4

2023-02-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690801#comment-17690801
 ] 

Apache Spark commented on SPARK-42486:
--

User 'bjornjorgensen' has created a pull request for this issue:
https://github.com/apache/spark/pull/40079

> Upgrade ZooKeeper from 3.6.3 to 3.6.4
> -
>
> Key: SPARK-42486
> URL: https://issues.apache.org/jira/browse/SPARK-42486
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> [ZooKeeper 3.6.3 is EoL since 30th December, 
> 2022|https://zookeeper.apache.org/releases.html]
> [Release notes|https://zookeeper.apache.org/doc/r3.6.4/releasenotes.html]






[jira] [Updated] (SPARK-42486) Upgrade ZooKeeper from 3.6.3 to 3.6.4

2023-02-18 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-42486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bjørn Jørgensen updated SPARK-42486:

Description: 
[ZooKeeper 3.6 is EoL since 30th December, 
2022|https://zookeeper.apache.org/releases.html]


[Release notes|https://zookeeper.apache.org/doc/r3.6.4/releasenotes.html]


  was:
[ZooKeeper 3.6.3 is EoL since 30th December, 
2022|https://zookeeper.apache.org/releases.html]


[Release notes|https://zookeeper.apache.org/doc/r3.6.4/releasenotes.html]



> Upgrade ZooKeeper from 3.6.3 to 3.6.4
> -
>
> Key: SPARK-42486
> URL: https://issues.apache.org/jira/browse/SPARK-42486
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> [ZooKeeper 3.6 is EoL since 30th December, 
> 2022|https://zookeeper.apache.org/releases.html]
> [Release notes|https://zookeeper.apache.org/doc/r3.6.4/releasenotes.html]






[jira] [Updated] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-18 Thread Mich Talebzadeh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mich Talebzadeh updated SPARK-42485:

Description: 
Spark Structured Streaming (aka SSS) is a very useful tool for dealing with an
Event Driven Architecture. In an Event Driven Architecture, there is generally a
main loop that listens for events and then triggers a callback function when one
of those events is detected. In a streaming application, the application waits
to receive the source messages at a set interval or whenever they happen, and
reacts accordingly.

There are occasions when you may want to stop the Spark program gracefully.
Gracefully here means that the Spark application handles the last streaming
message completely and then terminates the application. This is different from
invoking interrupts such as CTRL-C.

Of course one can terminate the process based on one of the following:
 # query.awaitTermination()  # Waits for the termination of this query, either
by stop() or by an error
 # query.awaitTermination(timeoutMs)  # Returns true if this query is terminated
within the timeout in milliseconds

The first one waits until an interrupt signal is received. The second counts
down the timeout and exits when the timeout in milliseconds is reached.
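
A minimal PySpark sketch of the two options above (the rate source and the
60-second timeout are illustrative assumptions, not part of the ticket):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("await-termination-demo").getOrCreate()

# Any streaming source would do; "rate" is used here purely for illustration.
df = spark.readStream.format("rate").option("rowsPerSecond", 1).load()
query = df.writeStream.format("console").start()

# Option 1: block until the query is stopped externally or fails.
# query.awaitTermination()

# Option 2: block for a bounded time; returns True if the query terminated
# within the timeout (PySpark takes seconds here, the Scala API milliseconds).
if not query.awaitTermination(timeout=60):
    query.stop()  # still running after 60 seconds, so stop it from the driver
{code}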

The issue is that one needs to predict how long the streaming job needs to run.
Clearly any interrupt at the terminal or OS level (killing the process) may
leave the processing terminated without proper completion of the streaming
process.

I have devised a method that allows one to terminate the Spark application
internally after processing the last received message. Within, say, 2 seconds of
the confirmation of shutdown, the process will invoke a graceful shutdown.

This new feature proposes a solution that gracefully handles the work being done
for the message currently being processed, waits for it to complete, and then
shuts down the streaming process for a given topic without loss of data or
orphaned transactions.
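
One possible shape of such an internal, graceful shutdown (a sketch only; the
ticket does not describe the actual mechanism, so the marker-file signal, the
foreachBatch sink and every name below are illustrative assumptions):

{code:python}
import os
import time

from pyspark.sql import SparkSession

# Assumed external "please stop" signal; a control topic or table works equally well.
SHUTDOWN_MARKER = "/tmp/stop_streaming_job"

spark = SparkSession.builder.appName("graceful-shutdown-demo").getOrCreate()
df = spark.readStream.format("rate").option("rowsPerSecond", 1).load()

def process_batch(batch_df, batch_id):
    # Do the real work for this micro-batch; it always runs to completion.
    batch_df.write.format("noop").mode("append").save()

query = df.writeStream.foreachBatch(process_batch).start()

# Poll in short intervals instead of blocking forever. When the marker appears,
# wait for any in-flight micro-batch to finish, then stop the query from inside
# the application rather than killing the process.
while not query.awaitTermination(timeout=2):
    if os.path.exists(SHUTDOWN_MARKER):
        while query.status["isTriggerActive"]:
            time.sleep(1)
        query.stop()
        break

spark.stop()
{code}

The same idea can be driven by a control topic or table instead of a marker
file; the essential point is that the stop request is observed by the job itself
between micro-batches, rather than arriving as an external kill signal.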

  was: How to shut down the work being done for the message currently being
processed: wait for it to complete, then shut down the streaming process for a
given topic.


> SPIP: Shutting down spark structured streaming when the streaming process 
> completed current process
> ---
>
> Key: SPARK-42485
> URL: https://issues.apache.org/jira/browse/SPARK-42485
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.3.2
>Reporter: Mich Talebzadeh
>Priority: Major
>  Labels: SPIP
>
> Spark Structured Streaming (aka SSS) is a very useful tool for dealing with an
> Event Driven Architecture. In an Event Driven Architecture, there is generally
> a main loop that listens for events and then triggers a callback function when
> one of those events is detected. In a streaming application, the application
> waits to receive the source messages at a set interval or whenever they happen,
> and reacts accordingly.
> There are occasions when you may want to stop the Spark program gracefully.
> Gracefully here means that the Spark application handles the last streaming
> message completely and then terminates the application. This is different from
> invoking interrupts such as CTRL-C.
> Of course one can terminate the process based on one of the following:
>  # query.awaitTermination()  # Waits for the termination of this query, either
> by stop() or by an error
>  # query.awaitTermination(timeoutMs)  # Returns true if this query is
> terminated within the timeout in milliseconds
> The first one waits until an interrupt signal is received. The second counts
> down the timeout and exits when the timeout in milliseconds is reached.
> The issue is that one needs to predict how long the streaming job needs to run.
> Clearly any interrupt at the terminal or OS level (killing the process) may
> leave the processing terminated without proper completion of the streaming
> process.
> I have devised a method that allows one to terminate the Spark application
> internally after processing the last received message. Within, say, 2 seconds
> of the confirmation of shutdown, the process will invoke a graceful shutdown.
> This new feature proposes a solution that gracefully handles the work being
> done for the message currently being processed, waits for it to complete, and
> then shuts down the streaming process for a given topic without loss of data or
> orphaned transactions.






[jira] [Updated] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-18 Thread Mich Talebzadeh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mich Talebzadeh updated SPARK-42485:

Description: 
Spark Structured Streaming (aka SSS) is a very useful tool for dealing with an
Event Driven Architecture. In an Event Driven Architecture, there is generally a
main loop that listens for events and then triggers a callback function when one
of those events is detected. In a streaming application, the application waits
to receive the source messages at a set interval or whenever they happen, and
reacts accordingly.

There are occasions when you may want to stop the Spark program gracefully.
Gracefully here means that the Spark application handles the last streaming
message completely and then terminates the application. This is different from
invoking interrupts such as CTRL-C.

Of course one can terminate the process based on one of the following:
 # query.awaitTermination()  # Waits for the termination of this query, either
by stop() or by an error
 # query.awaitTermination(timeoutMs)  # Returns true if this query is terminated
within the timeout in milliseconds

The first one waits until an interrupt signal is received. The second counts
down the timeout and exits when the timeout in milliseconds is reached.

The issue is that one needs to predict how long the streaming job needs to run.
Clearly any interrupt at the terminal or OS level (killing the process) may
leave the processing terminated without proper completion of the streaming
process.

I have devised a method that allows one to terminate the Spark application
internally after processing the last received message. Within, say, 2 seconds of
the confirmation of shutdown, the process will invoke a graceful shutdown.

This new feature proposes a solution that gracefully handles the work being done
for the message currently being processed, waits for it to complete, and then
shuts down the streaming process for a given topic without loss of data or
orphaned transactions.

  was:
Spark Structured Streaming (aka SSS) is a very useful tool for dealing with an
Event Driven Architecture. In an Event Driven Architecture, there is generally a
main loop that listens for events and then triggers a callback function when one
of those events is detected. In a streaming application, the application waits
to receive the source messages at a set interval or whenever they happen, and
reacts accordingly.

There are occasions when you may want to stop the Spark program gracefully.
Gracefully here means that the Spark application handles the last streaming
message completely and then terminates the application. This is different from
invoking interrupts such as CTRL-C.

Of course one can terminate the process based on one of the following:
 # query.awaitTermination()  # Waits for the termination of this query, either
by stop() or by an error
 # query.awaitTermination(timeoutMs)  # Returns true if this query is terminated
within the timeout in milliseconds

The first one waits until an interrupt signal is received. The second counts
down the timeout and exits when the timeout in milliseconds is reached.

The issue is that one needs to predict how long the streaming job needs to run.
Clearly any interrupt at the terminal or OS level (killing the process) may
leave the processing terminated without proper completion of the streaming
process.

I have devised a method that allows one to terminate the Spark application
internally after processing the last received message. Within, say, 2 seconds of
the confirmation of shutdown, the process will invoke a graceful shutdown.

This new feature proposes a solution that gracefully handles the work being done
for the message currently being processed, waits for it to complete, and then
shuts down the streaming process for a given topic without loss of data or
orphaned transactions.


> SPIP: Shutting down spark structured streaming when the streaming process 
> completed current process
> ---
>
> Key: SPARK-42485
> URL: https://issues.apache.org/jira/browse/SPARK-42485
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.3.2
>Reporter: Mich Talebzadeh
>Priority: Major
>  Labels: SPIP
>
> Spark Structured Streaming (aka SSS) is a very useful tool for dealing with an
> Event Driven Architecture. In an Event Driven Architecture, there is generally
> a main loop that listens for events and then triggers a callback function when
> one of those events is detected. In a streaming application, the application
> waits to receive the source messages at a set interval or whenever they happen,
> and reacts accordingly.
> There are occasions when you may want to stop the Spark program gracefully.
>

[jira] [Updated] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-18 Thread Mich Talebzadeh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mich Talebzadeh updated SPARK-42485:

Description: 
Spark Structured Streaming is a very useful tool for dealing with an Event
Driven Architecture. In an Event Driven Architecture, there is generally a main
loop that listens for events and then triggers a callback function when one of
those events is detected. In a streaming application, the application waits to
receive the source messages at a set interval or whenever they happen, and
reacts accordingly.

There are occasions when you may want to stop the Spark program gracefully.
Gracefully here means that the Spark application handles the last streaming
message completely and then terminates the application. This is different from
invoking interrupts such as CTRL-C.

Of course one can terminate the process based on one of the following:
 # query.awaitTermination()  # Waits for the termination of this query, either
by stop() or by an error
 # query.awaitTermination(timeoutMs)  # Returns true if this query is terminated
within the timeout in milliseconds

The first one waits until an interrupt signal is received. The second counts
down the timeout and exits when the timeout in milliseconds is reached.

The issue is that one needs to predict how long the streaming job needs to run.
Clearly any interrupt at the terminal or OS level (killing the process) may
leave the processing terminated without proper completion of the streaming
process.

I have devised a method that allows one to terminate the Spark application
internally after processing the last received message. Within, say, 2 seconds of
the confirmation of shutdown, the process will invoke a graceful shutdown.

This new feature proposes a solution that gracefully handles the work being done
for the message currently being processed, waits for it to complete, and then
shuts down the streaming process for a given topic without loss of data or
orphaned transactions.

  was:
Spark Structured Streaming (aka SSS) is a very useful tool for dealing with an
Event Driven Architecture. In an Event Driven Architecture, there is generally a
main loop that listens for events and then triggers a callback function when one
of those events is detected. In a streaming application, the application waits
to receive the source messages at a set interval or whenever they happen, and
reacts accordingly.

There are occasions when you may want to stop the Spark program gracefully.
Gracefully here means that the Spark application handles the last streaming
message completely and then terminates the application. This is different from
invoking interrupts such as CTRL-C.

Of course one can terminate the process based on one of the following:
 # query.awaitTermination()  # Waits for the termination of this query, either
by stop() or by an error
 # query.awaitTermination(timeoutMs)  # Returns true if this query is terminated
within the timeout in milliseconds

The first one waits until an interrupt signal is received. The second counts
down the timeout and exits when the timeout in milliseconds is reached.

The issue is that one needs to predict how long the streaming job needs to run.
Clearly any interrupt at the terminal or OS level (killing the process) may
leave the processing terminated without proper completion of the streaming
process.

I have devised a method that allows one to terminate the Spark application
internally after processing the last received message. Within, say, 2 seconds of
the confirmation of shutdown, the process will invoke a graceful shutdown.

This new feature proposes a solution that gracefully handles the work being done
for the message currently being processed, waits for it to complete, and then
shuts down the streaming process for a given topic without loss of data or
orphaned transactions.


> SPIP: Shutting down spark structured streaming when the streaming process 
> completed current process
> ---
>
> Key: SPARK-42485
> URL: https://issues.apache.org/jira/browse/SPARK-42485
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.3.2
>Reporter: Mich Talebzadeh
>Priority: Major
>  Labels: SPIP
>
> Spark Structured Streaming is a very useful tool for dealing with an Event
> Driven Architecture. In an Event Driven Architecture, there is generally a main
> loop that listens for events and then triggers a callback function when one of
> those events is detected. In a streaming application, the application waits to
> receive the source messages at a set interval or whenever they happen, and
> reacts accordingly.
> There are occasions when you may want to stop the Spark program gracefully.
> Gracefully here means

[jira] [Updated] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-18 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42485:
--
Shepherd:   (was: Dongjoon Hyun)

> SPIP: Shutting down spark structured streaming when the streaming process 
> completed current process
> ---
>
> Key: SPARK-42485
> URL: https://issues.apache.org/jira/browse/SPARK-42485
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.3.2
>Reporter: Mich Talebzadeh
>Priority: Major
>  Labels: SPIP
>
> Spark Structured Streaming is a very useful tool for dealing with an Event
> Driven Architecture. In an Event Driven Architecture, there is generally a main
> loop that listens for events and then triggers a callback function when one of
> those events is detected. In a streaming application, the application waits to
> receive the source messages at a set interval or whenever they happen, and
> reacts accordingly.
> There are occasions when you may want to stop the Spark program gracefully.
> Gracefully here means that the Spark application handles the last streaming
> message completely and then terminates the application. This is different from
> invoking interrupts such as CTRL-C.
> Of course one can terminate the process based on one of the following:
>  # query.awaitTermination()  # Waits for the termination of this query, either
> by stop() or by an error
>  # query.awaitTermination(timeoutMs)  # Returns true if this query is
> terminated within the timeout in milliseconds
> The first one waits until an interrupt signal is received. The second counts
> down the timeout and exits when the timeout in milliseconds is reached.
> The issue is that one needs to predict how long the streaming job needs to run.
> Clearly any interrupt at the terminal or OS level (killing the process) may
> leave the processing terminated without proper completion of the streaming
> process.
> I have devised a method that allows one to terminate the Spark application
> internally after processing the last received message. Within, say, 2 seconds
> of the confirmation of shutdown, the process will invoke a graceful shutdown.
> This new feature proposes a solution that gracefully handles the work being
> done for the message currently being processed, waits for it to complete, and
> then shuts down the streaming process for a given topic without loss of data or
> orphaned transactions.






[jira] [Commented] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-18 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690828#comment-17690828
 ] 

Dongjoon Hyun commented on SPARK-42485:
---

[~mich.talebza...@gmail.com] Here is some advice for you.

1. I removed myself from the Shepherd field. The Apache Spark community is run
by volunteers.
2. You should remove `Target Version` according to the community policy:
  - https://spark.apache.org/contributing.html
{quote}Do not set the following fields:
- Fix Version. This is assigned by committers only when resolved.
- Target Version. This is assigned by committers to indicate a PR has been 
accepted for possible fix by the target version.{quote}

3. `New Feature` should follow the version of the master branch (3.5.0 as of
today), because it cannot affect the release branches (branch-3.4/3.3/..):
Apache Spark community policy doesn't allow backporting of features or
improvements.

> SPIP: Shutting down spark structured streaming when the streaming process 
> completed current process
> ---
>
> Key: SPARK-42485
> URL: https://issues.apache.org/jira/browse/SPARK-42485
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.3.2
>Reporter: Mich Talebzadeh
>Priority: Major
>  Labels: SPIP
>
> Spark Structured Streaming is a very useful tool for dealing with an Event
> Driven Architecture. In an Event Driven Architecture, there is generally a main
> loop that listens for events and then triggers a callback function when one of
> those events is detected. In a streaming application, the application waits to
> receive the source messages at a set interval or whenever they happen, and
> reacts accordingly.
> There are occasions when you may want to stop the Spark program gracefully.
> Gracefully here means that the Spark application handles the last streaming
> message completely and then terminates the application. This is different from
> invoking interrupts such as CTRL-C.
> Of course one can terminate the process based on one of the following:
>  # query.awaitTermination()  # Waits for the termination of this query, either
> by stop() or by an error
>  # query.awaitTermination(timeoutMs)  # Returns true if this query is
> terminated within the timeout in milliseconds
> The first one waits until an interrupt signal is received. The second counts
> down the timeout and exits when the timeout in milliseconds is reached.
> The issue is that one needs to predict how long the streaming job needs to run.
> Clearly any interrupt at the terminal or OS level (killing the process) may
> leave the processing terminated without proper completion of the streaming
> process.
> I have devised a method that allows one to terminate the Spark application
> internally after processing the last received message. Within, say, 2 seconds
> of the confirmation of shutdown, the process will invoke a graceful shutdown.
> This new feature proposes a solution that gracefully handles the work being
> done for the message currently being processed, waits for it to complete, and
> then shuts down the streaming process for a given topic without loss of data or
> orphaned transactions.






[jira] [Comment Edited] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process

2023-02-18 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690828#comment-17690828
 ] 

Dongjoon Hyun edited comment on SPARK-42485 at 2/19/23 1:08 AM:


[~mich.talebza...@gmail.com] Here is some advice for you.

1. I removed myself from the Shepherd field.
2. You should remove `Target Version` according to the community policy:
 - [https://spark.apache.org/contributing.html]
{quote}Do not set the following fields:
 - Fix Version. This is assigned by committers only when resolved.
 - Target Version. This is assigned by committers to indicate a PR has been 
accepted for possible fix by the target version.{quote}

3. `New Feature` should follow the version of the master branch (3.5.0 as of
today), because it cannot affect the release branches (branch-3.4/3.3/..):
Apache Spark community policy doesn't allow backporting of features or
improvements.


was (Author: dongjoon):
[~mich.talebza...@gmail.com] Here is some advice for you.

1. I removed myself from the Shepherd field. The Apache Spark community is run
by volunteers.
2. You should remove `Target Version` according to the community policy:
  - https://spark.apache.org/contributing.html
{quote}Do not set the following fields:
- Fix Version. This is assigned by committers only when resolved.
- Target Version. This is assigned by committers to indicate a PR has been 
accepted for possible fix by the target version.{quote}

3. `New Feature` should follow the version of the master branch (3.5.0 as of
today), because it cannot affect the release branches (branch-3.4/3.3/..):
Apache Spark community policy doesn't allow backporting of features or
improvements.

> SPIP: Shutting down spark structured streaming when the streaming process 
> completed current process
> ---
>
> Key: SPARK-42485
> URL: https://issues.apache.org/jira/browse/SPARK-42485
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.3.2
>Reporter: Mich Talebzadeh
>Priority: Major
>  Labels: SPIP
>
> Spark Structured Streaming is a very useful tool for dealing with an Event
> Driven Architecture. In an Event Driven Architecture, there is generally a main
> loop that listens for events and then triggers a callback function when one of
> those events is detected. In a streaming application, the application waits to
> receive the source messages at a set interval or whenever they happen, and
> reacts accordingly.
> There are occasions when you may want to stop the Spark program gracefully.
> Gracefully here means that the Spark application handles the last streaming
> message completely and then terminates the application. This is different from
> invoking interrupts such as CTRL-C.
> Of course one can terminate the process based on one of the following:
>  # query.awaitTermination()  # Waits for the termination of this query, either
> by stop() or by an error
>  # query.awaitTermination(timeoutMs)  # Returns true if this query is
> terminated within the timeout in milliseconds
> The first one waits until an interrupt signal is received. The second counts
> down the timeout and exits when the timeout in milliseconds is reached.
> The issue is that one needs to predict how long the streaming job needs to run.
> Clearly any interrupt at the terminal or OS level (killing the process) may
> leave the processing terminated without proper completion of the streaming
> process.
> I have devised a method that allows one to terminate the Spark application
> internally after processing the last received message. Within, say, 2 seconds
> of the confirmation of shutdown, the process will invoke a graceful shutdown.
> This new feature proposes a solution that gracefully handles the work being
> done for the message currently being processed, waits for it to complete, and
> then shuts down the streaming process for a given topic without loss of data or
> orphaned transactions.






[jira] [Commented] (SPARK-42473) An explicit cast will be needed when INSERT OVERWRITE SELECT UNION ALL

2023-02-18 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690833#comment-17690833
 ] 

Yuming Wang commented on SPARK-42473:
-

What is your test.spark33_decimal_orc column type?

> An explicit cast will be needed when INSERT OVERWRITE SELECT UNION ALL
> --
>
> Key: SPARK-42473
> URL: https://issues.apache.org/jira/browse/SPARK-42473
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 3.3.1
> Environment: spark 3.3.1
>Reporter: kevinshin
>Priority: Major
>
> *When 'union all' is used and one select statement uses a literal as a column
> value while the other select statement has a computed expression in the same
> column, the whole statement fails to compile. An explicit cast is needed.*
> For example:
> explain
> INSERT OVERWRITE TABLE test.spark33_decimal_orc
> select null as amt1, cast('256.99' as decimal(20,8)) as amt2
> union all
> select cast('200.99' as decimal(20,8))/100 as amt1,
>        cast('256.99' as decimal(20,8)) as amt2;
> *will give the error:*
> org.apache.spark.sql.catalyst.expressions.Literal cannot be cast to
> org.apache.spark.sql.catalyst.expressions.AnsiCast
> The SQL needs to be changed to:
> explain
> INSERT OVERWRITE TABLE test.spark33_decimal_orc
> select null as amt1, cast('256.99' as decimal(20,8)) as amt2
> union all
> select cast(cast('200.99' as decimal(20,8))/100 as decimal(20,8)) as amt1,
>        cast('256.99' as decimal(20,8)) as amt2;
>  
> *but this was not needed in Spark 3.2.1; is this a bug in Spark 3.3.1?*






[jira] [Commented] (SPARK-42406) [PROTOBUF] Recursive field handling is incompatible with delta

2023-02-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690837#comment-17690837
 ] 

Apache Spark commented on SPARK-42406:
--

User 'rangadi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40080

> [PROTOBUF] Recursive field handling is incompatible with delta
> --
>
> Key: SPARK-42406
> URL: https://issues.apache.org/jira/browse/SPARK-42406
> Project: Spark
>  Issue Type: Bug
>  Components: Protobuf
>Affects Versions: 3.4.0
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
>Priority: Major
> Fix For: 3.4.0
>
>
> The Protobuf deserializer (the `from_protobuf()` function) optionally supports
> recursive fields by limiting the depth to a certain level. See the example
> below. It assigns a 'NullType' to such a field when the allowed depth is
> reached.
> This causes a few issues. E.g. a repeated field, as in the following example,
> results in an Array field with 'NullType'. Delta does not support null types
> inside a complex type.
> Actually `Array[NullType]` is not really useful anyway.
> How about this fix: drop the recursive field when the limit is reached rather
> than using a NullType.
> The example below makes it clear:
> Consider a recursive Protobuf:
>  
> {code:python}
> message TreeNode {
>   string value = 1;
>   repeated TreeNode children = 2;
> }
> {code}
> Allow a depth of 2:
>  
> {code:python}
> from pyspark.sql.protobuf.functions import from_protobuf
>
> df.select(
>     from_protobuf('proto', messageName='TreeNode',
>                   options={ ... "recursive.fields.max.depth": "2" })
> ).printSchema()
> {code}
> The schema looks like this:
> {noformat}
> root
>  |-- from_protobuf(proto): struct (nullable = true)
>  |    |-- value: string (nullable = true)
>  |    |-- children: array (nullable = false)
>  |    |    |-- element: struct (containsNull = false)
>  |    |    |    |-- value: string (nullable = true)
>  |    |    |    |-- children: array (nullable = false)
>  |    |    |    |    |-- element: struct (containsNull = false)
>  |    |    |    |    |    |-- value: string (nullable = true)
>  |    |    |    |    |    |-- children: array (nullable = false)   [ === Proposed fix: Drop this field === ]
>  |    |    |    |    |    |    |-- element: void (containsNull = false)   [ === NOTICE 'void' HERE === ]
> {noformat}
> When we try to write this to a delta table, we get an error:
> {noformat}
> AnalysisException: Found nested NullType in column
> from_protobuf(proto).children which is of ArrayType. Delta doesn't support
> writing NullType in complex types.
> {noformat}
>  
> We could just drop the field 'element' when the recursion depth is reached. It
> is simpler and does not need to deal with NullType. We are ignoring the value
> anyway. There is no use in keeping the field.
> Another issue is the setting 'recursive.fields.max.depth': it is not enforced
> correctly. '0' does not make sense.
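
For context, a self-contained sketch of how `from_protobuf` with the
`recursive.fields.max.depth` option is typically invoked (the source path,
descriptor-file path and column alias below are illustrative assumptions, not
taken from the ticket):

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.protobuf.functions import from_protobuf

spark = SparkSession.builder.appName("protobuf-recursion-demo").getOrCreate()

# 'proto' is assumed to be a binary column containing serialized TreeNode messages.
df = spark.read.parquet("/tmp/tree_nodes")

parsed = df.select(
    from_protobuf(
        "proto",
        messageName="TreeNode",
        descFilePath="/tmp/tree_node.desc",           # compiled descriptor set (assumed path)
        options={"recursive.fields.max.depth": "2"},  # cap the recursion depth
    ).alias("from_protobuf(proto)")
)
parsed.printSchema()
{code}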






[jira] [Assigned] (SPARK-42406) [PROTOBUF] Recursive field handling is incompatible with delta

2023-02-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42406:


Assignee: Raghu Angadi  (was: Apache Spark)

> [PROTOBUF] Recursive field handling is incompatible with delta
> --
>
> Key: SPARK-42406
> URL: https://issues.apache.org/jira/browse/SPARK-42406
> Project: Spark
>  Issue Type: Bug
>  Components: Protobuf
>Affects Versions: 3.4.0
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
>Priority: Major
> Fix For: 3.4.0
>
>
> The Protobuf deserializer (the `from_protobuf()` function) optionally supports
> recursive fields by limiting the depth to a certain level. See the example
> below. It assigns a 'NullType' to such a field when the allowed depth is
> reached.
> This causes a few issues. E.g. a repeated field, as in the following example,
> results in an Array field with 'NullType'. Delta does not support null types
> inside a complex type.
> Actually `Array[NullType]` is not really useful anyway.
> How about this fix: drop the recursive field when the limit is reached rather
> than using a NullType.
> The example below makes it clear:
> Consider a recursive Protobuf:
>  
> {code:python}
> message TreeNode {
>   string value = 1;
>   repeated TreeNode children = 2;
> }
> {code}
> Allow a depth of 2:
>  
> {code:python}
> from pyspark.sql.protobuf.functions import from_protobuf
>
> df.select(
>     from_protobuf('proto', messageName='TreeNode',
>                   options={ ... "recursive.fields.max.depth": "2" })
> ).printSchema()
> {code}
> The schema looks like this:
> {noformat}
> root
>  |-- from_protobuf(proto): struct (nullable = true)
>  |    |-- value: string (nullable = true)
>  |    |-- children: array (nullable = false)
>  |    |    |-- element: struct (containsNull = false)
>  |    |    |    |-- value: string (nullable = true)
>  |    |    |    |-- children: array (nullable = false)
>  |    |    |    |    |-- element: struct (containsNull = false)
>  |    |    |    |    |    |-- value: string (nullable = true)
>  |    |    |    |    |    |-- children: array (nullable = false)   [ === Proposed fix: Drop this field === ]
>  |    |    |    |    |    |    |-- element: void (containsNull = false)   [ === NOTICE 'void' HERE === ]
> {noformat}
> When we try to write this to a delta table, we get an error:
> {noformat}
> AnalysisException: Found nested NullType in column
> from_protobuf(proto).children which is of ArrayType. Delta doesn't support
> writing NullType in complex types.
> {noformat}
>  
> We could just drop the field 'element' when the recursion depth is reached. It
> is simpler and does not need to deal with NullType. We are ignoring the value
> anyway. There is no use in keeping the field.
> Another issue is the setting 'recursive.fields.max.depth': it is not enforced
> correctly. '0' does not make sense.






[jira] [Assigned] (SPARK-42406) [PROTOBUF] Recursive field handling is incompatible with delta

2023-02-18 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42406:


Assignee: Apache Spark  (was: Raghu Angadi)

> [PROTOBUF] Recursive field handling is incompatible with delta
> --
>
> Key: SPARK-42406
> URL: https://issues.apache.org/jira/browse/SPARK-42406
> Project: Spark
>  Issue Type: Bug
>  Components: Protobuf
>Affects Versions: 3.4.0
>Reporter: Raghu Angadi
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.4.0
>
>
> The Protobuf deserializer (the `from_protobuf()` function) optionally supports
> recursive fields by limiting the depth to a certain level. See the example
> below. It assigns a 'NullType' to such a field when the allowed depth is
> reached.
> This causes a few issues. E.g. a repeated field, as in the following example,
> results in an Array field with 'NullType'. Delta does not support null types
> inside a complex type.
> Actually `Array[NullType]` is not really useful anyway.
> How about this fix: drop the recursive field when the limit is reached rather
> than using a NullType.
> The example below makes it clear:
> Consider a recursive Protobuf:
>  
> {code:python}
> message TreeNode {
>   string value = 1;
>   repeated TreeNode children = 2;
> }
> {code}
> Allow a depth of 2:
>  
> {code:python}
> from pyspark.sql.protobuf.functions import from_protobuf
>
> df.select(
>     from_protobuf('proto', messageName='TreeNode',
>                   options={ ... "recursive.fields.max.depth": "2" })
> ).printSchema()
> {code}
> The schema looks like this:
> {noformat}
> root
>  |-- from_protobuf(proto): struct (nullable = true)
>  |    |-- value: string (nullable = true)
>  |    |-- children: array (nullable = false)
>  |    |    |-- element: struct (containsNull = false)
>  |    |    |    |-- value: string (nullable = true)
>  |    |    |    |-- children: array (nullable = false)
>  |    |    |    |    |-- element: struct (containsNull = false)
>  |    |    |    |    |    |-- value: string (nullable = true)
>  |    |    |    |    |    |-- children: array (nullable = false)   [ === Proposed fix: Drop this field === ]
>  |    |    |    |    |    |    |-- element: void (containsNull = false)   [ === NOTICE 'void' HERE === ]
> {noformat}
> When we try to write this to a delta table, we get an error:
> {noformat}
> AnalysisException: Found nested NullType in column
> from_protobuf(proto).children which is of ArrayType. Delta doesn't support
> writing NullType in complex types.
> {noformat}
>  
> We could just drop the field 'element' when the recursion depth is reached. It
> is simpler and does not need to deal with NullType. We are ignoring the value
> anyway. There is no use in keeping the field.
> Another issue is the setting 'recursive.fields.max.depth': it is not enforced
> correctly. '0' does not make sense.






[jira] [Commented] (SPARK-42406) [PROTOBUF] Recursive field handling is incompatible with delta

2023-02-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690838#comment-17690838
 ] 

Apache Spark commented on SPARK-42406:
--

User 'rangadi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40080

> [PROTOBUF] Recursive field handling is incompatible with delta
> --
>
> Key: SPARK-42406
> URL: https://issues.apache.org/jira/browse/SPARK-42406
> Project: Spark
>  Issue Type: Bug
>  Components: Protobuf
>Affects Versions: 3.4.0
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
>Priority: Major
> Fix For: 3.4.0
>
>
> The Protobuf deserializer (the `from_protobuf()` function) optionally supports
> recursive fields by limiting the depth to a certain level. See the example
> below. It assigns a 'NullType' to such a field when the allowed depth is
> reached.
> This causes a few issues. E.g. a repeated field, as in the following example,
> results in an Array field with 'NullType'. Delta does not support null types
> inside a complex type.
> Actually `Array[NullType]` is not really useful anyway.
> How about this fix: drop the recursive field when the limit is reached rather
> than using a NullType.
> The example below makes it clear:
> Consider a recursive Protobuf:
>  
> {code:python}
> message TreeNode {
>   string value = 1;
>   repeated TreeNode children = 2;
> }
> {code}
> Allow a depth of 2:
>  
> {code:python}
> from pyspark.sql.protobuf.functions import from_protobuf
>
> df.select(
>     from_protobuf('proto', messageName='TreeNode',
>                   options={ ... "recursive.fields.max.depth": "2" })
> ).printSchema()
> {code}
> The schema looks like this:
> {noformat}
> root
>  |-- from_protobuf(proto): struct (nullable = true)
>  |    |-- value: string (nullable = true)
>  |    |-- children: array (nullable = false)
>  |    |    |-- element: struct (containsNull = false)
>  |    |    |    |-- value: string (nullable = true)
>  |    |    |    |-- children: array (nullable = false)
>  |    |    |    |    |-- element: struct (containsNull = false)
>  |    |    |    |    |    |-- value: string (nullable = true)
>  |    |    |    |    |    |-- children: array (nullable = false)   [ === Proposed fix: Drop this field === ]
>  |    |    |    |    |    |    |-- element: void (containsNull = false)   [ === NOTICE 'void' HERE === ]
> {noformat}
> When we try to write this to a delta table, we get an error:
> {noformat}
> AnalysisException: Found nested NullType in column
> from_protobuf(proto).children which is of ArrayType. Delta doesn't support
> writing NullType in complex types.
> {noformat}
>  
> We could just drop the field 'element' when the recursion depth is reached. It
> is simpler and does not need to deal with NullType. We are ignoring the value
> anyway. There is no use in keeping the field.
> Another issue is the setting 'recursive.fields.max.depth': it is not enforced
> correctly. '0' does not make sense.


