[jira] [Created] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process
Mich Talebzadeh created SPARK-42485:
---

Summary: SPIP: Shutting down spark structured streaming when the streaming process completed current process
Key: SPARK-42485
URL: https://issues.apache.org/jira/browse/SPARK-42485
Project: Spark
Issue Type: New Feature
Components: Structured Streaming
Affects Versions: 3.3.2
Reporter: Mich Talebzadeh

How to shut down the topic doing work for the message being processed, wait for it to complete, and shut down the streaming process for a given topic.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process
[ https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mich Talebzadeh updated SPARK-42485:

Shepherd: Dongjoon Hyun
Target Version/s: 3.3.2
Labels: SPIP  (was: )

> SPIP: Shutting down spark structured streaming when the streaming process
> completed current process
> ---
>
> Key: SPARK-42485
> URL: https://issues.apache.org/jira/browse/SPARK-42485
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Affects Versions: 3.3.2
> Reporter: Mich Talebzadeh
> Priority: Major
> Labels: SPIP
>
> How to shut down the topic doing work for the message being processed, wait for it to complete, and shut down the streaming process for a given topic.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42486) Upgrade ZooKeeper from 3.6.3 to 3.6.4
Bjørn Jørgensen created SPARK-42486:
---

Summary: Upgrade ZooKeeper from 3.6.3 to 3.6.4
Key: SPARK-42486
URL: https://issues.apache.org/jira/browse/SPARK-42486
Project: Spark
Issue Type: Dependency upgrade
Components: Build
Affects Versions: 3.5.0
Reporter: Bjørn Jørgensen

[ZooKeeper 3.6.3 is EoL since 30th December, 2022|https://zookeeper.apache.org/releases.html]
[Release notes|https://zookeeper.apache.org/doc/r3.6.4/releasenotes.html]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42486) Upgrade ZooKeeper from 3.6.3 to 3.6.4
[ https://issues.apache.org/jira/browse/SPARK-42486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42486:

Assignee: (was: Apache Spark)

> Upgrade ZooKeeper from 3.6.3 to 3.6.4
> -
>
> Key: SPARK-42486
> URL: https://issues.apache.org/jira/browse/SPARK-42486
> Project: Spark
> Issue Type: Dependency upgrade
> Components: Build
> Affects Versions: 3.5.0
> Reporter: Bjørn Jørgensen
> Priority: Major
>
> [ZooKeeper 3.6.3 is EoL since 30th December, 2022|https://zookeeper.apache.org/releases.html]
> [Release notes|https://zookeeper.apache.org/doc/r3.6.4/releasenotes.html]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42486) Upgrade ZooKeeper from 3.6.3 to 3.6.4
[ https://issues.apache.org/jira/browse/SPARK-42486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690800#comment-17690800 ]

Apache Spark commented on SPARK-42486:
--

User 'bjornjorgensen' has created a pull request for this issue:
https://github.com/apache/spark/pull/40079

> Upgrade ZooKeeper from 3.6.3 to 3.6.4
> -
>
> Key: SPARK-42486
> URL: https://issues.apache.org/jira/browse/SPARK-42486
> Project: Spark
> Issue Type: Dependency upgrade
> Components: Build
> Affects Versions: 3.5.0
> Reporter: Bjørn Jørgensen
> Priority: Major
>
> [ZooKeeper 3.6.3 is EoL since 30th December, 2022|https://zookeeper.apache.org/releases.html]
> [Release notes|https://zookeeper.apache.org/doc/r3.6.4/releasenotes.html]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42486) Upgrade ZooKeeper from 3.6.3 to 3.6.4
[ https://issues.apache.org/jira/browse/SPARK-42486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42486:

Assignee: Apache Spark

> Upgrade ZooKeeper from 3.6.3 to 3.6.4
> -
>
> Key: SPARK-42486
> URL: https://issues.apache.org/jira/browse/SPARK-42486
> Project: Spark
> Issue Type: Dependency upgrade
> Components: Build
> Affects Versions: 3.5.0
> Reporter: Bjørn Jørgensen
> Assignee: Apache Spark
> Priority: Major
>
> [ZooKeeper 3.6.3 is EoL since 30th December, 2022|https://zookeeper.apache.org/releases.html]
> [Release notes|https://zookeeper.apache.org/doc/r3.6.4/releasenotes.html]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42486) Upgrade ZooKeeper from 3.6.3 to 3.6.4
[ https://issues.apache.org/jira/browse/SPARK-42486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690801#comment-17690801 ]

Apache Spark commented on SPARK-42486:
--

User 'bjornjorgensen' has created a pull request for this issue:
https://github.com/apache/spark/pull/40079

> Upgrade ZooKeeper from 3.6.3 to 3.6.4
> -
>
> Key: SPARK-42486
> URL: https://issues.apache.org/jira/browse/SPARK-42486
> Project: Spark
> Issue Type: Dependency upgrade
> Components: Build
> Affects Versions: 3.5.0
> Reporter: Bjørn Jørgensen
> Priority: Major
>
> [ZooKeeper 3.6.3 is EoL since 30th December, 2022|https://zookeeper.apache.org/releases.html]
> [Release notes|https://zookeeper.apache.org/doc/r3.6.4/releasenotes.html]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42486) Upgrade ZooKeeper from 3.6.3 to 3.6.4
[ https://issues.apache.org/jira/browse/SPARK-42486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bjørn Jørgensen updated SPARK-42486:

Description:
[ZooKeeper 3.6 is EoL since 30th December, 2022|https://zookeeper.apache.org/releases.html]
[Release notes|https://zookeeper.apache.org/doc/r3.6.4/releasenotes.html]

was:
[ZooKeeper 3.6.3 is EoL since 30th December, 2022|https://zookeeper.apache.org/releases.html]
[Release notes|https://zookeeper.apache.org/doc/r3.6.4/releasenotes.html]

> Upgrade ZooKeeper from 3.6.3 to 3.6.4
> -
>
> Key: SPARK-42486
> URL: https://issues.apache.org/jira/browse/SPARK-42486
> Project: Spark
> Issue Type: Dependency upgrade
> Components: Build
> Affects Versions: 3.5.0
> Reporter: Bjørn Jørgensen
> Priority: Major
>
> [ZooKeeper 3.6 is EoL since 30th December, 2022|https://zookeeper.apache.org/releases.html]
> [Release notes|https://zookeeper.apache.org/doc/r3.6.4/releasenotes.html]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process
[ https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mich Talebzadeh updated SPARK-42485:

Description:
Spark Structured Streaming (AKA SSS) is a very useful tool for dealing with Event Driven Architecture. In an Event Driven Architecture, there is generally a main loop that listens for events and then triggers a call-back function when one of those events is detected. In a streaming application, the application waits to receive source messages at a set interval, or whenever they happen, and reacts accordingly.

There are occasions when you may want to stop the Spark program gracefully. Gracefully meaning that the Spark application handles the last streaming message completely and then terminates the application. This is different from invoking interrupts such as CTRL-C.

Of course, one can terminate the process with one of the following:

# query.awaitTermination(): waits for the termination of this query, by stop() or with an error
# query.awaitTermination(timeoutMs): returns true if this query is terminated within the timeout in milliseconds

The first one waits until an interrupt signal is received. The second one counts down the timeout and exits when the timeout in milliseconds is reached. The issue is that one needs to predict how long the streaming job needs to run; clearly, any interrupt at the terminal or OS level (kill process) may terminate the processing without proper completion of the streaming process.

I have devised a method that allows one to terminate the Spark application internally after processing the last received message. Within, say, 2 seconds of the confirmation of shutdown, the process will invoke a graceful shutdown.

This new feature proposes a solution to gracefully handle the topic doing work for the message being processed, wait for it to complete, and shut down the streaming process for a given topic without loss of data or orphaned transactions.

was: How to shut down the topic doing work for the message being processed, wait for it to complete, and shut down the streaming process for a given topic.

> SPIP: Shutting down spark structured streaming when the streaming process
> completed current process
> ---
>
> Key: SPARK-42485
> URL: https://issues.apache.org/jira/browse/SPARK-42485
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Affects Versions: 3.3.2
> Reporter: Mich Talebzadeh
> Priority: Major
> Labels: SPIP
>
> Spark Structured Streaming (AKA SSS) is a very useful tool for dealing with Event Driven Architecture. In an Event Driven Architecture, there is generally a main loop that listens for events and then triggers a call-back function when one of those events is detected. In a streaming application, the application waits to receive source messages at a set interval, or whenever they happen, and reacts accordingly.
>
> There are occasions when you may want to stop the Spark program gracefully. Gracefully meaning that the Spark application handles the last streaming message completely and then terminates the application. This is different from invoking interrupts such as CTRL-C.
>
> Of course, one can terminate the process with one of the following:
> # query.awaitTermination(): waits for the termination of this query, by stop() or with an error
> # query.awaitTermination(timeoutMs): returns true if this query is terminated within the timeout in milliseconds
>
> The first one waits until an interrupt signal is received. The second one counts down the timeout and exits when the timeout in milliseconds is reached. The issue is that one needs to predict how long the streaming job needs to run; clearly, any interrupt at the terminal or OS level (kill process) may terminate the processing without proper completion of the streaming process.
>
> I have devised a method that allows one to terminate the Spark application internally after processing the last received message. Within, say, 2 seconds of the confirmation of shutdown, the process will invoke a graceful shutdown.
>
> This new feature proposes a solution to gracefully handle the topic doing work for the message being processed, wait for it to complete, and shut down the streaming process for a given topic without loss of data or orphaned transactions.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
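For illustration, a minimal PySpark sketch of the kind of graceful shutdown described above. This is only a sketch under assumptions: the flag-file path, the rate source, and the 2-second poll interval are illustrative, and the ticket does not specify the actual signalling mechanism.

{code:python}
# Sketch: stop a structured streaming query under the application's own
# control once an external flag file appears, instead of blocking forever
# in awaitTermination() or guessing a timeout up front.
import os
import time

from pyspark.sql import SparkSession

SHUTDOWN_FLAG = "/tmp/stop_streaming_app"  # hypothetical control file

spark = SparkSession.builder.appName("graceful-shutdown-sketch").getOrCreate()

# Stand-in source; the real application would read from Kafka or similar.
df = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

query = df.writeStream.format("console").start()

# Poll instead of blocking: when the flag shows up, query.stop() blocks
# until the query's execution threads terminate, and the application exits
# on its own terms rather than via CTRL-C or an OS-level kill.
while query.isActive:
    if os.path.exists(SHUTDOWN_FLAG):
        query.stop()
        break
    time.sleep(2)  # the "say 2 seconds" window from the description

spark.stop()
{code}

The design point is the polling loop: a query.awaitTermination(timeoutMs) loop would work equally well, returning control every few seconds so the application can check its shutdown condition between micro-batches.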
[jira] [Updated] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process
[ https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mich Talebzadeh updated SPARK-42485:

Description:
Spark Structured Streaming (AKA SSS) is a very useful tool for dealing with Event Driven Architecture. In an Event Driven Architecture, there is generally a main loop that listens for events and then triggers a call-back function when one of those events is detected. In a streaming application, the application waits to receive source messages at a set interval, or whenever they happen, and reacts accordingly.

There are occasions when you may want to stop the Spark program gracefully. Gracefully meaning that the Spark application handles the last streaming message completely and then terminates the application. This is different from invoking interrupts such as CTRL-C.

Of course, one can terminate the process with one of the following:

# query.awaitTermination(): waits for the termination of this query, by stop() or with an error
# query.awaitTermination(timeoutMs): returns true if this query is terminated within the timeout in milliseconds

The first one waits until an interrupt signal is received. The second one counts down the timeout and exits when the timeout in milliseconds is reached. The issue is that one needs to predict how long the streaming job needs to run; clearly, any interrupt at the terminal or OS level (kill process) may terminate the processing without proper completion of the streaming process.

I have devised a method that allows one to terminate the Spark application internally after processing the last received message. Within, say, 2 seconds of the confirmation of shutdown, the process will invoke a graceful shutdown.

This new feature proposes a solution to gracefully handle the topic doing work for the message being processed, wait for it to complete, and shut down the streaming process for a given topic without loss of data or orphaned transactions.

was:
Spark Structured Streaming (AKA SSS) is a very useful tool for dealing with Event Driven Architecture. In an Event Driven Architecture, there is generally a main loop that listens for events and then triggers a call-back function when one of those events is detected. In a streaming application, the application waits to receive source messages at a set interval, or whenever they happen, and reacts accordingly.

There are occasions when you may want to stop the Spark program gracefully. Gracefully meaning that the Spark application handles the last streaming message completely and then terminates the application. This is different from invoking interrupts such as CTRL-C.

Of course, one can terminate the process with one of the following:

# query.awaitTermination(): waits for the termination of this query, by stop() or with an error
# query.awaitTermination(timeoutMs): returns true if this query is terminated within the timeout in milliseconds

The first one waits until an interrupt signal is received. The second one counts down the timeout and exits when the timeout in milliseconds is reached. The issue is that one needs to predict how long the streaming job needs to run; clearly, any interrupt at the terminal or OS level (kill process) may terminate the processing without proper completion of the streaming process.

I have devised a method that allows one to terminate the Spark application internally after processing the last received message. Within, say, 2 seconds of the confirmation of shutdown, the process will invoke a graceful shutdown.

This new feature proposes a solution to gracefully handle the topic doing work for the message being processed, wait for it to complete, and shut down the streaming process for a given topic without loss of data or orphaned transactions.

> SPIP: Shutting down spark structured streaming when the streaming process
> completed current process
> ---
>
> Key: SPARK-42485
> URL: https://issues.apache.org/jira/browse/SPARK-42485
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Affects Versions: 3.3.2
> Reporter: Mich Talebzadeh
> Priority: Major
> Labels: SPIP
>
> Spark Structured Streaming (AKA SSS) is a very useful tool for dealing with Event Driven Architecture. In an Event Driven Architecture, there is generally a main loop that listens for events and then triggers a call-back function when one of those events is detected. In a streaming application, the application waits to receive source messages at a set interval, or whenever they happen, and reacts accordingly.
> There are occasions when you may want to stop the Spark program gracefully.
>
[jira] [Updated] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process
[ https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mich Talebzadeh updated SPARK-42485:

Description:
Spark Structured Streaming is a very useful tool for dealing with Event Driven Architecture. In an Event Driven Architecture, there is generally a main loop that listens for events and then triggers a call-back function when one of those events is detected. In a streaming application, the application waits to receive source messages at a set interval, or whenever they happen, and reacts accordingly.

There are occasions when you may want to stop the Spark program gracefully. Gracefully meaning that the Spark application handles the last streaming message completely and then terminates the application. This is different from invoking interrupts such as CTRL-C.

Of course, one can terminate the process with one of the following:

# query.awaitTermination(): waits for the termination of this query, by stop() or with an error
# query.awaitTermination(timeoutMs): returns true if this query is terminated within the timeout in milliseconds

The first one waits until an interrupt signal is received. The second one counts down the timeout and exits when the timeout in milliseconds is reached. The issue is that one needs to predict how long the streaming job needs to run; clearly, any interrupt at the terminal or OS level (kill process) may terminate the processing without proper completion of the streaming process.

I have devised a method that allows one to terminate the Spark application internally after processing the last received message. Within, say, 2 seconds of the confirmation of shutdown, the process will invoke a graceful shutdown.

This new feature proposes a solution to gracefully handle the topic doing work for the message being processed, wait for it to complete, and shut down the streaming process for a given topic without loss of data or orphaned transactions.

was:
Spark Structured Streaming (AKA SSS) is a very useful tool for dealing with Event Driven Architecture. In an Event Driven Architecture, there is generally a main loop that listens for events and then triggers a call-back function when one of those events is detected. In a streaming application, the application waits to receive source messages at a set interval, or whenever they happen, and reacts accordingly.

There are occasions when you may want to stop the Spark program gracefully. Gracefully meaning that the Spark application handles the last streaming message completely and then terminates the application. This is different from invoking interrupts such as CTRL-C.

Of course, one can terminate the process with one of the following:

# query.awaitTermination(): waits for the termination of this query, by stop() or with an error
# query.awaitTermination(timeoutMs): returns true if this query is terminated within the timeout in milliseconds

The first one waits until an interrupt signal is received. The second one counts down the timeout and exits when the timeout in milliseconds is reached. The issue is that one needs to predict how long the streaming job needs to run; clearly, any interrupt at the terminal or OS level (kill process) may terminate the processing without proper completion of the streaming process.

I have devised a method that allows one to terminate the Spark application internally after processing the last received message. Within, say, 2 seconds of the confirmation of shutdown, the process will invoke a graceful shutdown.

This new feature proposes a solution to gracefully handle the topic doing work for the message being processed, wait for it to complete, and shut down the streaming process for a given topic without loss of data or orphaned transactions.

> SPIP: Shutting down spark structured streaming when the streaming process
> completed current process
> ---
>
> Key: SPARK-42485
> URL: https://issues.apache.org/jira/browse/SPARK-42485
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Affects Versions: 3.3.2
> Reporter: Mich Talebzadeh
> Priority: Major
> Labels: SPIP
>
> Spark Structured Streaming is a very useful tool for dealing with Event Driven Architecture. In an Event Driven Architecture, there is generally a main loop that listens for events and then triggers a call-back function when one of those events is detected. In a streaming application, the application waits to receive source messages at a set interval, or whenever they happen, and reacts accordingly.
> There are occasions when you may want to stop the Spark program gracefully.
> Gracefully meaning
[jira] [Updated] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process
[ https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-42485:
--

Shepherd:  (was: Dongjoon Hyun)

> SPIP: Shutting down spark structured streaming when the streaming process
> completed current process
> ---
>
> Key: SPARK-42485
> URL: https://issues.apache.org/jira/browse/SPARK-42485
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Affects Versions: 3.3.2
> Reporter: Mich Talebzadeh
> Priority: Major
> Labels: SPIP
>
> Spark Structured Streaming is a very useful tool for dealing with Event Driven Architecture. In an Event Driven Architecture, there is generally a main loop that listens for events and then triggers a call-back function when one of those events is detected. In a streaming application, the application waits to receive source messages at a set interval, or whenever they happen, and reacts accordingly.
>
> There are occasions when you may want to stop the Spark program gracefully. Gracefully meaning that the Spark application handles the last streaming message completely and then terminates the application. This is different from invoking interrupts such as CTRL-C.
>
> Of course, one can terminate the process with one of the following:
> # query.awaitTermination(): waits for the termination of this query, by stop() or with an error
> # query.awaitTermination(timeoutMs): returns true if this query is terminated within the timeout in milliseconds
>
> The first one waits until an interrupt signal is received. The second one counts down the timeout and exits when the timeout in milliseconds is reached. The issue is that one needs to predict how long the streaming job needs to run; clearly, any interrupt at the terminal or OS level (kill process) may terminate the processing without proper completion of the streaming process.
>
> I have devised a method that allows one to terminate the Spark application internally after processing the last received message. Within, say, 2 seconds of the confirmation of shutdown, the process will invoke a graceful shutdown.
>
> This new feature proposes a solution to gracefully handle the topic doing work for the message being processed, wait for it to complete, and shut down the streaming process for a given topic without loss of data or orphaned transactions.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process
[ https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690828#comment-17690828 ]

Dongjoon Hyun commented on SPARK-42485:
---

[~mich.talebza...@gmail.com] Here are a few pieces of advice for you.

1. I removed myself from the Shepherd field. The Apache Spark community is a voluntary community.
2. You should remove `Target Version` according to the community policy.
- https://spark.apache.org/contributing.html
{quote}Do not set the following fields:
- Fix Version. This is assigned by committers only when resolved.
- Target Version. This is assigned by committers to indicate a PR has been accepted for possible fix by the target version.{quote}
3. A `New Feature` should target the version of the master branch (3.5.0 as of today), because it cannot affect the release branches (branch-3.4/3.3/..); Apache Spark community policy doesn't allow backporting a feature or improvement.

> SPIP: Shutting down spark structured streaming when the streaming process
> completed current process
> ---
>
> Key: SPARK-42485
> URL: https://issues.apache.org/jira/browse/SPARK-42485
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Affects Versions: 3.3.2
> Reporter: Mich Talebzadeh
> Priority: Major
> Labels: SPIP
>
> Spark Structured Streaming is a very useful tool for dealing with Event Driven Architecture. In an Event Driven Architecture, there is generally a main loop that listens for events and then triggers a call-back function when one of those events is detected. In a streaming application, the application waits to receive source messages at a set interval, or whenever they happen, and reacts accordingly.
>
> There are occasions when you may want to stop the Spark program gracefully. Gracefully meaning that the Spark application handles the last streaming message completely and then terminates the application. This is different from invoking interrupts such as CTRL-C.
>
> Of course, one can terminate the process with one of the following:
> # query.awaitTermination(): waits for the termination of this query, by stop() or with an error
> # query.awaitTermination(timeoutMs): returns true if this query is terminated within the timeout in milliseconds
>
> The first one waits until an interrupt signal is received. The second one counts down the timeout and exits when the timeout in milliseconds is reached. The issue is that one needs to predict how long the streaming job needs to run; clearly, any interrupt at the terminal or OS level (kill process) may terminate the processing without proper completion of the streaming process.
>
> I have devised a method that allows one to terminate the Spark application internally after processing the last received message. Within, say, 2 seconds of the confirmation of shutdown, the process will invoke a graceful shutdown.
>
> This new feature proposes a solution to gracefully handle the topic doing work for the message being processed, wait for it to complete, and shut down the streaming process for a given topic without loss of data or orphaned transactions.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-42485) SPIP: Shutting down spark structured streaming when the streaming process completed current process
[ https://issues.apache.org/jira/browse/SPARK-42485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690828#comment-17690828 ]

Dongjoon Hyun edited comment on SPARK-42485 at 2/19/23 1:08 AM:
---

[~mich.talebza...@gmail.com] Here are a few pieces of advice for you.

1. I removed myself from the Shepherd field.
2. You should remove `Target Version` according to the community policy.
- [https://spark.apache.org/contributing.html]
{quote}Do not set the following fields:
- Fix Version. This is assigned by committers only when resolved.
- Target Version. This is assigned by committers to indicate a PR has been accepted for possible fix by the target version.{quote}
3. A `New Feature` should target the version of the master branch (3.5.0 as of today), because it cannot affect the release branches (branch-3.4/3.3/..); Apache Spark community policy doesn't allow backporting a feature or improvement.

was (Author: dongjoon):
[~mich.talebza...@gmail.com] Here are a few pieces of advice for you.

1. I removed myself from the Shepherd field. The Apache Spark community is a voluntary community.
2. You should remove `Target Version` according to the community policy.
- https://spark.apache.org/contributing.html
{quote}Do not set the following fields:
- Fix Version. This is assigned by committers only when resolved.
- Target Version. This is assigned by committers to indicate a PR has been accepted for possible fix by the target version.{quote}
3. A `New Feature` should target the version of the master branch (3.5.0 as of today), because it cannot affect the release branches (branch-3.4/3.3/..); Apache Spark community policy doesn't allow backporting a feature or improvement.

> SPIP: Shutting down spark structured streaming when the streaming process
> completed current process
> ---
>
> Key: SPARK-42485
> URL: https://issues.apache.org/jira/browse/SPARK-42485
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Affects Versions: 3.3.2
> Reporter: Mich Talebzadeh
> Priority: Major
> Labels: SPIP
>
> Spark Structured Streaming is a very useful tool for dealing with Event Driven Architecture. In an Event Driven Architecture, there is generally a main loop that listens for events and then triggers a call-back function when one of those events is detected. In a streaming application, the application waits to receive source messages at a set interval, or whenever they happen, and reacts accordingly.
>
> There are occasions when you may want to stop the Spark program gracefully. Gracefully meaning that the Spark application handles the last streaming message completely and then terminates the application. This is different from invoking interrupts such as CTRL-C.
>
> Of course, one can terminate the process with one of the following:
> # query.awaitTermination(): waits for the termination of this query, by stop() or with an error
> # query.awaitTermination(timeoutMs): returns true if this query is terminated within the timeout in milliseconds
>
> The first one waits until an interrupt signal is received. The second one counts down the timeout and exits when the timeout in milliseconds is reached. The issue is that one needs to predict how long the streaming job needs to run; clearly, any interrupt at the terminal or OS level (kill process) may terminate the processing without proper completion of the streaming process.
>
> I have devised a method that allows one to terminate the Spark application internally after processing the last received message. Within, say, 2 seconds of the confirmation of shutdown, the process will invoke a graceful shutdown.
>
> This new feature proposes a solution to gracefully handle the topic doing work for the message being processed, wait for it to complete, and shut down the streaming process for a given topic without loss of data or orphaned transactions.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42473) An explicit cast will be needed when INSERT OVERWRITE SELECT UNION ALL
[ https://issues.apache.org/jira/browse/SPARK-42473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690833#comment-17690833 ]

Yuming Wang commented on SPARK-42473:
-

What is your test.spark33_decimal_orc column type?

> An explicit cast will be needed when INSERT OVERWRITE SELECT UNION ALL
> --
>
> Key: SPARK-42473
> URL: https://issues.apache.org/jira/browse/SPARK-42473
> Project: Spark
> Issue Type: Bug
> Components: Optimizer
> Affects Versions: 3.3.1
> Environment: spark 3.3.1
> Reporter: kevinshin
> Priority: Major
>
> When 'union all' is used and one select statement uses a literal as the column value while the other select statement has a computed expression in the same column, the whole statement fails to compile. An explicit cast is needed.
> For example:
>
> explain
> INSERT OVERWRITE TABLE test.spark33_decimal_orc
> select null as amt1, cast('256.99' as decimal(20,8)) as amt2
> union all
> select cast('200.99' as decimal(20,8))/100 as amt1, cast('256.99' as decimal(20,8)) as amt2;
>
> will get the error:
> org.apache.spark.sql.catalyst.expressions.Literal cannot be cast to org.apache.spark.sql.catalyst.expressions.AnsiCast
> The SQL needs to change to (note the added outer cast on amt1):
>
> explain
> INSERT OVERWRITE TABLE test.spark33_decimal_orc
> select null as amt1, cast('256.99' as decimal(20,8)) as amt2
> union all
> select cast(cast('200.99' as decimal(20,8))/100 as decimal(20,8)) as amt1, cast('256.99' as decimal(20,8)) as amt2;
>
> But this was not needed in spark 3.2.1; is this a bug in spark 3.3.1?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
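As context for the report above, a small PySpark sketch (assuming only a local SparkSession; the reporter's table is not needed for this part) showing why the two branches disagree: Spark widens the result type of decimal division, so the two amt1 expressions resolve to different decimal types until the outer cast aligns them.

{code:python}
# Sketch: compare the resolved types of the two UNION ALL branches.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("decimal-union").getOrCreate()

# Branch 1: the literal cast stays decimal(20,8).
spark.sql("select cast('256.99' as decimal(20,8)) as amt2").printSchema()

# Branch 2: division widens precision/scale under Spark's decimal
# promotion rules, so amt1 is wider than decimal(20,8) here.
spark.sql("select cast('200.99' as decimal(20,8)) / 100 as amt1").printSchema()

# The explicit outer cast from the corrected statement aligns the branches:
spark.sql(
    "select cast(cast('200.99' as decimal(20,8)) / 100 as decimal(20,8)) as amt1"
).printSchema()
{code}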
[jira] [Commented] (SPARK-42406) [PROTOBUF] Recursive field handling is incompatible with delta
[ https://issues.apache.org/jira/browse/SPARK-42406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690837#comment-17690837 ]

Apache Spark commented on SPARK-42406:
--

User 'rangadi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40080

> [PROTOBUF] Recursive field handling is incompatible with delta
> --
>
> Key: SPARK-42406
> URL: https://issues.apache.org/jira/browse/SPARK-42406
> Project: Spark
> Issue Type: Bug
> Components: Protobuf
> Affects Versions: 3.4.0
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Priority: Major
> Fix For: 3.4.0
>
> The Protobuf deserializer (the `from_protobuf()` function) optionally supports recursive fields by limiting the depth to a certain level. See the example below. It assigns a 'NullType' to such a field when the allowed depth is reached.
> This causes a few issues. E.g. a repeated field, as in the following example, results in an Array field of 'NullType'. Delta does not support null type in a complex type. Actually, `Array[NullType]` is not really useful anyway.
> How about this fix: drop the recursive field when the limit is reached, rather than using a NullType. The example below makes it clear.
> Consider a recursive Protobuf:
> {code}
> message TreeNode {
>   string value = 1;
>   repeated TreeNode children = 2;
> }
> {code}
> Allow a depth of 2:
> {code:python}
> from pyspark.sql.protobuf.functions import from_protobuf
>
> df.select(
>     from_protobuf(
>         'proto',
>         messageName='TreeNode',
>         options={ ... "recursive.fields.max.depth": "2" }
>     )
> ).printSchema()
> {code}
> The schema looks like this:
> {noformat}
> root
>  |-- from_protobuf(proto): struct (nullable = true)
>  |    |-- value: string (nullable = true)
>  |    |-- children: array (nullable = false)
>  |    |    |-- element: struct (containsNull = false)
>  |    |    |    |-- value: string (nullable = true)
>  |    |    |    |-- children: array (nullable = false)
>  |    |    |    |    |-- element: struct (containsNull = false)
>  |    |    |    |    |    |-- value: string (nullable = true)
>  |    |    |    |    |    |-- children: array (nullable = false)   [ === Proposed fix: drop this field === ]
>  |    |    |    |    |    |    |-- element: void (containsNull = false)   [ === NOTICE 'void' HERE === ]
> {noformat}
> When we try to write this to a Delta table, we get an error:
> {noformat}
> AnalysisException: Found nested NullType in column from_protobuf(proto).children which is of ArrayType. Delta doesn't support writing NullType in complex types.
> {noformat}
> We could just drop the field 'element' when the recursion depth is reached. It is simpler and does not need to deal with NullType; we are ignoring the value anyway, so there is no use in keeping the field.
> Another issue is the setting 'recursive.fields.max.depth': it is not enforced correctly, and '0' does not make sense.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
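Until the fix lands, one possible workaround sketch (hedged: the `parsed` dataframe and `tree` column names are hypothetical, MapType handling is omitted, and the JSON round-trip is just one way to drop a column Delta cannot store): rebuild the parsed schema without NullType leaves and re-parse the struct, so Delta never sees the void column.

{code:python}
# Hypothetical workaround: strip NullType leaves (e.g. the 'void' children
# element at max recursion depth) before writing the parsed column to Delta.
from pyspark.sql.functions import col, from_json, to_json
from pyspark.sql.types import ArrayType, NullType, StructField, StructType


def is_null_like(dt):
    """True for NullType, or for arrays that bottom out in NullType."""
    if isinstance(dt, NullType):
        return True
    if isinstance(dt, ArrayType):
        return is_null_like(dt.elementType)
    return False


def prune_null_fields(dt):
    """Recursively rebuild a data type, dropping NullType-only struct fields."""
    if isinstance(dt, StructType):
        return StructType([
            StructField(f.name, prune_null_fields(f.dataType), f.nullable)
            for f in dt.fields
            if not is_null_like(f.dataType)
        ])
    if isinstance(dt, ArrayType):
        return ArrayType(prune_null_fields(dt.elementType), dt.containsNull)
    return dt


# 'parsed' is assumed to hold the from_protobuf() output in column 'tree'.
pruned_type = prune_null_fields(parsed.schema["tree"].dataType)
cleaned = parsed.withColumn(
    "tree",
    # Re-parse against the pruned schema; the NullType-only field is dropped.
    from_json(to_json(col("tree")), pruned_type),
)
{code}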
[jira] [Assigned] (SPARK-42406) [PROTOBUF] Recursive field handling is incompatible with delta
[ https://issues.apache.org/jira/browse/SPARK-42406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42406:

Assignee: Raghu Angadi  (was: Apache Spark)

> [PROTOBUF] Recursive field handling is incompatible with delta
> --
>
> Key: SPARK-42406
> URL: https://issues.apache.org/jira/browse/SPARK-42406
> Project: Spark
> Issue Type: Bug
> Components: Protobuf
> Affects Versions: 3.4.0
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Priority: Major
> Fix For: 3.4.0
>
> The Protobuf deserializer (the `from_protobuf()` function) optionally supports recursive fields by limiting the depth to a certain level. See the example below. It assigns a 'NullType' to such a field when the allowed depth is reached.
> This causes a few issues. E.g. a repeated field, as in the following example, results in an Array field of 'NullType'. Delta does not support null type in a complex type. Actually, `Array[NullType]` is not really useful anyway.
> How about this fix: drop the recursive field when the limit is reached, rather than using a NullType. The example below makes it clear.
> Consider a recursive Protobuf:
> {code}
> message TreeNode {
>   string value = 1;
>   repeated TreeNode children = 2;
> }
> {code}
> Allow a depth of 2:
> {code:python}
> from pyspark.sql.protobuf.functions import from_protobuf
>
> df.select(
>     from_protobuf(
>         'proto',
>         messageName='TreeNode',
>         options={ ... "recursive.fields.max.depth": "2" }
>     )
> ).printSchema()
> {code}
> The schema looks like this:
> {noformat}
> root
>  |-- from_protobuf(proto): struct (nullable = true)
>  |    |-- value: string (nullable = true)
>  |    |-- children: array (nullable = false)
>  |    |    |-- element: struct (containsNull = false)
>  |    |    |    |-- value: string (nullable = true)
>  |    |    |    |-- children: array (nullable = false)
>  |    |    |    |    |-- element: struct (containsNull = false)
>  |    |    |    |    |    |-- value: string (nullable = true)
>  |    |    |    |    |    |-- children: array (nullable = false)   [ === Proposed fix: drop this field === ]
>  |    |    |    |    |    |    |-- element: void (containsNull = false)   [ === NOTICE 'void' HERE === ]
> {noformat}
> When we try to write this to a Delta table, we get an error:
> {noformat}
> AnalysisException: Found nested NullType in column from_protobuf(proto).children which is of ArrayType. Delta doesn't support writing NullType in complex types.
> {noformat}
> We could just drop the field 'element' when the recursion depth is reached. It is simpler and does not need to deal with NullType; we are ignoring the value anyway, so there is no use in keeping the field.
> Another issue is the setting 'recursive.fields.max.depth': it is not enforced correctly, and '0' does not make sense.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42406) [PROTOBUF] Recursive field handling is incompatible with delta
[ https://issues.apache.org/jira/browse/SPARK-42406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42406:

Assignee: Apache Spark  (was: Raghu Angadi)

> [PROTOBUF] Recursive field handling is incompatible with delta
> --
>
> Key: SPARK-42406
> URL: https://issues.apache.org/jira/browse/SPARK-42406
> Project: Spark
> Issue Type: Bug
> Components: Protobuf
> Affects Versions: 3.4.0
> Reporter: Raghu Angadi
> Assignee: Apache Spark
> Priority: Major
> Fix For: 3.4.0
>
> The Protobuf deserializer (the `from_protobuf()` function) optionally supports recursive fields by limiting the depth to a certain level. See the example below. It assigns a 'NullType' to such a field when the allowed depth is reached.
> This causes a few issues. E.g. a repeated field, as in the following example, results in an Array field of 'NullType'. Delta does not support null type in a complex type. Actually, `Array[NullType]` is not really useful anyway.
> How about this fix: drop the recursive field when the limit is reached, rather than using a NullType. The example below makes it clear.
> Consider a recursive Protobuf:
> {code}
> message TreeNode {
>   string value = 1;
>   repeated TreeNode children = 2;
> }
> {code}
> Allow a depth of 2:
> {code:python}
> from pyspark.sql.protobuf.functions import from_protobuf
>
> df.select(
>     from_protobuf(
>         'proto',
>         messageName='TreeNode',
>         options={ ... "recursive.fields.max.depth": "2" }
>     )
> ).printSchema()
> {code}
> The schema looks like this:
> {noformat}
> root
>  |-- from_protobuf(proto): struct (nullable = true)
>  |    |-- value: string (nullable = true)
>  |    |-- children: array (nullable = false)
>  |    |    |-- element: struct (containsNull = false)
>  |    |    |    |-- value: string (nullable = true)
>  |    |    |    |-- children: array (nullable = false)
>  |    |    |    |    |-- element: struct (containsNull = false)
>  |    |    |    |    |    |-- value: string (nullable = true)
>  |    |    |    |    |    |-- children: array (nullable = false)   [ === Proposed fix: drop this field === ]
>  |    |    |    |    |    |    |-- element: void (containsNull = false)   [ === NOTICE 'void' HERE === ]
> {noformat}
> When we try to write this to a Delta table, we get an error:
> {noformat}
> AnalysisException: Found nested NullType in column from_protobuf(proto).children which is of ArrayType. Delta doesn't support writing NullType in complex types.
> {noformat}
> We could just drop the field 'element' when the recursion depth is reached. It is simpler and does not need to deal with NullType; we are ignoring the value anyway, so there is no use in keeping the field.
> Another issue is the setting 'recursive.fields.max.depth': it is not enforced correctly, and '0' does not make sense.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42406) [PROTOBUF] Recursive field handling is incompatible with delta
[ https://issues.apache.org/jira/browse/SPARK-42406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690838#comment-17690838 ]

Apache Spark commented on SPARK-42406:
--

User 'rangadi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40080

> [PROTOBUF] Recursive field handling is incompatible with delta
> --
>
> Key: SPARK-42406
> URL: https://issues.apache.org/jira/browse/SPARK-42406
> Project: Spark
> Issue Type: Bug
> Components: Protobuf
> Affects Versions: 3.4.0
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Priority: Major
> Fix For: 3.4.0
>
> The Protobuf deserializer (the `from_protobuf()` function) optionally supports recursive fields by limiting the depth to a certain level. See the example below. It assigns a 'NullType' to such a field when the allowed depth is reached.
> This causes a few issues. E.g. a repeated field, as in the following example, results in an Array field of 'NullType'. Delta does not support null type in a complex type. Actually, `Array[NullType]` is not really useful anyway.
> How about this fix: drop the recursive field when the limit is reached, rather than using a NullType. The example below makes it clear.
> Consider a recursive Protobuf:
> {code}
> message TreeNode {
>   string value = 1;
>   repeated TreeNode children = 2;
> }
> {code}
> Allow a depth of 2:
> {code:python}
> from pyspark.sql.protobuf.functions import from_protobuf
>
> df.select(
>     from_protobuf(
>         'proto',
>         messageName='TreeNode',
>         options={ ... "recursive.fields.max.depth": "2" }
>     )
> ).printSchema()
> {code}
> The schema looks like this:
> {noformat}
> root
>  |-- from_protobuf(proto): struct (nullable = true)
>  |    |-- value: string (nullable = true)
>  |    |-- children: array (nullable = false)
>  |    |    |-- element: struct (containsNull = false)
>  |    |    |    |-- value: string (nullable = true)
>  |    |    |    |-- children: array (nullable = false)
>  |    |    |    |    |-- element: struct (containsNull = false)
>  |    |    |    |    |    |-- value: string (nullable = true)
>  |    |    |    |    |    |-- children: array (nullable = false)   [ === Proposed fix: drop this field === ]
>  |    |    |    |    |    |    |-- element: void (containsNull = false)   [ === NOTICE 'void' HERE === ]
> {noformat}
> When we try to write this to a Delta table, we get an error:
> {noformat}
> AnalysisException: Found nested NullType in column from_protobuf(proto).children which is of ArrayType. Delta doesn't support writing NullType in complex types.
> {noformat}
> We could just drop the field 'element' when the recursion depth is reached. It is simpler and does not need to deal with NullType; we are ignoring the value anyway, so there is no use in keeping the field.
> Another issue is the setting 'recursive.fields.max.depth': it is not enforced correctly, and '0' does not make sense.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org