[jira] [Assigned] (SPARK-42817) Spark driver logs are filled with Initializing service data for shuffle service using name
[ https://issues.apache.org/jira/browse/SPARK-42817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42817: Assignee: Apache Spark
> Spark driver logs are filled with Initializing service data for shuffle service using name
> --
>
> Key: SPARK-42817
> URL: https://issues.apache.org/jira/browse/SPARK-42817
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.0
> Reporter: Chandni Singh
> Assignee: Apache Spark
> Priority: Major
>
> With SPARK-34828, we added the ability to make the shuffle service name configurable, and we added a log [here|https://github.com/apache/spark/blob/8860f69455e5a722626194c4797b4b42cccd4510/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala#L118] that logs the shuffle service name. However, this line is printed in the driver logs whenever a new executor is launched, which pollutes the log:
> {code}
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> ... (the same line repeated for every executor launch)
> {code}
> We can just log this once in the driver.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
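The fix the report suggests, logging the service name a single time in the driver, boils down to a once-guard around the log call. Below is a minimal, hypothetical Python sketch of that pattern (the real change would live in Spark's Scala driver code around ExecutorRunnable; the class and method names here are illustrative only):

```python
import threading

class LogOnce:
    """Emit a message only the first time it is requested.

    Hypothetical sketch of the proposed fix: the shuffle service name is
    constant for the lifetime of the application, so logging it on every
    executor launch adds no information.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._logged = False

    def log(self, message: str) -> bool:
        # Returns True only for the single call that actually logs.
        with self._lock:
            if self._logged:
                return False
            self._logged = True
        print(message)
        return True

# Every executor launch calls log(), but only the first call prints.
guard = LogOnce()
first = guard.log("Initializing service data for shuffle service using name 'spark_shuffle_311'")
repeat = guard.log("Initializing service data for shuffle service using name 'spark_shuffle_311'")
```

The lock only matters because executors can be launched from multiple threads; a plain boolean would race and occasionally log twice.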
[jira] [Assigned] (SPARK-42816) Increase max message size to 128MB
[ https://issues.apache.org/jira/browse/SPARK-42816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42816: Assignee: Apache Spark > Increase max message size to 128MB > -- > > Key: SPARK-42816 > URL: https://issues.apache.org/jira/browse/SPARK-42816 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Apache Spark >Priority: Major > > Support messages up to 128MB
[jira] [Commented] (SPARK-42816) Increase max message size to 128MB
[ https://issues.apache.org/jira/browse/SPARK-42816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700782#comment-17700782 ] Apache Spark commented on SPARK-42816: -- User 'grundprinzip' has created a pull request for this issue: https://github.com/apache/spark/pull/40447 > Increase max message size to 128MB > -- > > Key: SPARK-42816 > URL: https://issues.apache.org/jira/browse/SPARK-42816 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Priority: Major > > Support messages up to 128MB
[jira] [Commented] (SPARK-42816) Increase max message size to 128MB
[ https://issues.apache.org/jira/browse/SPARK-42816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700780#comment-17700780 ] Apache Spark commented on SPARK-42816: -- User 'grundprinzip' has created a pull request for this issue: https://github.com/apache/spark/pull/40447 > Increase max message size to 128MB > -- > > Key: SPARK-42816 > URL: https://issues.apache.org/jira/browse/SPARK-42816 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Priority: Major > > Support messages up to 128MB
[jira] [Assigned] (SPARK-42816) Increase max message size to 128MB
[ https://issues.apache.org/jira/browse/SPARK-42816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42816: Assignee: (was: Apache Spark) > Increase max message size to 128MB > -- > > Key: SPARK-42816 > URL: https://issues.apache.org/jira/browse/SPARK-42816 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Priority: Major > > Support messages up to 128MB
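For context on SPARK-42816: stock gRPC caps inbound messages at 4 MB by default, which is far too small for large relations sent over Spark Connect, and the ticket raises the cap to 128 MB. The sketch below only illustrates the arithmetic and the kind of size guard a transport applies before a message is accepted; the constant and function names are hypothetical, not Spark's actual configuration keys:

```python
# 128 MB, the new cap from the ticket, in bytes as gRPC options expect.
MAX_MESSAGE_SIZE = 128 * 1024 * 1024

def check_message_size(payload: bytes, limit: int = MAX_MESSAGE_SIZE) -> bytes:
    """Reject payloads above the cap, as a transport-level guard would.

    Hypothetical helper: Spark Connect configures the limit on its gRPC
    channel/server rather than checking by hand like this.
    """
    if len(payload) > limit:
        raise ValueError(
            f"message of {len(payload)} bytes exceeds limit of {limit}")
    return payload
```

In real gRPC clients the equivalent knob is a channel option such as the max receive message length, set at channel construction rather than per message.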
[jira] [Commented] (SPARK-42815) Subexpression elimination support shortcut conditional expression
[ https://issues.apache.org/jira/browse/SPARK-42815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700753#comment-17700753 ] Apache Spark commented on SPARK-42815: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/40446 > Subexpression elimination support shortcut conditional expression > - > > Key: SPARK-42815 > URL: https://issues.apache.org/jira/browse/SPARK-42815 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: XiDuo You >Priority: Minor > > The subexpression in conditional expression may not need to eval even if it > appears more than once. > e.g., `if(or(a, and(b, b)))`, the expression `b` would be skipped if `a` is > true.
[jira] [Commented] (SPARK-42815) Subexpression elimination support shortcut conditional expression
[ https://issues.apache.org/jira/browse/SPARK-42815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700751#comment-17700751 ] Apache Spark commented on SPARK-42815: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/40446 > Subexpression elimination support shortcut conditional expression > - > > Key: SPARK-42815 > URL: https://issues.apache.org/jira/browse/SPARK-42815 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: XiDuo You >Priority: Minor > > The subexpression in conditional expression may not need to eval even if it > appears more than once. > e.g., `if(or(a, and(b, b)))`, the expression `b` would be skipped if `a` is > true.
[jira] [Assigned] (SPARK-42815) Subexpression elimination support shortcut conditional expression
[ https://issues.apache.org/jira/browse/SPARK-42815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42815: Assignee: Apache Spark > Subexpression elimination support shortcut conditional expression > - > > Key: SPARK-42815 > URL: https://issues.apache.org/jira/browse/SPARK-42815 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: XiDuo You >Assignee: Apache Spark >Priority: Minor > > The subexpression in conditional expression may not need to eval even if it > appears more than once. > e.g., `if(or(a, and(b, b)))`, the expression `b` would be skipped if `a` is > true.
[jira] [Assigned] (SPARK-42815) Subexpression elimination support shortcut conditional expression
[ https://issues.apache.org/jira/browse/SPARK-42815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42815: Assignee: (was: Apache Spark) > Subexpression elimination support shortcut conditional expression > - > > Key: SPARK-42815 > URL: https://issues.apache.org/jira/browse/SPARK-42815 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: XiDuo You >Priority: Minor > > The subexpression in conditional expression may not need to eval even if it > appears more than once. > e.g., `if(or(a, and(b, b)))`, the expression `b` would be skipped if `a` is > true.
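The point of SPARK-42815 is that common-subexpression elimination should not eagerly pre-evaluate a subexpression that short-circuiting may skip. In `if(or(a, and(b, b)))`, `b` occurs twice, so naive CSE would compute it up front, yet when `a` is true it is never needed. A small Python illustration of that evaluation-count argument (the names are illustrative; this is not Spark's codegen):

```python
# Evaluation counter standing in for an expensive subexpression `b`.
eval_count = {"b": 0}

def b() -> bool:
    eval_count["b"] += 1
    return True

def expr(a: bool) -> bool:
    # or(a, and(b, b)): Python's `or`/`and` short-circuit just like the
    # SQL conditional in the ticket, so `b` is skipped when `a` is true
    # even though it appears twice in the expression.
    return a or (b() and b())
```

Eagerly hoisting `b` out as a shared subexpression would force both evaluations in the `a = true` case, which the ticket proposes to avoid.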
[jira] [Commented] (SPARK-42814) Upgrade some maven-plugins
[ https://issues.apache.org/jira/browse/SPARK-42814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700736#comment-17700736 ] Apache Spark commented on SPARK-42814: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40445 > Upgrade some maven-plugins > -- > > Key: SPARK-42814 > URL: https://issues.apache.org/jira/browse/SPARK-42814 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > > maven-enforcer-plugin 3.0.0-M2 -> 3.2.1 > - [https://github.com/apache/maven-enforcer/releases/tag/enforcer-3.2.1] > - [https://github.com/apache/maven-enforcer/releases/tag/enforcer-3.1.0] > build-helper-maven-plugin 3.2.0 -> 3.3.0 > - [https://github.com/mojohaus/build-helper-maven-plugin/releases/tag/build-helper-maven-plugin-3.3.0] > maven-compiler-plugin 3.10.1 -> 3.11.0 > - [https://github.com/apache/maven-compiler-plugin/releases/tag/maven-compiler-plugin-3.11.0] > maven-surefire-plugin 3.0.0-M9 -> 3.0.0 > - [https://github.com/apache/maven-surefire/releases/tag/surefire-3.0.0] > maven-javadoc-plugin 3.4.1 -> 3.5.0 > - [https://github.com/apache/maven-javadoc-plugin/releases/tag/maven-javadoc-plugin-3.5.0] > maven-deploy-plugin 3.0.0 -> 3.1.0 > - [https://github.com/apache/maven-deploy-plugin/releases/tag/maven-deploy-plugin-3.1.0]
[jira] [Assigned] (SPARK-42814) Upgrade some maven-plugins
[ https://issues.apache.org/jira/browse/SPARK-42814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42814: Assignee: (was: Apache Spark) > Upgrade some maven-plugins > -- > > Key: SPARK-42814 > URL: https://issues.apache.org/jira/browse/SPARK-42814 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > > maven-enforcer-plugin 3.0.0-M2 -> 3.2.1 > - [https://github.com/apache/maven-enforcer/releases/tag/enforcer-3.2.1] > - [https://github.com/apache/maven-enforcer/releases/tag/enforcer-3.1.0] > build-helper-maven-plugin 3.2.0 -> 3.3.0 > - [https://github.com/mojohaus/build-helper-maven-plugin/releases/tag/build-helper-maven-plugin-3.3.0] > maven-compiler-plugin 3.10.1 -> 3.11.0 > - [https://github.com/apache/maven-compiler-plugin/releases/tag/maven-compiler-plugin-3.11.0] > maven-surefire-plugin 3.0.0-M9 -> 3.0.0 > - [https://github.com/apache/maven-surefire/releases/tag/surefire-3.0.0] > maven-javadoc-plugin 3.4.1 -> 3.5.0 > - [https://github.com/apache/maven-javadoc-plugin/releases/tag/maven-javadoc-plugin-3.5.0] > maven-deploy-plugin 3.0.0 -> 3.1.0 > - [https://github.com/apache/maven-deploy-plugin/releases/tag/maven-deploy-plugin-3.1.0]
[jira] [Assigned] (SPARK-42814) Upgrade some maven-plugins
[ https://issues.apache.org/jira/browse/SPARK-42814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42814: Assignee: Apache Spark > Upgrade some maven-plugins > -- > > Key: SPARK-42814 > URL: https://issues.apache.org/jira/browse/SPARK-42814 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > maven-enforcer-plugin 3.0.0-M2 -> 3.2.1 > - [https://github.com/apache/maven-enforcer/releases/tag/enforcer-3.2.1] > - [https://github.com/apache/maven-enforcer/releases/tag/enforcer-3.1.0] > build-helper-maven-plugin 3.2.0 -> 3.3.0 > - [https://github.com/mojohaus/build-helper-maven-plugin/releases/tag/build-helper-maven-plugin-3.3.0] > maven-compiler-plugin 3.10.1 -> 3.11.0 > - [https://github.com/apache/maven-compiler-plugin/releases/tag/maven-compiler-plugin-3.11.0] > maven-surefire-plugin 3.0.0-M9 -> 3.0.0 > - [https://github.com/apache/maven-surefire/releases/tag/surefire-3.0.0] > maven-javadoc-plugin 3.4.1 -> 3.5.0 > - [https://github.com/apache/maven-javadoc-plugin/releases/tag/maven-javadoc-plugin-3.5.0] > maven-deploy-plugin 3.0.0 -> 3.1.0 > - [https://github.com/apache/maven-deploy-plugin/releases/tag/maven-deploy-plugin-3.1.0]
[jira] [Commented] (SPARK-42813) Print application info when waitAppCompletion is false
[ https://issues.apache.org/jira/browse/SPARK-42813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700705#comment-17700705 ] Apache Spark commented on SPARK-42813: -- User 'pan3793' has created a pull request for this issue: https://github.com/apache/spark/pull/40444 > Print application info when waitAppCompletion is false > -- > > Key: SPARK-42813 > URL: https://issues.apache.org/jira/browse/SPARK-42813 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.2 >Reporter: Cheng Pan >Priority: Major >
[jira] [Assigned] (SPARK-42813) Print application info when waitAppCompletion is false
[ https://issues.apache.org/jira/browse/SPARK-42813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42813: Assignee: Apache Spark > Print application info when waitAppCompletion is false > -- > > Key: SPARK-42813 > URL: https://issues.apache.org/jira/browse/SPARK-42813 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.2 >Reporter: Cheng Pan >Assignee: Apache Spark >Priority: Major >
[jira] [Commented] (SPARK-42813) Print application info when waitAppCompletion is false
[ https://issues.apache.org/jira/browse/SPARK-42813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700704#comment-17700704 ] Apache Spark commented on SPARK-42813: -- User 'pan3793' has created a pull request for this issue: https://github.com/apache/spark/pull/40444 > Print application info when waitAppCompletion is false > -- > > Key: SPARK-42813 > URL: https://issues.apache.org/jira/browse/SPARK-42813 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.2 >Reporter: Cheng Pan >Priority: Major >
[jira] [Assigned] (SPARK-42813) Print application info when waitAppCompletion is false
[ https://issues.apache.org/jira/browse/SPARK-42813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42813: Assignee: (was: Apache Spark) > Print application info when waitAppCompletion is false > -- > > Key: SPARK-42813 > URL: https://issues.apache.org/jira/browse/SPARK-42813 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.2 >Reporter: Cheng Pan >Priority: Major >
[jira] [Commented] (SPARK-42812) client_type is missing from AddArtifactsRequest proto message
[ https://issues.apache.org/jira/browse/SPARK-42812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700682#comment-17700682 ] Apache Spark commented on SPARK-42812: -- User 'vicennial' has created a pull request for this issue: https://github.com/apache/spark/pull/40443 > client_type is missing from AddArtifactsRequest proto message > - > > Key: SPARK-42812 > URL: https://issues.apache.org/jira/browse/SPARK-42812 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Venkata Sai Akhil Gudesa >Priority: Major > > The client_type is missing from AddArtifactsRequest proto message
[jira] [Assigned] (SPARK-42812) client_type is missing from AddArtifactsRequest proto message
[ https://issues.apache.org/jira/browse/SPARK-42812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42812: Assignee: Apache Spark > client_type is missing from AddArtifactsRequest proto message > - > > Key: SPARK-42812 > URL: https://issues.apache.org/jira/browse/SPARK-42812 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Venkata Sai Akhil Gudesa >Assignee: Apache Spark >Priority: Major > > The client_type is missing from AddArtifactsRequest proto message
[jira] [Assigned] (SPARK-42812) client_type is missing from AddArtifactsRequest proto message
[ https://issues.apache.org/jira/browse/SPARK-42812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42812: Assignee: (was: Apache Spark) > client_type is missing from AddArtifactsRequest proto message > - > > Key: SPARK-42812 > URL: https://issues.apache.org/jira/browse/SPARK-42812 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Venkata Sai Akhil Gudesa >Priority: Major > > The client_type is missing from AddArtifactsRequest proto message
[jira] [Commented] (SPARK-42812) client_type is missing from AddArtifactsRequest proto message
[ https://issues.apache.org/jira/browse/SPARK-42812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700681#comment-17700681 ] Apache Spark commented on SPARK-42812: -- User 'vicennial' has created a pull request for this issue: https://github.com/apache/spark/pull/40443 > client_type is missing from AddArtifactsRequest proto message > - > > Key: SPARK-42812 > URL: https://issues.apache.org/jira/browse/SPARK-42812 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Venkata Sai Akhil Gudesa >Priority: Major > > The client_type is missing from AddArtifactsRequest proto message
[jira] [Assigned] (SPARK-42809) Upgrade scala-maven-plugin from 4.8.0 to 4.8.1
[ https://issues.apache.org/jira/browse/SPARK-42809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42809: Assignee: Apache Spark > Upgrade scala-maven-plugin from 4.8.0 to 4.8.1 > -- > > Key: SPARK-42809 > URL: https://issues.apache.org/jira/browse/SPARK-42809 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor >
[jira] [Assigned] (SPARK-42809) Upgrade scala-maven-plugin from 4.8.0 to 4.8.1
[ https://issues.apache.org/jira/browse/SPARK-42809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42809: Assignee: (was: Apache Spark) > Upgrade scala-maven-plugin from 4.8.0 to 4.8.1 > -- > > Key: SPARK-42809 > URL: https://issues.apache.org/jira/browse/SPARK-42809 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor >
[jira] [Assigned] (SPARK-42808) Avoid getting availableProcessors every time in MapOutputTrackerMaster#getStatistics
[ https://issues.apache.org/jira/browse/SPARK-42808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42808: Assignee: Apache Spark > Avoid getting availableProcessors every time in > MapOutputTrackerMaster#getStatistics > > > Key: SPARK-42808 > URL: https://issues.apache.org/jira/browse/SPARK-42808 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.2 >Reporter: dzcxzl >Assignee: Apache Spark >Priority: Minor >
[jira] [Assigned] (SPARK-42808) Avoid getting availableProcessors every time in MapOutputTrackerMaster#getStatistics
[ https://issues.apache.org/jira/browse/SPARK-42808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42808: Assignee: (was: Apache Spark) > Avoid getting availableProcessors every time in > MapOutputTrackerMaster#getStatistics > > > Key: SPARK-42808 > URL: https://issues.apache.org/jira/browse/SPARK-42808 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.2 >Reporter: dzcxzl >Priority: Minor >
[jira] [Commented] (SPARK-42808) Avoid getting availableProcessors every time in MapOutputTrackerMaster#getStatistics
[ https://issues.apache.org/jira/browse/SPARK-42808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700617#comment-17700617 ] Apache Spark commented on SPARK-42808: -- User 'cxzl25' has created a pull request for this issue: https://github.com/apache/spark/pull/40440 > Avoid getting availableProcessors every time in > MapOutputTrackerMaster#getStatistics > > > Key: SPARK-42808 > URL: https://issues.apache.org/jira/browse/SPARK-42808 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.2 >Reporter: dzcxzl >Priority: Minor >
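The improvement in SPARK-42808 is simply hoisting the `availableProcessors` lookup out of the hot path so it is computed once instead of on every `getStatistics` call. Below is a hypothetical Python analogue of the after-state, using `os.cpu_count()` in place of the JVM's `Runtime.getRuntime().availableProcessors()`; the class and method names are illustrative only:

```python
import os

class MapStatsAggregator:
    """Toy analogue of the fix: look up the processor count once at
    construction instead of on every statistics call."""

    def __init__(self) -> None:
        # Computed once here; previously the equivalent lookup ran on
        # every call to the getStatistics-style method below.
        self.parallelism = os.cpu_count() or 1

    def stats_parallelism(self, num_partitions: int) -> int:
        # Hot path now reads a cached field rather than re-querying
        # the runtime for the processor count.
        return min(self.parallelism, num_partitions)
```

The processor count is effectively constant for the life of the driver, so caching it trades nothing for one fewer runtime query per call.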
[jira] [Commented] (SPARK-42809) Upgrade scala-maven-plugin from 4.8.0 to 4.8.1
[ https://issues.apache.org/jira/browse/SPARK-42809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700619#comment-17700619 ] Apache Spark commented on SPARK-42809: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/40442 > Upgrade scala-maven-plugin from 4.8.0 to 4.8.1 > -- > > Key: SPARK-42809 > URL: https://issues.apache.org/jira/browse/SPARK-42809 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor >
[jira] [Commented] (SPARK-42807) Apply custom log URL pattern for yarn-client AM log URL in SHS
[ https://issues.apache.org/jira/browse/SPARK-42807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700609#comment-17700609 ] Apache Spark commented on SPARK-42807: -- User 'cxzl25' has created a pull request for this issue: https://github.com/apache/spark/pull/40439 > Apply custom log URL pattern for yarn-client AM log URL in SHS > -- > > Key: SPARK-42807 > URL: https://issues.apache.org/jira/browse/SPARK-42807 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.2 >Reporter: dzcxzl >Priority: Minor >
[jira] [Commented] (SPARK-42807) Apply custom log URL pattern for yarn-client AM log URL in SHS
[ https://issues.apache.org/jira/browse/SPARK-42807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700608#comment-17700608 ] Apache Spark commented on SPARK-42807: -- User 'cxzl25' has created a pull request for this issue: https://github.com/apache/spark/pull/40439 > Apply custom log URL pattern for yarn-client AM log URL in SHS > -- > > Key: SPARK-42807 > URL: https://issues.apache.org/jira/browse/SPARK-42807 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.2 >Reporter: dzcxzl >Priority: Minor >
[jira] [Assigned] (SPARK-42807) Apply custom log URL pattern for yarn-client AM log URL in SHS
[ https://issues.apache.org/jira/browse/SPARK-42807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42807: Assignee: (was: Apache Spark) > Apply custom log URL pattern for yarn-client AM log URL in SHS > -- > > Key: SPARK-42807 > URL: https://issues.apache.org/jira/browse/SPARK-42807 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.2 >Reporter: dzcxzl >Priority: Minor >
[jira] [Assigned] (SPARK-42807) Apply custom log URL pattern for yarn-client AM log URL in SHS
[ https://issues.apache.org/jira/browse/SPARK-42807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42807: Assignee: Apache Spark > Apply custom log URL pattern for yarn-client AM log URL in SHS > -- > > Key: SPARK-42807 > URL: https://issues.apache.org/jira/browse/SPARK-42807 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.2 >Reporter: dzcxzl >Assignee: Apache Spark >Priority: Minor >
[jira] [Assigned] (SPARK-42806) Add Catalog
[ https://issues.apache.org/jira/browse/SPARK-42806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42806: Assignee: (was: Apache Spark) > Add Catalog > --- > > Key: SPARK-42806 > URL: https://issues.apache.org/jira/browse/SPARK-42806 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major >
[jira] [Commented] (SPARK-42806) Add Catalog
[ https://issues.apache.org/jira/browse/SPARK-42806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700599#comment-17700599 ] Apache Spark commented on SPARK-42806: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40438 > Add Catalog > --- > > Key: SPARK-42806 > URL: https://issues.apache.org/jira/browse/SPARK-42806 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major >
[jira] [Assigned] (SPARK-42806) Add Catalog
[ https://issues.apache.org/jira/browse/SPARK-42806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42806: Assignee: Apache Spark > Add Catalog > --- > > Key: SPARK-42806 > URL: https://issues.apache.org/jira/browse/SPARK-42806 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major >
[jira] [Commented] (SPARK-41259) Spark-sql cli query results should correspond to schema
[ https://issues.apache.org/jira/browse/SPARK-41259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700580#comment-17700580 ] Apache Spark commented on SPARK-41259: -- User 'Yikf' has created a pull request for this issue: https://github.com/apache/spark/pull/40437 > Spark-sql cli query results should correspond to schema > --- > > Key: SPARK-41259 > URL: https://issues.apache.org/jira/browse/SPARK-41259 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: yikaifei >Priority: Minor > Fix For: 3.4.0 > > > When using the spark-sql cli, Spark outputs only one column in the `show > tables` and `show views` commands to be compatible with Hive output, but the > output schema still reports Spark's three columns.
[jira] [Commented] (SPARK-42619) Add `show_counts` parameter for DataFrame.info
[ https://issues.apache.org/jira/browse/SPARK-42619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700563#comment-17700563 ] Apache Spark commented on SPARK-42619: -- User 'dzhigimont' has created a pull request for this issue: https://github.com/apache/spark/pull/40436 > Add `show_counts` parameter for DataFrame.info > -- > > Key: SPARK-42619 > URL: https://issues.apache.org/jira/browse/SPARK-42619 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > See https://github.com/pandas-dev/pandas/pull/37999
[jira] [Assigned] (SPARK-42619) Add `show_counts` parameter for DataFrame.info
[ https://issues.apache.org/jira/browse/SPARK-42619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42619: Assignee: Apache Spark > Add `show_counts` parameter for DataFrame.info > -- > > Key: SPARK-42619 > URL: https://issues.apache.org/jira/browse/SPARK-42619 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > See https://github.com/pandas-dev/pandas/pull/37999
[jira] [Commented] (SPARK-42619) Add `show_counts` parameter for DataFrame.info
[ https://issues.apache.org/jira/browse/SPARK-42619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700562#comment-17700562 ] Apache Spark commented on SPARK-42619: -- User 'dzhigimont' has created a pull request for this issue: https://github.com/apache/spark/pull/40436 > Add `show_counts` parameter for DataFrame.info > -- > > Key: SPARK-42619 > URL: https://issues.apache.org/jira/browse/SPARK-42619 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > See https://github.com/pandas-dev/pandas/pull/37999
[jira] [Assigned] (SPARK-42619) Add `show_counts` parameter for DataFrame.info
[ https://issues.apache.org/jira/browse/SPARK-42619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42619: Assignee: (was: Apache Spark) > Add `show_counts` parameter for DataFrame.info > -- > > Key: SPARK-42619 > URL: https://issues.apache.org/jira/browse/SPARK-42619 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > See https://github.com/pandas-dev/pandas/pull/37999
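For reference, the upstream pandas behaviour this sub-task ports can be sketched with plain pandas (not the Pandas-on-Spark API; `show_counts` exists in pandas 1.2+, where it replaced the deprecated `null_counts`):

```python
import io

import pandas as pd

# A small frame with missing values, so the non-null counts are interesting.
df = pd.DataFrame({"a": [1, 2, None], "b": ["x", None, "z"]})

buf = io.StringIO()
# show_counts=True forces the "Non-Null Count" column to be printed
# even when pandas would otherwise omit it for very large frames.
df.info(buf=buf, show_counts=True)

print("Non-Null Count" in buf.getvalue())  # True
```

The Pandas API on Spark change tracked here mirrors this parameter name and semantics on the distributed `DataFrame.info`.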
[jira] [Commented] (SPARK-42496) Introducing Spark Connect on the main page and adding Spark Connect Overview page
[ https://issues.apache.org/jira/browse/SPARK-42496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700539#comment-17700539 ] Apache Spark commented on SPARK-42496: -- User 'allanf-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40435 > Introducing Spark Connect on the main page and adding Spark Connect Overview > page > - > > Key: SPARK-42496 > URL: https://issues.apache.org/jira/browse/SPARK-42496 > Project: Spark > Issue Type: Sub-task > Components: Connect, Documentation >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Allan Folting >Priority: Major > Fix For: 3.4.1 > > > We should document the introduction of Spark Connect at the PySpark main > documentation page to give a summary to users.
[jira] [Assigned] (SPARK-42803) Use getParameterCount function instead of getParameterTypes.length
[ https://issues.apache.org/jira/browse/SPARK-42803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42803: Assignee: (was: Apache Spark) > Use getParameterCount function instead of getParameterTypes.length > -- > > Key: SPARK-42803 > URL: https://issues.apache.org/jira/browse/SPARK-42803 > Project: Spark > Issue Type: Improvement > Components: ML, Spark Core, SQL >Affects Versions: 3.3.3 >Reporter: Narek Karapetian >Priority: Minor > Fix For: 3.3.2 > > > Since JDK 1.8 the reflection API provides an additional function, > {{getParameterCount}}; it is better to use it instead of > {{getParameterTypes.length}} because {{getParameterTypes}} makes a > copy of the parameter types array on every invocation. > This helps avoid redundant array creation.
[jira] [Commented] (SPARK-42803) Use getParameterCount function instead of getParameterTypes.length
[ https://issues.apache.org/jira/browse/SPARK-42803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700529#comment-17700529 ] Apache Spark commented on SPARK-42803: -- User 'NarekDW' has created a pull request for this issue: https://github.com/apache/spark/pull/40422 > Use getParameterCount function instead of getParameterTypes.length > -- > > Key: SPARK-42803 > URL: https://issues.apache.org/jira/browse/SPARK-42803 > Project: Spark > Issue Type: Improvement > Components: ML, Spark Core, SQL >Affects Versions: 3.3.3 >Reporter: Narek Karapetian >Priority: Minor > Fix For: 3.3.2 > > > Since JDK 1.8 the reflection API provides an additional function, > {{getParameterCount}}; it is better to use it instead of > {{getParameterTypes.length}} because {{getParameterTypes}} makes a > copy of the parameter types array on every invocation. > This helps avoid redundant array creation.
[jira] [Assigned] (SPARK-42803) Use getParameterCount function instead of getParameterTypes.length
[ https://issues.apache.org/jira/browse/SPARK-42803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42803: Assignee: Apache Spark > Use getParameterCount function instead of getParameterTypes.length > -- > > Key: SPARK-42803 > URL: https://issues.apache.org/jira/browse/SPARK-42803 > Project: Spark > Issue Type: Improvement > Components: ML, Spark Core, SQL >Affects Versions: 3.3.3 >Reporter: Narek Karapetian >Assignee: Apache Spark >Priority: Minor > Fix For: 3.3.2 > > > Since JDK 1.8 the reflection API provides an additional function, > {{getParameterCount}}; it is better to use it instead of > {{getParameterTypes.length}} because {{getParameterTypes}} makes a > copy of the parameter types array on every invocation. > This helps avoid redundant array creation.
[jira] [Assigned] (SPARK-42801) Fix Flaky ClientE2ETestSuite
[ https://issues.apache.org/jira/browse/SPARK-42801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42801: Assignee: (was: Apache Spark) > Fix Flaky ClientE2ETestSuite > > > Key: SPARK-42801 > URL: https://issues.apache.org/jira/browse/SPARK-42801 > Project: Spark > Issue Type: Bug > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major
[jira] [Assigned] (SPARK-42801) Fix Flaky ClientE2ETestSuite
[ https://issues.apache.org/jira/browse/SPARK-42801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42801: Assignee: Apache Spark > Fix Flaky ClientE2ETestSuite > > > Key: SPARK-42801 > URL: https://issues.apache.org/jira/browse/SPARK-42801 > Project: Spark > Issue Type: Bug > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major
[jira] [Commented] (SPARK-42801) Fix Flaky ClientE2ETestSuite
[ https://issues.apache.org/jira/browse/SPARK-42801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700513#comment-17700513 ] Apache Spark commented on SPARK-42801: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40434 > Fix Flaky ClientE2ETestSuite > > > Key: SPARK-42801 > URL: https://issues.apache.org/jira/browse/SPARK-42801 > Project: Spark > Issue Type: Bug > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major
[jira] [Commented] (SPARK-42706) Document the Spark SQL error classes in user-facing documentation.
[ https://issues.apache.org/jira/browse/SPARK-42706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700505#comment-17700505 ] Apache Spark commented on SPARK-42706: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/40433 > Document the Spark SQL error classes in user-facing documentation. > -- > > Key: SPARK-42706 > URL: https://issues.apache.org/jira/browse/SPARK-42706 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.5.0 > > > We need to add an error class list to the user-facing documentation.
[jira] [Assigned] (SPARK-42800) Implement ml function {array_to_vector, vector_to_array}
[ https://issues.apache.org/jira/browse/SPARK-42800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42800: Assignee: Apache Spark > Implement ml function {array_to_vector, vector_to_array} > > > Key: SPARK-42800 > URL: https://issues.apache.org/jira/browse/SPARK-42800 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major
[jira] [Assigned] (SPARK-42800) Implement ml function {array_to_vector, vector_to_array}
[ https://issues.apache.org/jira/browse/SPARK-42800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42800: Assignee: (was: Apache Spark) > Implement ml function {array_to_vector, vector_to_array} > > > Key: SPARK-42800 > URL: https://issues.apache.org/jira/browse/SPARK-42800 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Major
[jira] [Commented] (SPARK-42800) Implement ml function {array_to_vector, vector_to_array}
[ https://issues.apache.org/jira/browse/SPARK-42800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700496#comment-17700496 ] Apache Spark commented on SPARK-42800: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/40432 > Implement ml function {array_to_vector, vector_to_array} > > > Key: SPARK-42800 > URL: https://issues.apache.org/jira/browse/SPARK-42800 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Major
[jira] [Assigned] (SPARK-42799) Update SBT build `xercesImpl` version to match with pom.xml
[ https://issues.apache.org/jira/browse/SPARK-42799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42799: Assignee: Apache Spark > Update SBT build `xercesImpl` version to match with pom.xml > --- > > Key: SPARK-42799 > URL: https://issues.apache.org/jira/browse/SPARK-42799 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.2 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Minor
[jira] [Assigned] (SPARK-42799) Update SBT build `xercesImpl` version to match with pom.xml
[ https://issues.apache.org/jira/browse/SPARK-42799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42799: Assignee: (was: Apache Spark) > Update SBT build `xercesImpl` version to match with pom.xml > --- > > Key: SPARK-42799 > URL: https://issues.apache.org/jira/browse/SPARK-42799 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.2 >Reporter: Dongjoon Hyun >Priority: Minor
[jira] [Commented] (SPARK-42799) Update SBT build `xercesImpl` version to match with pom.xml
[ https://issues.apache.org/jira/browse/SPARK-42799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700492#comment-17700492 ] Apache Spark commented on SPARK-42799: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40431 > Update SBT build `xercesImpl` version to match with pom.xml > --- > > Key: SPARK-42799 > URL: https://issues.apache.org/jira/browse/SPARK-42799 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.2 >Reporter: Dongjoon Hyun >Priority: Minor
[jira] [Assigned] (SPARK-42798) Upgrade protobuf-java to 3.22.2
[ https://issues.apache.org/jira/browse/SPARK-42798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42798: Assignee: Apache Spark > Upgrade protobuf-java to 3.22.2 > --- > > Key: SPARK-42798 > URL: https://issues.apache.org/jira/browse/SPARK-42798 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.1] > * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.2]
[jira] [Assigned] (SPARK-42798) Upgrade protobuf-java to 3.22.2
[ https://issues.apache.org/jira/browse/SPARK-42798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42798: Assignee: (was: Apache Spark) > Upgrade protobuf-java to 3.22.2 > --- > > Key: SPARK-42798 > URL: https://issues.apache.org/jira/browse/SPARK-42798 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.1] > * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.2]
[jira] [Commented] (SPARK-42798) Upgrade protobuf-java to 3.22.2
[ https://issues.apache.org/jira/browse/SPARK-42798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700484#comment-17700484 ] Apache Spark commented on SPARK-42798: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40430 > Upgrade protobuf-java to 3.22.2 > --- > > Key: SPARK-42798 > URL: https://issues.apache.org/jira/browse/SPARK-42798 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.1] > * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.2]
[jira] [Commented] (SPARK-42775) approx_percentile produces wrong results for large decimals.
[ https://issues.apache.org/jira/browse/SPARK-42775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700448#comment-17700448 ] Apache Spark commented on SPARK-42775: -- User 'chenhao-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40429 > approx_percentile produces wrong results for large decimals. > > > Key: SPARK-42775 > URL: https://issues.apache.org/jira/browse/SPARK-42775 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, > 3.4.0 >Reporter: Chenhao Li >Priority: Major > > In the {{approx_percentile}} expression, Spark casts decimal to double to > update the aggregation state > ([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181]) > and casts the result double back to decimal > ([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]). > The precision loss in the casts can make the result decimal out of its > precision range. This can lead to the following counter-intuitive results: > {code:sql} > spark-sql> select approx_percentile(col, 0.5) from values > (999) as tab(col); > NULL > spark-sql> select approx_percentile(col, 0.5) is null from values > (999) as tab(col); > false > spark-sql> select cast(approx_percentile(col, 0.5) as string) from values > (999) as tab(col); > 1000 > spark-sql> desc select approx_percentile(col, 0.5) from values > (999) as tab(col); > approx_percentile(col, 0.5, 1)decimal(19,0) > {code} > The result is actually not null, so the second query returns false. The first > query returns null because the result cannot fit into {{{}decimal(19, 0){}}}. 
> A suggested fix is to use {{Decimal.changePrecision}} here to ensure the > result fits, and to actually return null or throw an exception when the result > doesn't fit.
[jira] [Assigned] (SPARK-42775) approx_percentile produces wrong results for large decimals.
[ https://issues.apache.org/jira/browse/SPARK-42775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42775: Assignee: (was: Apache Spark) > approx_percentile produces wrong results for large decimals. > > > Key: SPARK-42775 > URL: https://issues.apache.org/jira/browse/SPARK-42775 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, > 3.4.0 >Reporter: Chenhao Li >Priority: Major > > In the {{approx_percentile}} expression, Spark casts decimal to double to > update the aggregation state > ([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181]) > and casts the result double back to decimal > ([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]). > The precision loss in the casts can make the result decimal out of its > precision range. This can lead to the following counter-intuitive results: > {code:sql} > spark-sql> select approx_percentile(col, 0.5) from values > (999) as tab(col); > NULL > spark-sql> select approx_percentile(col, 0.5) is null from values > (999) as tab(col); > false > spark-sql> select cast(approx_percentile(col, 0.5) as string) from values > (999) as tab(col); > 1000 > spark-sql> desc select approx_percentile(col, 0.5) from values > (999) as tab(col); > approx_percentile(col, 0.5, 1)decimal(19,0) > {code} > The result is actually not null, so the second query returns false. The first > query returns null because the result cannot fit into {{{}decimal(19, 0){}}}. 
> A suggested fix is to use {{Decimal.changePrecision}} here to ensure the > result fits, and to actually return null or throw an exception when the result > doesn't fit.
[jira] [Assigned] (SPARK-42775) approx_percentile produces wrong results for large decimals.
[ https://issues.apache.org/jira/browse/SPARK-42775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42775: Assignee: Apache Spark > approx_percentile produces wrong results for large decimals. > > > Key: SPARK-42775 > URL: https://issues.apache.org/jira/browse/SPARK-42775 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, > 3.4.0 >Reporter: Chenhao Li >Assignee: Apache Spark >Priority: Major > > In the {{approx_percentile}} expression, Spark casts decimal to double to > update the aggregation state > ([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181]) > and casts the result double back to decimal > ([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]). > The precision loss in the casts can make the result decimal out of its > precision range. This can lead to the following counter-intuitive results: > {code:sql} > spark-sql> select approx_percentile(col, 0.5) from values > (999) as tab(col); > NULL > spark-sql> select approx_percentile(col, 0.5) is null from values > (999) as tab(col); > false > spark-sql> select cast(approx_percentile(col, 0.5) as string) from values > (999) as tab(col); > 1000 > spark-sql> desc select approx_percentile(col, 0.5) from values > (999) as tab(col); > approx_percentile(col, 0.5, 1)decimal(19,0) > {code} > The result is actually not null, so the second query returns false. The first > query returns null because the result cannot fit into {{{}decimal(19, 0){}}}. 
> A suggested fix is to use {{Decimal.changePrecision}} here to ensure the > result fits, and to actually return null or throw an exception when the result > doesn't fit.
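The root cause of the first query above can be reproduced without Spark: pushing the largest decimal(19,0) value through a double (the type approx_percentile uses for its aggregation state) rounds it up to a 20-digit integer, which overflows the original precision. A plain-Python sketch of the same cast chain:

```python
from decimal import Decimal

# Largest value that fits decimal(19,0): nineteen 9s.
original = 10**19 - 1            # 9999999999999999999

# The aggregation state holds a double; the nearest double to
# 10**19 - 1 is exactly 1e19 (doubles have ~15-16 significant digits).
as_double = float(original)

# Casting back yields a 20-digit integer, out of decimal(19,0) range,
# which is why Spark reports NULL for the result.
round_tripped = int(as_double)

print(round_tripped)                        # 10000000000000000000
print(len(str(round_tripped)))              # 20
print(Decimal(round_tripped) > original)    # True: the value grew
```

This also explains the seemingly contradictory `is null` result in the second query: the internal value exists, it just cannot be represented in the declared result precision.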
[jira] [Assigned] (SPARK-42797) Spark Connect - Grammatical improvements to Spark Overview and Spark Connect Overview doc pages
[ https://issues.apache.org/jira/browse/SPARK-42797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42797: Assignee: Apache Spark > Spark Connect - Grammatical improvements to Spark Overview and Spark Connect > Overview doc pages > --- > > Key: SPARK-42797 > URL: https://issues.apache.org/jira/browse/SPARK-42797 > Project: Spark > Issue Type: Documentation > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Allan Folting >Assignee: Apache Spark >Priority: Major > > Grammatical improvements, this is a follow-up to this ticket: > Introducing Spark Connect on the main page and adding Spark Connect Overview > page > https://issues.apache.org/jira/browse/SPARK-42496
[jira] [Commented] (SPARK-42797) Spark Connect - Grammatical improvements to Spark Overview and Spark Connect Overview doc pages
[ https://issues.apache.org/jira/browse/SPARK-42797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700444#comment-17700444 ] Apache Spark commented on SPARK-42797: -- User 'allanf-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40428 > Spark Connect - Grammatical improvements to Spark Overview and Spark Connect > Overview doc pages > --- > > Key: SPARK-42797 > URL: https://issues.apache.org/jira/browse/SPARK-42797 > Project: Spark > Issue Type: Documentation > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Allan Folting >Priority: Major > > Grammatical improvements, this is a follow-up to this ticket: > Introducing Spark Connect on the main page and adding Spark Connect Overview > page > https://issues.apache.org/jira/browse/SPARK-42496
[jira] [Assigned] (SPARK-42797) Spark Connect - Grammatical improvements to Spark Overview and Spark Connect Overview doc pages
[ https://issues.apache.org/jira/browse/SPARK-42797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42797: Assignee: (was: Apache Spark) > Spark Connect - Grammatical improvements to Spark Overview and Spark Connect > Overview doc pages > --- > > Key: SPARK-42797 > URL: https://issues.apache.org/jira/browse/SPARK-42797 > Project: Spark > Issue Type: Documentation > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Allan Folting >Priority: Major > > Grammatical improvements, this is a follow-up to this ticket: > Introducing Spark Connect on the main page and adding Spark Connect Overview > page > https://issues.apache.org/jira/browse/SPARK-42496
[jira] [Commented] (SPARK-42792) Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
[ https://issues.apache.org/jira/browse/SPARK-42792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700413#comment-17700413 ] Apache Spark commented on SPARK-42792: -- User 'anishshri-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40427 > Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming > > > Key: SPARK-42792 > URL: https://issues.apache.org/jira/browse/SPARK-42792 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Anish Shrigondekar >Priority: Major > > Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming > > It's useful to get this metric for bytes written during flush from RocksDB as > part of the DB custom metrics. We propose to add this to the existing metrics > that are collected. There is no additional overhead since we are just > querying the internal ticker gauge, similar to other metrics.
[jira] [Assigned] (SPARK-42792) Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
[ https://issues.apache.org/jira/browse/SPARK-42792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42792: Assignee: (was: Apache Spark) > Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming > > > Key: SPARK-42792 > URL: https://issues.apache.org/jira/browse/SPARK-42792 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Anish Shrigondekar >Priority: Major > > Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming > > It's useful to get this metric for bytes written during flush from RocksDB as > part of the DB custom metrics. We propose to add this to the existing metrics > that are collected. There is no additional overhead since we are just > querying the internal ticker gauge, similar to other metrics.
[jira] [Assigned] (SPARK-42792) Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
[ https://issues.apache.org/jira/browse/SPARK-42792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42792: Assignee: Apache Spark > Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming > > > Key: SPARK-42792 > URL: https://issues.apache.org/jira/browse/SPARK-42792 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Anish Shrigondekar >Assignee: Apache Spark >Priority: Major > > Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming > > It's useful to get this metric for bytes written during flush from RocksDB as > part of the DB custom metrics. We propose to add this to the existing metrics > that are collected. There is no additional overhead since we are just > querying the internal ticker gauge, similar to other metrics. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42796) Support TimestampNTZ in Cached Batch
[ https://issues.apache.org/jira/browse/SPARK-42796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700414#comment-17700414 ] Apache Spark commented on SPARK-42796: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/40426 > Support TimestampNTZ in Cached Batch > > > Key: SPARK-42796 > URL: https://issues.apache.org/jira/browse/SPARK-42796 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.1 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42796) Support TimestampNTZ in Cached Batch
[ https://issues.apache.org/jira/browse/SPARK-42796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42796: Assignee: Apache Spark (was: Gengliang Wang) > Support TimestampNTZ in Cached Batch > > > Key: SPARK-42796 > URL: https://issues.apache.org/jira/browse/SPARK-42796 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.1 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42796) Support TimestampNTZ in Cached Batch
[ https://issues.apache.org/jira/browse/SPARK-42796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700412#comment-17700412 ] Apache Spark commented on SPARK-42796: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/40426 > Support TimestampNTZ in Cached Batch > > > Key: SPARK-42796 > URL: https://issues.apache.org/jira/browse/SPARK-42796 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.1 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42796) Support TimestampNTZ in Cached Batch
[ https://issues.apache.org/jira/browse/SPARK-42796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42796: Assignee: Gengliang Wang (was: Apache Spark) > Support TimestampNTZ in Cached Batch > > > Key: SPARK-42796 > URL: https://issues.apache.org/jira/browse/SPARK-42796 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.1 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structure Streaming
[ https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700402#comment-17700402 ] Apache Spark commented on SPARK-42794: -- User 'huanliwang-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40425 > Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in > Structure Streaming > -- > > Key: SPARK-42794 > URL: https://issues.apache.org/jira/browse/SPARK-42794 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Huanli Wang >Priority: Minor > > We are seeing query failure which is caused by RocksDB acquisition failure > for the retry tasks. > * at t1, we shrink the cluster to only have one executor > {code:java} > 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: > app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned > because of kill request from HTTP endpoint (data migration disabled)) > 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: > app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned > because of kill request from HTTP endpoint (data migration disabled)) > {code} > > * at t1+2min, task 7 at its first attempt (i.e. 
task 7.0) is scheduled to > the alive executor > {code:java} > 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID > 685) (10.166.225.249, executor 0, partition 7, ANY, {code} > > It seems that task 7.0 is able to pass *{{dataRDD.iterator(partition, > ctxt)}}* and acquires the rocksdb lock as we are seeing > {code:java} > 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) > (10.166.225.249 executor 0): java.lang.IllegalStateException: > StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be > acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] > as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage > 133.0, TID 685] after 60003 ms. > 23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) > (10.166.225.249 executor 0): java.lang.IllegalStateException: > StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be > acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID > 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in > stage 133.0, TID 685] after 60006 ms. > 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) > (10.166.225.249 executor 0): java.lang.IllegalStateException: > StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be > acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] > as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage > 133.0, TID 685] after 60003 ms. > {code} > > Increasing the *lockAcquireTimeoutMs* to 2 minutes such that 4 task retries > will give us 8 minutes to acquire the lock and it is larger than > connectionTimeout with retries (3 * 120s). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structure Streaming
[ https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42794: Assignee: (was: Apache Spark) > Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in > Structure Streaming > -- > > Key: SPARK-42794 > URL: https://issues.apache.org/jira/browse/SPARK-42794 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Huanli Wang >Priority: Minor > > We are seeing query failure which is caused by RocksDB acquisition failure > for the retry tasks. > * at t1, we shrink the cluster to only have one executor > {code:java} > 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: > app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned > because of kill request from HTTP endpoint (data migration disabled)) > 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: > app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned > because of kill request from HTTP endpoint (data migration disabled)) > {code} > > * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to > the alive executor > {code:java} > 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID > 685) (10.166.225.249, executor 0, partition 7, ANY, {code} > > It seems that task 7.0 is able to pass *{{dataRDD.iterator(partition, > ctxt)}}* and acquires the rocksdb lock as we are seeing > {code:java} > 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) > (10.166.225.249 executor 0): java.lang.IllegalStateException: > StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be > acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] > as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage > 133.0, TID 685] after 60003 ms. 
> 23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) > (10.166.225.249 executor 0): java.lang.IllegalStateException: > StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be > acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID > 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in > stage 133.0, TID 685] after 60006 ms. > 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) > (10.166.225.249 executor 0): java.lang.IllegalStateException: > StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be > acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] > as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage > 133.0, TID 685] after 60003 ms. > {code} > > Increasing the *lockAcquireTimeoutMs* to 2 minutes such that 4 task retries > will give us 8 minutes to acquire the lock and it is larger than > connectionTimeout with retries (3 * 120s). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structure Streaming
[ https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42794: Assignee: Apache Spark > Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in > Structure Streaming > -- > > Key: SPARK-42794 > URL: https://issues.apache.org/jira/browse/SPARK-42794 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Huanli Wang >Assignee: Apache Spark >Priority: Minor > > We are seeing query failure which is caused by RocksDB acquisition failure > for the retry tasks. > * at t1, we shrink the cluster to only have one executor > {code:java} > 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: > app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned > because of kill request from HTTP endpoint (data migration disabled)) > 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: > app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned > because of kill request from HTTP endpoint (data migration disabled)) > {code} > > * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to > the alive executor > {code:java} > 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID > 685) (10.166.225.249, executor 0, partition 7, ANY, {code} > > It seems that task 7.0 is able to pass *{{dataRDD.iterator(partition, > ctxt)}}* and acquires the rocksdb lock as we are seeing > {code:java} > 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) > (10.166.225.249 executor 0): java.lang.IllegalStateException: > StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be > acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] > as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage > 133.0, TID 685] after 60003 ms. 
> 23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) > (10.166.225.249 executor 0): java.lang.IllegalStateException: > StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be > acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID > 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in > stage 133.0, TID 685] after 60006 ms. > 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) > (10.166.225.249 executor 0): java.lang.IllegalStateException: > StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be > acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] > as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage > 133.0, TID 685] after 60003 ms. > {code} > > Increasing the *lockAcquireTimeoutMs* to 2 minutes such that 4 task retries > will give us 8 minutes to acquire the lock and it is larger than > connectionTimeout with retries (3 * 120s). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
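[Editorial note] The proposal in SPARK-42794 rests on two things: a budget check (4 task attempts x 2 min lock timeout must exceed the 3 x 120 s connection timeout with retries), and the timed-acquire-with-retry pattern itself. A minimal sketch with stdlib `threading` and illustrative timings; this is not Spark's actual RocksDB state store code:

```python
import threading

# Budget check behind the proposal: 4 task attempts x 120 s lock timeout
# must exceed connectionTimeout with retries (3 x 120 s per the ticket).
lock_budget_s = 4 * 120
connection_budget_s = 3 * 120
assert lock_budget_s > connection_budget_s

def acquire_with_retries(lock, timeout_s, max_attempts):
    """Timed acquire with retries, mirroring how each Spark task attempt
    re-tries the RocksDB instance lock before failing the task."""
    for attempt in range(1, max_attempts + 1):
        if lock.acquire(timeout=timeout_s):
            return attempt  # lock obtained on this attempt
    # All attempts timed out -- the IllegalStateException case in the logs above.
    raise TimeoutError(f"lock not acquired after {max_attempts} attempts")

lock = threading.Lock()
lock.acquire()  # simulate the stale holder (task 7.0 in the logs)
threading.Timer(0.3, lock.release).start()  # the holder releases eventually

attempts = acquire_with_retries(lock, timeout_s=0.2, max_attempts=4)
print(f"acquired on attempt {attempts}")
```

With a per-attempt timeout longer than the holder's remaining hold time, one of the retries succeeds instead of exhausting the task's attempt budget, which is the effect the larger `lockAcquireTimeoutMs` is after.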
[jira] [Commented] (SPARK-42793) `connect` module requires `build_profile_flags`
[ https://issues.apache.org/jira/browse/SPARK-42793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700398#comment-17700398 ] Apache Spark commented on SPARK-42793: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40424 > `connect` module requires `build_profile_flags` > --- > > Key: SPARK-42793 > URL: https://issues.apache.org/jira/browse/SPARK-42793 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42793) `connect` module requires `build_profile_flags`
[ https://issues.apache.org/jira/browse/SPARK-42793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42793: Assignee: (was: Apache Spark) > `connect` module requires `build_profile_flags` > --- > > Key: SPARK-42793 > URL: https://issues.apache.org/jira/browse/SPARK-42793 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42793) `connect` module requires `build_profile_flags`
[ https://issues.apache.org/jira/browse/SPARK-42793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42793: Assignee: Apache Spark > `connect` module requires `build_profile_flags` > --- > > Key: SPARK-42793 > URL: https://issues.apache.org/jira/browse/SPARK-42793 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42793) `connect` module requires `build_profile_flags`
[ https://issues.apache.org/jira/browse/SPARK-42793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700397#comment-17700397 ] Apache Spark commented on SPARK-42793: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40424 > `connect` module requires `build_profile_flags` > --- > > Key: SPARK-42793 > URL: https://issues.apache.org/jira/browse/SPARK-42793 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41775) Implement training functions as input
[ https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700332#comment-17700332 ] Apache Spark commented on SPARK-41775: -- User 'rithwik-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40423 > Implement training functions as input > - > > Key: SPARK-41775 > URL: https://issues.apache.org/jira/browse/SPARK-41775 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Assignee: Rithwik Ediga Lakhamsani >Priority: Major > Fix For: 3.4.0 > > > Sidenote: make formatting updates described in > https://github.com/apache/spark/pull/39188 > > Currently, `Distributor().run(...)` takes only files as input. Now we will > add functionality to take in functions as well. This will > require us to go through the following process on each task in the executor > nodes: > 1. Take the input function and args and pickle them > 2. Create a temp train.py file that looks like > {code:python} > import cloudpickle > import os > if __name__ == "__main__": > with open(f"{tempdir}/train_input.pkl", "rb") as f: > train, args = cloudpickle.load(f) > output = train(*args) > if output and os.environ.get("RANK", "") == "0": # this is for > partitionId == 0 > with open(f"{tempdir}/train_output.pkl", "wb") as f: > cloudpickle.dump(output, f) {code} > 3. Run that train.py file with `torchrun` > 4. Check if `train_output.pkl` has been created by the process with partitionId == > 0; if it has, deserialize it and return that output through `.collect()` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41775) Implement training functions as input
[ https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700333#comment-17700333 ] Apache Spark commented on SPARK-41775: -- User 'rithwik-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40423 > Implement training functions as input > - > > Key: SPARK-41775 > URL: https://issues.apache.org/jira/browse/SPARK-41775 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Assignee: Rithwik Ediga Lakhamsani >Priority: Major > Fix For: 3.4.0 > > > Sidenote: make formatting updates described in > https://github.com/apache/spark/pull/39188 > > Currently, `Distributor().run(...)` takes only files as input. Now we will > add functionality to take in functions as well. This will > require us to go through the following process on each task in the executor > nodes: > 1. Take the input function and args and pickle them > 2. Create a temp train.py file that looks like > {code:python} > import cloudpickle > import os > if __name__ == "__main__": > with open(f"{tempdir}/train_input.pkl", "rb") as f: > train, args = cloudpickle.load(f) > output = train(*args) > if output and os.environ.get("RANK", "") == "0": # this is for > partitionId == 0 > with open(f"{tempdir}/train_output.pkl", "wb") as f: > cloudpickle.dump(output, f) {code} > 3. Run that train.py file with `torchrun` > 4. Check if `train_output.pkl` has been created by the process with partitionId == > 0; if it has, deserialize it and return that output through `.collect()` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
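[Editorial note] The train.py flow quoted in SPARK-41775 can be exercised end to end. The sketch below uses stdlib `pickle` in place of `cloudpickle` (same `load`/`dump` file-object API; cloudpickle is only needed for closures and lambdas) and a throwaway temp directory standing in for the ticket's `tempdir`; it is an illustrative round-trip, not the Distributor's actual implementation:

```python
import os
import pickle  # cloudpickle exposes the same load/dump interface
import tempfile

def train(x, y):
    # Stand-in for the user-provided training function.
    return x + y

tempdir = tempfile.mkdtemp()

# Step 1 (driver side): pickle the input function and its args.
with open(f"{tempdir}/train_input.pkl", "wb") as f:
    pickle.dump((train, (2, 3)), f)

# Step 2 (the generated train.py, normally launched via `torchrun`;
# the real script wraps this in `if __name__ == "__main__":`).
with open(f"{tempdir}/train_input.pkl", "rb") as f:
    fn, args = pickle.load(f)
output = fn(*args)
# RANK defaults to "0" here so the sketch runs outside torchrun; in the
# real flow only the partitionId == 0 worker writes the output file.
if output is not None and os.environ.get("RANK", "0") == "0":
    with open(f"{tempdir}/train_output.pkl", "wb") as f:
        pickle.dump(output, f)

# Step 4: the driver deserializes the result it will surface via .collect().
with open(f"{tempdir}/train_output.pkl", "rb") as f:
    result = pickle.load(f)
print(result)
```

The reason the ticket reaches for cloudpickle rather than plain pickle is exactly the case this sketch avoids: user training functions are often closures or lambdas, which pickle serializes only by reference.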
[jira] [Assigned] (SPARK-42779) Allow V2 writes to indicate advisory partition size
[ https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42779: Assignee: Apache Spark > Allow V2 writes to indicate advisory partition size > --- > > Key: SPARK-42779 > URL: https://issues.apache.org/jira/browse/SPARK-42779 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Anton Okolnychyi >Assignee: Apache Spark >Priority: Major > > Data sources may request a particular distribution and ordering of data for > V2 writes. If AQE is enabled, the default session advisory partition size > (64MB) will be used as guidance. Unfortunately, this default value can still > lead to small files because the written data can be compressed nicely using > columnar file formats. Spark should allow data sources to indicate the > advisory shuffle partition size, just like it lets data sources request a > particular number of partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42779) Allow V2 writes to indicate advisory partition size
[ https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42779: Assignee: (was: Apache Spark) > Allow V2 writes to indicate advisory partition size > --- > > Key: SPARK-42779 > URL: https://issues.apache.org/jira/browse/SPARK-42779 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Anton Okolnychyi >Priority: Major > > Data sources may request a particular distribution and ordering of data for > V2 writes. If AQE is enabled, the default session advisory partition size > (64MB) will be used as guidance. Unfortunately, this default value can still > lead to small files because the written data can be compressed nicely using > columnar file formats. Spark should allow data sources to indicate the > advisory shuffle partition size, just like it lets data sources request a > particular number of partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42779) Allow V2 writes to indicate advisory partition size
[ https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700316#comment-17700316 ] Apache Spark commented on SPARK-42779: -- User 'aokolnychyi' has created a pull request for this issue: https://github.com/apache/spark/pull/40421 > Allow V2 writes to indicate advisory partition size > --- > > Key: SPARK-42779 > URL: https://issues.apache.org/jira/browse/SPARK-42779 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Anton Okolnychyi >Priority: Major > > Data sources may request a particular distribution and ordering of data for > V2 writes. If AQE is enabled, the default session advisory partition size > (64MB) will be used as guidance. Unfortunately, this default value can still > lead to small files because the written data can be compressed nicely using > columnar file formats. Spark should allow data sources to indicate the > advisory shuffle partition size, just like it lets data sources request a > particular number of partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
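[Editorial note] The small-files effect SPARK-42779 describes is compression arithmetic: AQE coalesces shuffle partitions toward the advisory size measured in uncompressed shuffle bytes, while the written file shrinks by the columnar compression factor. A back-of-the-envelope sketch; the 6x ratio is an illustrative assumption, not a measured number:

```python
# AQE coalesces toward the advisory size in (uncompressed) shuffle bytes.
advisory_mib = 64        # session default advisory partition size per the ticket
compression_ratio = 6    # hypothetical columnar (e.g. Parquet) compression factor

file_mib = advisory_mib / compression_ratio
print(f"~{file_mib:.1f} MiB per output file")  # well under the 64 MiB target

# A source that wants ~128 MiB files must advertise a much larger advisory
# size -- which is exactly what this ticket lets V2 writes do.
target_file_mib = 128
needed_advisory_mib = target_file_mib * compression_ratio
print(f"advisory size needed: {needed_advisory_mib} MiB")
```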
[jira] [Assigned] (SPARK-42617) Support `isocalendar`
[ https://issues.apache.org/jira/browse/SPARK-42617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42617: Assignee: (was: Apache Spark) > Support `isocalendar` > - > > Key: SPARK-42617 > URL: https://issues.apache.org/jira/browse/SPARK-42617 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > We should support `isocalendar` to match pandas behavior > (https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.dt.isocalendar.html) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42617) Support `isocalendar`
[ https://issues.apache.org/jira/browse/SPARK-42617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42617: Assignee: Apache Spark > Support `isocalendar` > - > > Key: SPARK-42617 > URL: https://issues.apache.org/jira/browse/SPARK-42617 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > We should support `isocalendar` to match pandas behavior > (https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.dt.isocalendar.html) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42617) Support `isocalendar`
[ https://issues.apache.org/jira/browse/SPARK-42617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700282#comment-17700282 ] Apache Spark commented on SPARK-42617: -- User 'dzhigimont' has created a pull request for this issue: https://github.com/apache/spark/pull/40420 > Support `isocalendar` > - > > Key: SPARK-42617 > URL: https://issues.apache.org/jira/browse/SPARK-42617 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > We should support `isocalendar` to match pandas behavior > (https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.dt.isocalendar.html) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
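[Editorial note] The pandas semantics SPARK-42617 has to match are the ISO 8601 (year, week, weekday) triple, which stdlib `datetime` already exposes; `Series.dt.isocalendar()` returns the same values as three UInt32 columns named year/week/day. The year-boundary edge case is the part worth checking:

```python
from datetime import date

# ISO 8601 assigns 2023-01-01 (a Sunday) to week 52 of ISO year 2022,
# because ISO week 1 is the week containing the year's first Thursday.
iso = date(2023, 1, 1).isocalendar()
print(iso.year, iso.week, iso.weekday)  # 2022 52 7
```

Any pandas-on-Spark implementation has to reproduce this year rollover, not just `weekofyear`-style week numbers, to match the linked pandas API.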
[jira] [Commented] (SPARK-42789) rewrites multiple GetJsonObjects to a JsonTuple if their json expression is the same
[ https://issues.apache.org/jira/browse/SPARK-42789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700241#comment-17700241 ] Apache Spark commented on SPARK-42789: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/40419 > rewrites multiple GetJsonObjects to a JsonTuple if their json expression is > the same > > > Key: SPARK-42789 > URL: https://issues.apache.org/jira/browse/SPARK-42789 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Priority: Major > > Benchmark result: > {noformat} > Running benchmark: Benchmark rewrite GetJsonObjects > Running case: Default: 2 > Stopped after 2 iterations, 80787 ms > Running case: Rewrite: 2 > Stopped after 2 iterations, 48900 ms > Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1 > Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz > Benchmark rewrite GetJsonObjects: Best Time(ms) Avg Time(ms) > Stdev(ms)Rate(M/s) Per Row(ns) Relative > > Default: 239026 40394 > 1935 0.25397.8 1.0X > Rewrite: 224354 24450 > 137 0.33368.4 1.6X > Running benchmark: Benchmark rewrite GetJsonObjects > Running case: Default: 3 > Stopped after 2 iterations, 115055 ms > Running case: Rewrite: 3 > Stopped after 2 iterations, 62297 ms > Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1 > Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz > Benchmark rewrite GetJsonObjects: Best Time(ms) Avg Time(ms) > Stdev(ms)Rate(M/s) Per Row(ns) Relative > > Default: 354652 57528 > NaN 0.17559.1 1.0X > Rewrite: 330702 31149 > 631 0.24246.6 1.8X > Running benchmark: Benchmark rewrite GetJsonObjects > Running case: Default: 4 > Stopped after 2 iterations, 155392 ms > Running case: Rewrite: 4 > Stopped after 2 iterations, 54776 ms > Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1 > Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz > Benchmark rewrite GetJsonObjects: Best Time(ms) Avg Time(ms) > 
Stdev(ms)Rate(M/s) Per Row(ns) Relative > > Default: 475503 77696 > NaN 0.1 10443.1 1.0X > Rewrite: 426962 27388 > 602 0.33729.3 2.8X > Running benchmark: Benchmark rewrite GetJsonObjects > Running case: Default: 5 > Stopped after 2 iterations, 192836 ms > Running case: Rewrite: 5 > Stopped after 2 iterations, 51967 ms > Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1 > Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz > Benchmark rewrite GetJsonObjects: Best Time(ms) Avg Time(ms) > Stdev(ms)Rate(M/s) Per Row(ns) Relative > > Default: 594923 96418 > 2115 0.1 13129.1 1.0X > Rewrite: 525362 25984 > 880 0.33507.8 3.7X > Running benchmark: Benchmark rewrite GetJsonObjects > Running case: Default: 10 > Stopped after 2 iterations, 317246 ms > Running case: Rewrite: 10 > Stopped after 2 iterations, 56734 ms > Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1 > Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz > Benchmark rewrite GetJsonObjects: Best Time(ms) Avg Time(ms) > Stdev(ms)Rate(M/s) Per Row(ns) Relative > > Default: 10 157458 158623 > 1648 0.0 21778.6 1.0X > Rewrite: 10 28296 28367 > 100 0.33913.8 5.6X > Running benchmark: Benchmark rewrite GetJsonObjects > Running case: Default: 20 >
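[Editorial note] The speedup in the SPARK-42789 benchmark comes from parsing each JSON document once instead of once per extracted path: in SQL terms, several `get_json_object(col, '$.a')`-style calls over the same column collapse into a single `json_tuple(col, 'a', ...)`. An illustrative stdlib analogue of the before/after cost model (not the actual Catalyst rewrite rule):

```python
import json

row = '{"a": 1, "b": 2, "c": 3}'

# Before: each get_json_object-style lookup re-parses the document.
def get_json_object(doc, field):
    return json.loads(doc).get(field)  # one full parse per field

# After: a json_tuple-style projection parses once for all fields,
# so cost grows with the number of paths, not the number of parses.
def json_tuple(doc, *fields):
    parsed = json.loads(doc)  # single shared parse
    return tuple(parsed.get(f) for f in fields)

before = (get_json_object(row, "a"),
          get_json_object(row, "b"),
          get_json_object(row, "c"))
after = json_tuple(row, "a", "b", "c")
assert before == after  # same results, one parse instead of three
print(after)
```

This is why the benchmark's relative speedup grows with the number of extracted paths (1.6X at 2 paths up to 5.6X at 10).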
[jira] [Assigned] (SPARK-42789) rewrites multiple GetJsonObjects to a JsonTuple if their json expression is the same

[ https://issues.apache.org/jira/browse/SPARK-42789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42789:

    Assignee: Apache Spark

> rewrites multiple GetJsonObjects to a JsonTuple if their json expression is
> the same
> ---------------------------------------------------------------------------
>
>          Key: SPARK-42789
>          URL: https://issues.apache.org/jira/browse/SPARK-42789
>      Project: Spark
>   Issue Type: Improvement
>   Components: SQL
> Affects Versions: 3.5.0
>     Reporter: Yuming Wang
>     Assignee: Apache Spark
>     Priority: Major
>
> Benchmark result:
> {noformat}
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 2
>   Stopped after 2 iterations, 80787 ms
>   Running case: Rewrite: 2
>   Stopped after 2 iterations, 48900 ms
>
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
> -----------------------------------------------------------------------------------------------------------
> Default: 2                                 39026         40394       1935        0.2       5397.8      1.0X
> Rewrite: 2                                 24354         24450        137        0.3       3368.4      1.6X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 3
>   Stopped after 2 iterations, 115055 ms
>   Running case: Rewrite: 3
>   Stopped after 2 iterations, 62297 ms
>
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
> -----------------------------------------------------------------------------------------------------------
> Default: 3                                 54652         57528        NaN        0.1       7559.1      1.0X
> Rewrite: 3                                 30702         31149        631        0.2       4246.6      1.8X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 4
>   Stopped after 2 iterations, 155392 ms
>   Running case: Rewrite: 4
>   Stopped after 2 iterations, 54776 ms
>
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
> -----------------------------------------------------------------------------------------------------------
> Default: 4                                 75503         77696        NaN        0.1      10443.1      1.0X
> Rewrite: 4                                 26962         27388        602        0.3       3729.3      2.8X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 5
>   Stopped after 2 iterations, 192836 ms
>   Running case: Rewrite: 5
>   Stopped after 2 iterations, 51967 ms
>
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
> -----------------------------------------------------------------------------------------------------------
> Default: 5                                 94923         96418       2115        0.1      13129.1      1.0X
> Rewrite: 5                                 25362         25984        880        0.3       3507.8      3.7X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 10
>   Stopped after 2 iterations, 317246 ms
>   Running case: Rewrite: 10
>   Stopped after 2 iterations, 56734 ms
>
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
> -----------------------------------------------------------------------------------------------------------
> Default: 10                               157458        158623       1648        0.0      21778.6      1.0X
> Rewrite: 10                                28296         28367        100        0.3       3913.8      5.6X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 20
>   Stopped after 2 iterations, 618089 ms
>   Running case: Rewrite: 20
[jira] [Assigned] (SPARK-42789) rewrites multiple GetJsonObjects to a JsonTuple if their json expression is the same

[ https://issues.apache.org/jira/browse/SPARK-42789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42789:

    Assignee:     (was: Apache Spark)

> rewrites multiple GetJsonObjects to a JsonTuple if their json expression is
> the same
> ---------------------------------------------------------------------------
>
>          Key: SPARK-42789
>          URL: https://issues.apache.org/jira/browse/SPARK-42789
>      Project: Spark
>   Issue Type: Improvement
>   Components: SQL
> Affects Versions: 3.5.0
>     Reporter: Yuming Wang
>     Priority: Major
>
> Benchmark result:
> {noformat}
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 2
>   Stopped after 2 iterations, 80787 ms
>   Running case: Rewrite: 2
>   Stopped after 2 iterations, 48900 ms
>
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
> -----------------------------------------------------------------------------------------------------------
> Default: 2                                 39026         40394       1935        0.2       5397.8      1.0X
> Rewrite: 2                                 24354         24450        137        0.3       3368.4      1.6X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 3
>   Stopped after 2 iterations, 115055 ms
>   Running case: Rewrite: 3
>   Stopped after 2 iterations, 62297 ms
>
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
> -----------------------------------------------------------------------------------------------------------
> Default: 3                                 54652         57528        NaN        0.1       7559.1      1.0X
> Rewrite: 3                                 30702         31149        631        0.2       4246.6      1.8X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 4
>   Stopped after 2 iterations, 155392 ms
>   Running case: Rewrite: 4
>   Stopped after 2 iterations, 54776 ms
>
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
> -----------------------------------------------------------------------------------------------------------
> Default: 4                                 75503         77696        NaN        0.1      10443.1      1.0X
> Rewrite: 4                                 26962         27388        602        0.3       3729.3      2.8X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 5
>   Stopped after 2 iterations, 192836 ms
>   Running case: Rewrite: 5
>   Stopped after 2 iterations, 51967 ms
>
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
> -----------------------------------------------------------------------------------------------------------
> Default: 5                                 94923         96418       2115        0.1      13129.1      1.0X
> Rewrite: 5                                 25362         25984        880        0.3       3507.8      3.7X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 10
>   Stopped after 2 iterations, 317246 ms
>   Running case: Rewrite: 10
>   Stopped after 2 iterations, 56734 ms
>
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
> -----------------------------------------------------------------------------------------------------------
> Default: 10                               157458        158623       1648        0.0      21778.6      1.0X
> Rewrite: 10                                28296         28367        100        0.3       3913.8      5.6X
>
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 20
>   Stopped after 2 iterations, 618089 ms
>   Running case: Rewrite: 20
>   Stopped after 2 iterations,
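The rewrite benchmarked above can be sketched in miniature: group GetJsonObject-style expressions by the JSON input they read, and collapse any group that parses the same document more than once into a single JsonTuple-style call, so the JSON string is parsed once rather than once per extracted path. The Python below is an illustrative toy, not Spark's actual Catalyst rule; the class names and the `rewrite` helper are hypothetical stand-ins.

```python
from collections import defaultdict

# Toy stand-ins for the Catalyst expressions (illustrative only).
class GetJsonObject:
    def __init__(self, json_expr, path):
        self.json_expr = json_expr  # e.g. a column name
        self.path = path            # e.g. '$.a'

class JsonTuple:
    def __init__(self, json_expr, fields):
        self.json_expr = json_expr
        self.fields = fields        # top-level field names to extract

def rewrite(exprs, threshold=2):
    """Group GetJsonObject calls by their JSON input; any group that would
    parse the same document at least `threshold` times is replaced by one
    JsonTuple, so the JSON is parsed once instead of once per path."""
    groups = defaultdict(list)
    for e in exprs:
        groups[e.json_expr].append(e)
    out = []
    for json_expr, group in groups.items():
        if len(group) >= threshold:
            # '$.a' -> 'a'; json_tuple addresses fields by bare name.
            out.append(JsonTuple(json_expr, [g.path[2:] for g in group]))
        else:
            out.extend(group)
    return out
```

Note that `json_tuple` only extracts top-level fields, so a real rule would also have to verify that every path in a group has the simple `$.name` form before rewriting.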
[jira] [Assigned] (SPARK-42790) Abstract the excluded method for better test for JDBC docker tests.

[ https://issues.apache.org/jira/browse/SPARK-42790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42790:

    Assignee: Apache Spark

> Abstract the excluded method for better test for JDBC docker tests.
> -------------------------------------------------------------------
>
>          Key: SPARK-42790
>          URL: https://issues.apache.org/jira/browse/SPARK-42790
>      Project: Spark
>   Issue Type: Sub-task
>   Components: SQL
> Affects Versions: 3.5.0
>     Reporter: jiaan.geng
>     Assignee: Apache Spark
>     Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42790) Abstract the excluded method for better test for JDBC docker tests.

[ https://issues.apache.org/jira/browse/SPARK-42790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42790:

    Assignee:     (was: Apache Spark)

> Abstract the excluded method for better test for JDBC docker tests.
> -------------------------------------------------------------------
>
>          Key: SPARK-42790
>          URL: https://issues.apache.org/jira/browse/SPARK-42790
>      Project: Spark
>   Issue Type: Sub-task
>   Components: SQL
> Affects Versions: 3.5.0
>     Reporter: jiaan.geng
>     Priority: Major
>
[jira] [Commented] (SPARK-42790) Abstract the excluded method for better test for JDBC docker tests.

[ https://issues.apache.org/jira/browse/SPARK-42790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700220#comment-17700220 ]

Apache Spark commented on SPARK-42790:
--------------------------------------

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40418

> Abstract the excluded method for better test for JDBC docker tests.
> -------------------------------------------------------------------
>
>          Key: SPARK-42790
>          URL: https://issues.apache.org/jira/browse/SPARK-42790
>      Project: Spark
>   Issue Type: Sub-task
>   Components: SQL
> Affects Versions: 3.5.0
>     Reporter: jiaan.geng
>     Priority: Major
>
[jira] [Commented] (SPARK-42778) QueryStageExec should respect supportsRowBased

[ https://issues.apache.org/jira/browse/SPARK-42778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700214#comment-17700214 ]

Apache Spark commented on SPARK-42778:
--------------------------------------

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/40417

> QueryStageExec should respect supportsRowBased
> ----------------------------------------------
>
>          Key: SPARK-42778
>          URL: https://issues.apache.org/jira/browse/SPARK-42778
>      Project: Spark
>   Issue Type: Improvement
>   Components: SQL
> Affects Versions: 3.5.0
>     Reporter: XiDuo You
>     Priority: Major
>      Fix For: 3.5.0
>
[jira] [Assigned] (SPARK-42786) Impl typed select in Dataset

[ https://issues.apache.org/jira/browse/SPARK-42786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42786:

    Assignee: Apache Spark

> Impl typed select in Dataset
> ----------------------------
>
>          Key: SPARK-42786
>          URL: https://issues.apache.org/jira/browse/SPARK-42786
>      Project: Spark
>   Issue Type: Improvement
>   Components: Connect
> Affects Versions: 3.4.0
>     Reporter: Zhen Li
>     Assignee: Apache Spark
>     Priority: Major
>
[jira] [Commented] (SPARK-42786) Impl typed select in Dataset

[ https://issues.apache.org/jira/browse/SPARK-42786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700106#comment-17700106 ]

Apache Spark commented on SPARK-42786:
--------------------------------------

User 'zhenlineo' has created a pull request for this issue:
https://github.com/apache/spark/pull/40413

> Impl typed select in Dataset
> ----------------------------
>
>          Key: SPARK-42786
>          URL: https://issues.apache.org/jira/browse/SPARK-42786
>      Project: Spark
>   Issue Type: Improvement
>   Components: Connect
> Affects Versions: 3.4.0
>     Reporter: Zhen Li
>     Priority: Major
>
[jira] [Assigned] (SPARK-42786) Impl typed select in Dataset

[ https://issues.apache.org/jira/browse/SPARK-42786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42786:

    Assignee:     (was: Apache Spark)

> Impl typed select in Dataset
> ----------------------------
>
>          Key: SPARK-42786
>          URL: https://issues.apache.org/jira/browse/SPARK-42786
>      Project: Spark
>   Issue Type: Improvement
>   Components: Connect
> Affects Versions: 3.4.0
>     Reporter: Zhen Li
>     Priority: Major
>
[jira] [Commented] (SPARK-42731) Update Spark Configuration

[ https://issues.apache.org/jira/browse/SPARK-42731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700101#comment-17700101 ]

Apache Spark commented on SPARK-42731:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/40416

> Update Spark Configuration
> --------------------------
>
>          Key: SPARK-42731
>          URL: https://issues.apache.org/jira/browse/SPARK-42731
>      Project: Spark
>   Issue Type: Sub-task
>   Components: Connect, Documentation
> Affects Versions: 3.4.0
>     Reporter: Hyukjin Kwon
>     Priority: Major
>
> https://spark.apache.org/docs/latest/configuration.html
> Add a section for Spark Connect configurations
[jira] [Assigned] (SPARK-42731) Update Spark Configuration

[ https://issues.apache.org/jira/browse/SPARK-42731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42731:

    Assignee: Apache Spark

> Update Spark Configuration
> --------------------------
>
>          Key: SPARK-42731
>          URL: https://issues.apache.org/jira/browse/SPARK-42731
>      Project: Spark
>   Issue Type: Sub-task
>   Components: Connect, Documentation
> Affects Versions: 3.4.0
>     Reporter: Hyukjin Kwon
>     Assignee: Apache Spark
>     Priority: Major
>
> https://spark.apache.org/docs/latest/configuration.html
> Add a section for Spark Connect configurations
[jira] [Assigned] (SPARK-42731) Update Spark Configuration

[ https://issues.apache.org/jira/browse/SPARK-42731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42731:

    Assignee:     (was: Apache Spark)

> Update Spark Configuration
> --------------------------
>
>          Key: SPARK-42731
>          URL: https://issues.apache.org/jira/browse/SPARK-42731
>      Project: Spark
>   Issue Type: Sub-task
>   Components: Connect, Documentation
> Affects Versions: 3.4.0
>     Reporter: Hyukjin Kwon
>     Priority: Major
>
> https://spark.apache.org/docs/latest/configuration.html
> Add a section for Spark Connect configurations
[jira] [Commented] (SPARK-42785) [K8S][Core] When spark submit without --deploy-mode, will face NPE in Kubernetes Case

[ https://issues.apache.org/jira/browse/SPARK-42785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700096#comment-17700096 ]

Apache Spark commented on SPARK-42785:
--------------------------------------

User 'zwangsheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40414

> [K8S][Core] When spark submit without --deploy-mode, will face NPE in
> Kubernetes Case
> ---------------------------------------------------------------------
>
>          Key: SPARK-42785
>          URL: https://issues.apache.org/jira/browse/SPARK-42785
>      Project: Spark
>   Issue Type: Bug
>   Components: Kubernetes
> Affects Versions: 3.3.2
>     Reporter: binjie yang
>     Priority: Major
>
> According to this PR
> [https://github.com/apache/spark/pull/37880#issuecomment-134890], when a
> user runs spark-submit without `--deploy-mode XXX` or `--conf
> spark.submit.deployMode=`, it may hit an NPE in this code:
>
> args.deployMode.equals("client")
>
[jira] [Assigned] (SPARK-42785) [K8S][Core] When spark submit without --deploy-mode, will face NPE in Kubernetes Case

[ https://issues.apache.org/jira/browse/SPARK-42785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42785:

    Assignee:     (was: Apache Spark)

> [K8S][Core] When spark submit without --deploy-mode, will face NPE in
> Kubernetes Case
> ---------------------------------------------------------------------
>
>          Key: SPARK-42785
>          URL: https://issues.apache.org/jira/browse/SPARK-42785
>      Project: Spark
>   Issue Type: Bug
>   Components: Kubernetes
> Affects Versions: 3.3.2
>     Reporter: binjie yang
>     Priority: Major
>
> According to this PR
> [https://github.com/apache/spark/pull/37880#issuecomment-134890], when a
> user runs spark-submit without `--deploy-mode XXX` or `--conf
> spark.submit.deployMode=`, it may hit an NPE in this code:
>
> args.deployMode.equals("client")
>
[jira] [Assigned] (SPARK-42785) [K8S][Core] When spark submit without --deploy-mode, will face NPE in Kubernetes Case

[ https://issues.apache.org/jira/browse/SPARK-42785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42785:

    Assignee: Apache Spark

> [K8S][Core] When spark submit without --deploy-mode, will face NPE in
> Kubernetes Case
> ---------------------------------------------------------------------
>
>          Key: SPARK-42785
>          URL: https://issues.apache.org/jira/browse/SPARK-42785
>      Project: Spark
>   Issue Type: Bug
>   Components: Kubernetes
> Affects Versions: 3.3.2
>     Reporter: binjie yang
>     Assignee: Apache Spark
>     Priority: Major
>
> According to this PR
> [https://github.com/apache/spark/pull/37880#issuecomment-134890], when a
> user runs spark-submit without `--deploy-mode XXX` or `--conf
> spark.submit.deployMode=`, it may hit an NPE in this code:
>
> args.deployMode.equals("client")
>
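The failure mode above is easy to see in miniature: `args.deployMode` is null when neither `--deploy-mode` nor `spark.submit.deployMode` is supplied, so calling `.equals` on it throws. Below is a hedged Python analogue of the bug and of a null-safe variant; the helper names are hypothetical and this is not Spark's actual code (Python raises `AttributeError` where Java raises `NullPointerException`).

```python
def is_client_mode_buggy(deploy_mode):
    # Mirrors args.deployMode.equals("client"): dereferences the value
    # directly, so a missing (None) deploy mode blows up.
    return deploy_mode.lower() == "client"

def is_client_mode_safe(deploy_mode):
    # Null-safe: treat a missing deploy mode as the default "client"
    # before comparing, so no method is ever called on None.
    return (deploy_mode or "client") == "client"
```

The same effect is achieved in Java by flipping the comparison to a constant-first form such as `"client".equals(args.deployMode)`, or by defaulting the field before it is read.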
[jira] [Assigned] (SPARK-42784) Fix the problem of incomplete creation of subdirectories in push merged localDir

[ https://issues.apache.org/jira/browse/SPARK-42784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42784:

    Assignee:     (was: Apache Spark)

> Fix the problem of incomplete creation of subdirectories in push merged
> localDir
> -----------------------------------------------------------------------
>
>          Key: SPARK-42784
>          URL: https://issues.apache.org/jira/browse/SPARK-42784
>      Project: Spark
>   Issue Type: Bug
>   Components: Shuffle, Spark Core
> Affects Versions: 3.3.2
>     Reporter: Fencheng Mei
>     Priority: Major
>
> After we broadly enabled push-based shuffle in our production environment,
> we found warning messages in the server-side logs, such as:
> ShuffleBlockPusher: Pushing block shufflePush_3_0_5352_935 to
> BlockManagerId(shuffle-push-merger, zw06-data-hdp-dn08251.mt, 7337, None)
> failed.
> java.lang.RuntimeException: java.lang.RuntimeException: Cannot initialize
> merged shuffle partition for appId application_1671244879475_44020960
> shuffleId 3 shuffleMergeId 0 reduceId 935.
> After investigation, we identified the trigger for the bug.
> The driver requested two different containers on the same physical machine.
> While the first container (container_1) was creating the 'push-merged'
> directory, it created the mergeDir first and then created the subDirs based
> on the value of the "spark.diskStore.subDirectories" parameter. However,
> the resources of container_1 were preempted while the sub-directories were
> being created, so only some of the subDirs were created. Because the
> mergeDir already existed, the second container (container_2) could not
> create the remaining subDirs (it assumed all directories had already been
> created).
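One way to avoid the half-created directory described in the report is to make the creation atomic: build all the sub-directories under a temporary name first, then rename the finished tree into place, so a competing container sees either the complete directory or none of it. The sketch below is a hypothetical illustration of that pattern in Python, not Spark's actual fix; `num_sub_dirs` stands in for the value of `spark.diskStore.subDirectories`.

```python
import os
import shutil
import tempfile

def create_merge_dir(parent, name, num_sub_dirs):
    """Create `name` under `parent` with all its sub-directories, atomically.

    Hypothetical helper illustrating one fix for the race in the report;
    not Spark's actual implementation.
    """
    final_path = os.path.join(parent, name)
    if os.path.isdir(final_path):
        return final_path  # a previously published, complete tree
    # Build the full tree under a temporary name first...
    tmp_path = tempfile.mkdtemp(dir=parent)
    for i in range(num_sub_dirs):
        # Two-hex-digit names, in the spirit of DiskBlockManager's subdirs.
        os.makedirs(os.path.join(tmp_path, "%02x" % i))
    try:
        # ...then publish it with a single rename, which is atomic on POSIX.
        os.rename(tmp_path, final_path)
    except OSError:
        # Another container published its complete tree first; discard ours.
        shutil.rmtree(tmp_path, ignore_errors=True)
    return final_path
```

With this shape, preemption mid-creation leaves only an unpublished temporary directory behind, and the existence of the final directory always implies all sub-directories exist.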