[jira] [Assigned] (SPARK-42817) Spark driver logs are filled with Initializing service data for shuffle service using name

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42817:


Assignee: Apache Spark

> Spark driver logs are filled with Initializing service data for shuffle 
> service using name
> --
>
> Key: SPARK-42817
> URL: https://issues.apache.org/jira/browse/SPARK-42817
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.0
> Reporter: Chandni Singh
> Assignee: Apache Spark
> Priority: Major
>
> With SPARK-34828, we added the ability to make the shuffle service name 
> configurable, along with a log statement 
> [here|https://github.com/apache/spark/blob/8860f69455e5a722626194c4797b4b42cccd4510/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala#L118]
> that logs the shuffle service name. However, this message is printed in the 
> driver logs every time a new executor is launched, which pollutes the logs. 
> {code}
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> 22/08/03 20:42:07 INFO ExecutorRunnable: Initializing service data for shuffle service using name 'spark_shuffle_311'
> {code}
> We can just log this once in the driver.
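
One way to read "log this once in the driver" is a guard that emits the message only for the first executor launch. The sketch below uses Python for brevity rather than the Scala of ExecutorRunnable; the function `log_shuffle_service_name_once` and the flag `_logged` are hypothetical names and are not part of Spark.

```python
import logging
import threading

logger = logging.getLogger("ExecutorRunnable")

_lock = threading.Lock()
_logged = False  # hypothetical driver-wide flag, not a Spark field


def log_shuffle_service_name_once(name: str) -> bool:
    """Log the shuffle service name on the first call only.

    Returns True if this call emitted the log line, False otherwise.
    """
    global _logged
    with _lock:
        if _logged:
            return False
        _logged = True
    logger.info(
        "Initializing service data for shuffle service using name '%s'", name
    )
    return True
```

Calling the function once per executor launch would then produce a single INFO line no matter how many executors start, even when launches happen on several threads.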



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Assigned] (SPARK-42816) Increase max message size to 128MB

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42816:


Assignee: Apache Spark

> Increase max message size to 128MB
> --
>
> Key: SPARK-42816
> URL: https://issues.apache.org/jira/browse/SPARK-42816
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Martin Grund
> Assignee: Apache Spark
> Priority: Major
>
> Support messages up to 128MB




[jira] [Commented] (SPARK-42816) Increase max message size to 128MB

2023-03-15 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700782#comment-17700782
 ] 

Apache Spark commented on SPARK-42816:
--

User 'grundprinzip' has created a pull request for this issue:
https://github.com/apache/spark/pull/40447

> Increase max message size to 128MB
> --
>
> Key: SPARK-42816
> URL: https://issues.apache.org/jira/browse/SPARK-42816
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Martin Grund
> Priority: Major
>
> Support messages up to 128MB




[jira] [Commented] (SPARK-42816) Increase max message size to 128MB

2023-03-15 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700780#comment-17700780
 ] 

Apache Spark commented on SPARK-42816:
--

User 'grundprinzip' has created a pull request for this issue:
https://github.com/apache/spark/pull/40447

> Increase max message size to 128MB
> --
>
> Key: SPARK-42816
> URL: https://issues.apache.org/jira/browse/SPARK-42816
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Martin Grund
> Priority: Major
>
> Support messages up to 128MB




[jira] [Assigned] (SPARK-42816) Increase max message size to 128MB

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42816:


Assignee: (was: Apache Spark)

> Increase max message size to 128MB
> --
>
> Key: SPARK-42816
> URL: https://issues.apache.org/jira/browse/SPARK-42816
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Martin Grund
> Priority: Major
>
> Support messages up to 128MB




[jira] [Commented] (SPARK-42815) Subexpression elimination support shortcut conditional expression

2023-03-15 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700753#comment-17700753
 ] 

Apache Spark commented on SPARK-42815:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/40446

> Subexpression elimination support shortcut conditional expression
> -
>
> Key: SPARK-42815
> URL: https://issues.apache.org/jira/browse/SPARK-42815
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: XiDuo You
> Priority: Minor
>
> A subexpression in a conditional expression may not need to be evaluated even 
> if it appears more than once.
> For example, in `if(or(a, and(b, b)))`, the expression `b` is skipped if `a` 
> is true.
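
The short-circuit behaviour in the description can be illustrated with a small Python sketch. This is not Spark's subexpression-elimination code; `make_counted`, `or_`, `and_` and `evaluate` are hypothetical helpers used only to count how often `b` is evaluated.

```python
from typing import Callable, Dict, Tuple

Expr = Callable[[], bool]


def make_counted(value: bool) -> Tuple[Expr, Dict[str, int]]:
    """Return an expression that yields `value` and counts its evaluations."""
    calls = {"n": 0}

    def expr() -> bool:
        calls["n"] += 1
        return value

    return expr, calls


def or_(left: Expr, right: Expr) -> bool:
    # The right operand is evaluated only when the left one is False.
    return left() or right()


def and_(left: Expr, right: Expr) -> bool:
    # The right operand is evaluated only when the left one is True.
    return left() and right()


def evaluate(a_value: bool, b_value: bool) -> Tuple[bool, int]:
    """Evaluate or(a, and(b, b)) and report how many times b was evaluated."""
    a, _ = make_counted(a_value)
    b, b_calls = make_counted(b_value)
    result = or_(a, lambda: and_(b, b))
    return result, b_calls["n"]
```

When `a` is true, `b` is never evaluated, so precomputing it eagerly as a common subexpression would be wasted work. When `a` is false and `b` is true, `b` is evaluated twice, which is the case where elimination helps.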




[jira] [Commented] (SPARK-42815) Subexpression elimination support shortcut conditional expression

2023-03-15 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700751#comment-17700751
 ] 

Apache Spark commented on SPARK-42815:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/40446

> Subexpression elimination support shortcut conditional expression
> -
>
> Key: SPARK-42815
> URL: https://issues.apache.org/jira/browse/SPARK-42815
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: XiDuo You
> Priority: Minor
>
> A subexpression in a conditional expression may not need to be evaluated even 
> if it appears more than once.
> For example, in `if(or(a, and(b, b)))`, the expression `b` is skipped if `a` 
> is true.




[jira] [Assigned] (SPARK-42815) Subexpression elimination support shortcut conditional expression

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42815:


Assignee: Apache Spark

> Subexpression elimination support shortcut conditional expression
> -
>
> Key: SPARK-42815
> URL: https://issues.apache.org/jira/browse/SPARK-42815
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: XiDuo You
> Assignee: Apache Spark
> Priority: Minor
>
> A subexpression in a conditional expression may not need to be evaluated even 
> if it appears more than once.
> For example, in `if(or(a, and(b, b)))`, the expression `b` is skipped if `a` 
> is true.




[jira] [Assigned] (SPARK-42815) Subexpression elimination support shortcut conditional expression

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42815:


Assignee: (was: Apache Spark)

> Subexpression elimination support shortcut conditional expression
> -
>
> Key: SPARK-42815
> URL: https://issues.apache.org/jira/browse/SPARK-42815
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: XiDuo You
> Priority: Minor
>
> A subexpression in a conditional expression may not need to be evaluated even 
> if it appears more than once.
> For example, in `if(or(a, and(b, b)))`, the expression `b` is skipped if `a` 
> is true.




[jira] [Commented] (SPARK-42814) Upgrade some maven-plugins

2023-03-15 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700736#comment-17700736
 ] 

Apache Spark commented on SPARK-42814:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40445

> Upgrade some maven-plugins
> --
>
> Key: SPARK-42814
> URL: https://issues.apache.org/jira/browse/SPARK-42814
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Priority: Minor
>
> maven-enforcer-plugin 3.0.0-M2 -> 3.2.1
>  - [https://github.com/apache/maven-enforcer/releases/tag/enforcer-3.2.1]
>  - [https://github.com/apache/maven-enforcer/releases/tag/enforcer-3.1.0]
> build-helper-maven-plugin 3.2.0 -> 3.3.0
>  - [https://github.com/mojohaus/build-helper-maven-plugin/releases/tag/build-helper-maven-plugin-3.3.0]
> maven-compiler-plugin 3.10.1 -> 3.11.0
>  - [https://github.com/apache/maven-compiler-plugin/releases/tag/maven-compiler-plugin-3.11.0]
> maven-surefire-plugin 3.0.0-M9 -> 3.0.0
>  - [https://github.com/apache/maven-surefire/releases/tag/surefire-3.0.0]
> maven-javadoc-plugin 3.4.1 -> 3.5.0
>  - [https://github.com/apache/maven-javadoc-plugin/releases/tag/maven-javadoc-plugin-3.5.0]
> maven-deploy-plugin 3.0.0 -> 3.1.0
>  - [https://github.com/apache/maven-deploy-plugin/releases/tag/maven-deploy-plugin-3.1.0]




[jira] [Assigned] (SPARK-42814) Upgrade some maven-plugins

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42814:


Assignee: (was: Apache Spark)

> Upgrade some maven-plugins
> --
>
> Key: SPARK-42814
> URL: https://issues.apache.org/jira/browse/SPARK-42814
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Priority: Minor
>
> maven-enforcer-plugin 3.0.0-M2 -> 3.2.1
>  - [https://github.com/apache/maven-enforcer/releases/tag/enforcer-3.2.1]
>  - [https://github.com/apache/maven-enforcer/releases/tag/enforcer-3.1.0]
> build-helper-maven-plugin 3.2.0 -> 3.3.0
>  - [https://github.com/mojohaus/build-helper-maven-plugin/releases/tag/build-helper-maven-plugin-3.3.0]
> maven-compiler-plugin 3.10.1 -> 3.11.0
>  - [https://github.com/apache/maven-compiler-plugin/releases/tag/maven-compiler-plugin-3.11.0]
> maven-surefire-plugin 3.0.0-M9 -> 3.0.0
>  - [https://github.com/apache/maven-surefire/releases/tag/surefire-3.0.0]
> maven-javadoc-plugin 3.4.1 -> 3.5.0
>  - [https://github.com/apache/maven-javadoc-plugin/releases/tag/maven-javadoc-plugin-3.5.0]
> maven-deploy-plugin 3.0.0 -> 3.1.0
>  - [https://github.com/apache/maven-deploy-plugin/releases/tag/maven-deploy-plugin-3.1.0]




[jira] [Assigned] (SPARK-42814) Upgrade some maven-plugins

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42814:


Assignee: Apache Spark

> Upgrade some maven-plugins
> --
>
> Key: SPARK-42814
> URL: https://issues.apache.org/jira/browse/SPARK-42814
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Assignee: Apache Spark
> Priority: Minor
>
> maven-enforcer-plugin 3.0.0-M2 -> 3.2.1
>  - [https://github.com/apache/maven-enforcer/releases/tag/enforcer-3.2.1]
>  - [https://github.com/apache/maven-enforcer/releases/tag/enforcer-3.1.0]
> build-helper-maven-plugin 3.2.0 -> 3.3.0
>  - [https://github.com/mojohaus/build-helper-maven-plugin/releases/tag/build-helper-maven-plugin-3.3.0]
> maven-compiler-plugin 3.10.1 -> 3.11.0
>  - [https://github.com/apache/maven-compiler-plugin/releases/tag/maven-compiler-plugin-3.11.0]
> maven-surefire-plugin 3.0.0-M9 -> 3.0.0
>  - [https://github.com/apache/maven-surefire/releases/tag/surefire-3.0.0]
> maven-javadoc-plugin 3.4.1 -> 3.5.0
>  - [https://github.com/apache/maven-javadoc-plugin/releases/tag/maven-javadoc-plugin-3.5.0]
> maven-deploy-plugin 3.0.0 -> 3.1.0
>  - [https://github.com/apache/maven-deploy-plugin/releases/tag/maven-deploy-plugin-3.1.0]




[jira] [Commented] (SPARK-42813) Print application info when waitAppCompletion is false

2023-03-15 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700705#comment-17700705
 ] 

Apache Spark commented on SPARK-42813:
--

User 'pan3793' has created a pull request for this issue:
https://github.com/apache/spark/pull/40444

> Print application info when waitAppCompletion is false
> --
>
> Key: SPARK-42813
> URL: https://issues.apache.org/jira/browse/SPARK-42813
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.3.2
> Reporter: Cheng Pan
> Priority: Major
>





[jira] [Assigned] (SPARK-42813) Print application info when waitAppCompletion is false

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42813:


Assignee: Apache Spark

> Print application info when waitAppCompletion is false
> --
>
> Key: SPARK-42813
> URL: https://issues.apache.org/jira/browse/SPARK-42813
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.3.2
> Reporter: Cheng Pan
> Assignee: Apache Spark
> Priority: Major
>





[jira] [Commented] (SPARK-42813) Print application info when waitAppCompletion is false

2023-03-15 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700704#comment-17700704
 ] 

Apache Spark commented on SPARK-42813:
--

User 'pan3793' has created a pull request for this issue:
https://github.com/apache/spark/pull/40444

> Print application info when waitAppCompletion is false
> --
>
> Key: SPARK-42813
> URL: https://issues.apache.org/jira/browse/SPARK-42813
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.3.2
> Reporter: Cheng Pan
> Priority: Major
>





[jira] [Assigned] (SPARK-42813) Print application info when waitAppCompletion is false

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42813:


Assignee: (was: Apache Spark)

> Print application info when waitAppCompletion is false
> --
>
> Key: SPARK-42813
> URL: https://issues.apache.org/jira/browse/SPARK-42813
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.3.2
> Reporter: Cheng Pan
> Priority: Major
>





[jira] [Commented] (SPARK-42812) client_type is missing from AddArtifactsRequest proto message

2023-03-15 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700682#comment-17700682
 ] 

Apache Spark commented on SPARK-42812:
--

User 'vicennial' has created a pull request for this issue:
https://github.com/apache/spark/pull/40443

> client_type is missing from AddArtifactsRequest proto message
> -
>
> Key: SPARK-42812
> URL: https://issues.apache.org/jira/browse/SPARK-42812
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Venkata Sai Akhil Gudesa
> Priority: Major
>
> The client_type is missing from AddArtifactsRequest proto message




[jira] [Assigned] (SPARK-42812) client_type is missing from AddArtifactsRequest proto message

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42812:


Assignee: Apache Spark

> client_type is missing from AddArtifactsRequest proto message
> -
>
> Key: SPARK-42812
> URL: https://issues.apache.org/jira/browse/SPARK-42812
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Venkata Sai Akhil Gudesa
> Assignee: Apache Spark
> Priority: Major
>
> The client_type is missing from AddArtifactsRequest proto message




[jira] [Assigned] (SPARK-42812) client_type is missing from AddArtifactsRequest proto message

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42812:


Assignee: (was: Apache Spark)

> client_type is missing from AddArtifactsRequest proto message
> -
>
> Key: SPARK-42812
> URL: https://issues.apache.org/jira/browse/SPARK-42812
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Venkata Sai Akhil Gudesa
> Priority: Major
>
> The client_type is missing from AddArtifactsRequest proto message




[jira] [Commented] (SPARK-42812) client_type is missing from AddArtifactsRequest proto message

2023-03-15 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700681#comment-17700681
 ] 

Apache Spark commented on SPARK-42812:
--

User 'vicennial' has created a pull request for this issue:
https://github.com/apache/spark/pull/40443

> client_type is missing from AddArtifactsRequest proto message
> -
>
> Key: SPARK-42812
> URL: https://issues.apache.org/jira/browse/SPARK-42812
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Venkata Sai Akhil Gudesa
> Priority: Major
>
> The client_type is missing from AddArtifactsRequest proto message




[jira] [Assigned] (SPARK-42809) Upgrade scala-maven-plugin from 4.8.0 to 4.8.1

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42809:


Assignee: Apache Spark

> Upgrade scala-maven-plugin from 4.8.0 to 4.8.1
> --
>
> Key: SPARK-42809
> URL: https://issues.apache.org/jira/browse/SPARK-42809
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Assignee: Apache Spark
> Priority: Minor
>





[jira] [Assigned] (SPARK-42809) Upgrade scala-maven-plugin from 4.8.0 to 4.8.1

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42809:


Assignee: (was: Apache Spark)

> Upgrade scala-maven-plugin from 4.8.0 to 4.8.1
> --
>
> Key: SPARK-42809
> URL: https://issues.apache.org/jira/browse/SPARK-42809
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Priority: Minor
>





[jira] [Assigned] (SPARK-42808) Avoid getting availableProcessors every time in MapOutputTrackerMaster#getStatistics

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42808:


Assignee: Apache Spark

> Avoid getting availableProcessors every time in 
> MapOutputTrackerMaster#getStatistics
> 
>
> Key: SPARK-42808
> URL: https://issues.apache.org/jira/browse/SPARK-42808
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.3.2
> Reporter: dzcxzl
> Assignee: Apache Spark
> Priority: Minor
>
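
The issue has no description, but the title suggests looking up the processor count once and reusing it instead of querying it on every `getStatistics` call. Below is a minimal Python analogue, assuming `os.cpu_count()` as a stand-in for the JVM's `Runtime.getRuntime().availableProcessors()`. The class `StatisticsHelper`, its `parallelism` method and the `max_threads` cap of 8 are hypothetical and are not Spark code.

```python
import os


class StatisticsHelper:
    # Looked up once when the class is defined, not on every call.
    # os.cpu_count() may return None, so fall back to 1.
    _AVAILABLE_PROCESSORS = os.cpu_count() or 1

    def parallelism(self, num_partitions: int, max_threads: int = 8) -> int:
        """Pick a thread count bounded by cores, a cap, and the amount of work."""
        return max(
            1, min(self._AVAILABLE_PROCESSORS, max_threads, num_partitions)
        )
```

The trade-off of caching is that a processor count that changes at runtime (for example under container CPU limits that are adjusted later) would not be picked up.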





[jira] [Assigned] (SPARK-42808) Avoid getting availableProcessors every time in MapOutputTrackerMaster#getStatistics

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42808:


Assignee: (was: Apache Spark)

> Avoid getting availableProcessors every time in 
> MapOutputTrackerMaster#getStatistics
> 
>
> Key: SPARK-42808
> URL: https://issues.apache.org/jira/browse/SPARK-42808
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.3.2
> Reporter: dzcxzl
> Priority: Minor
>





[jira] [Commented] (SPARK-42808) Avoid getting availableProcessors every time in MapOutputTrackerMaster#getStatistics

2023-03-15 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700617#comment-17700617
 ] 

Apache Spark commented on SPARK-42808:
--

User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/40440

> Avoid getting availableProcessors every time in 
> MapOutputTrackerMaster#getStatistics
> 
>
> Key: SPARK-42808
> URL: https://issues.apache.org/jira/browse/SPARK-42808
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.3.2
> Reporter: dzcxzl
> Priority: Minor
>





[jira] [Commented] (SPARK-42809) Upgrade scala-maven-plugin from 4.8.0 to 4.8.1

2023-03-15 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700619#comment-17700619
 ] 

Apache Spark commented on SPARK-42809:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40442

> Upgrade scala-maven-plugin from 4.8.0 to 4.8.1
> --
>
> Key: SPARK-42809
> URL: https://issues.apache.org/jira/browse/SPARK-42809
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Priority: Minor
>





[jira] [Commented] (SPARK-42807) Apply custom log URL pattern for yarn-client AM log URL in SHS

2023-03-15 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700609#comment-17700609
 ] 

Apache Spark commented on SPARK-42807:
--

User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/40439

> Apply custom log URL pattern for yarn-client AM log URL in SHS
> --
>
> Key: SPARK-42807
> URL: https://issues.apache.org/jira/browse/SPARK-42807
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.3.2
> Reporter: dzcxzl
> Priority: Minor
>





[jira] [Commented] (SPARK-42807) Apply custom log URL pattern for yarn-client AM log URL in SHS

2023-03-15 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700608#comment-17700608
 ] 

Apache Spark commented on SPARK-42807:
--

User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/40439

> Apply custom log URL pattern for yarn-client AM log URL in SHS
> --
>
> Key: SPARK-42807
> URL: https://issues.apache.org/jira/browse/SPARK-42807
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.3.2
> Reporter: dzcxzl
> Priority: Minor
>





[jira] [Assigned] (SPARK-42807) Apply custom log URL pattern for yarn-client AM log URL in SHS

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42807:


Assignee: (was: Apache Spark)

> Apply custom log URL pattern for yarn-client AM log URL in SHS
> --
>
> Key: SPARK-42807
> URL: https://issues.apache.org/jira/browse/SPARK-42807
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.3.2
> Reporter: dzcxzl
> Priority: Minor
>





[jira] [Assigned] (SPARK-42807) Apply custom log URL pattern for yarn-client AM log URL in SHS

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42807:


Assignee: Apache Spark

> Apply custom log URL pattern for yarn-client AM log URL in SHS
> --
>
> Key: SPARK-42807
> URL: https://issues.apache.org/jira/browse/SPARK-42807
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.3.2
> Reporter: dzcxzl
> Assignee: Apache Spark
> Priority: Minor
>





[jira] [Assigned] (SPARK-42806) Add Catalog

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42806:


Assignee: (was: Apache Spark)

> Add Catalog
> ---
>
> Key: SPARK-42806
> URL: https://issues.apache.org/jira/browse/SPARK-42806
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>





[jira] [Commented] (SPARK-42806) Add Catalog

2023-03-15 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700599#comment-17700599
 ] 

Apache Spark commented on SPARK-42806:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40438

> Add Catalog
> ---
>
> Key: SPARK-42806
> URL: https://issues.apache.org/jira/browse/SPARK-42806
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>





[jira] [Assigned] (SPARK-42806) Add Catalog

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42806:


Assignee: Apache Spark

> Add Catalog
> ---
>
> Key: SPARK-42806
> URL: https://issues.apache.org/jira/browse/SPARK-42806
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>





[jira] [Commented] (SPARK-41259) Spark-sql cli query results should correspond to schema

2023-03-15 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700580#comment-17700580
 ] 

Apache Spark commented on SPARK-41259:
--

User 'Yikf' has created a pull request for this issue:
https://github.com/apache/spark/pull/40437

> Spark-sql cli query results should correspond to schema
> ---
>
> Key: SPARK-41259
> URL: https://issues.apache.org/jira/browse/SPARK-41259
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: yikaifei
>Priority: Minor
> Fix For: 3.4.0
>
>
> When using the spark-sql CLI, Spark outputs only one column for the `show 
> tables` and `show views` commands to be compatible with Hive output, but the 
> output schema still has Spark's three columns.




[jira] [Commented] (SPARK-42619) Add `show_counts` parameter for DataFrame.info

2023-03-15 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700563#comment-17700563
 ] 

Apache Spark commented on SPARK-42619:
--

User 'dzhigimont' has created a pull request for this issue:
https://github.com/apache/spark/pull/40436

> Add `show_counts` parameter for DataFrame.info
> --
>
> Key: SPARK-42619
> URL: https://issues.apache.org/jira/browse/SPARK-42619
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> See https://github.com/pandas-dev/pandas/pull/37999




[jira] [Assigned] (SPARK-42619) Add `show_counts` parameter for DataFrame.info

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42619:


Assignee: Apache Spark

> Add `show_counts` parameter for DataFrame.info
> --
>
> Key: SPARK-42619
> URL: https://issues.apache.org/jira/browse/SPARK-42619
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> See https://github.com/pandas-dev/pandas/pull/37999




[jira] [Commented] (SPARK-42619) Add `show_counts` parameter for DataFrame.info

2023-03-15 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700562#comment-17700562
 ] 

Apache Spark commented on SPARK-42619:
--

User 'dzhigimont' has created a pull request for this issue:
https://github.com/apache/spark/pull/40436

> Add `show_counts` parameter for DataFrame.info
> --
>
> Key: SPARK-42619
> URL: https://issues.apache.org/jira/browse/SPARK-42619
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> See https://github.com/pandas-dev/pandas/pull/37999




[jira] [Assigned] (SPARK-42619) Add `show_counts` parameter for DataFrame.info

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42619:


Assignee: (was: Apache Spark)

> Add `show_counts` parameter for DataFrame.info
> --
>
> Key: SPARK-42619
> URL: https://issues.apache.org/jira/browse/SPARK-42619
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> See https://github.com/pandas-dev/pandas/pull/37999




[jira] [Commented] (SPARK-42496) Introducing Spark Connect on the main page and adding Spark Connect Overview page

2023-03-15 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700539#comment-17700539
 ] 

Apache Spark commented on SPARK-42496:
--

User 'allanf-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40435

> Introducing Spark Connect on the main page and adding Spark Connect Overview 
> page
> -
>
> Key: SPARK-42496
> URL: https://issues.apache.org/jira/browse/SPARK-42496
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Documentation
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Allan Folting
>Priority: Major
> Fix For: 3.4.1
>
>
> We should document the introduction of Spark Connect on the PySpark main 
> documentation page to give users a summary.




[jira] [Assigned] (SPARK-42803) Use getParameterCount function instead of getParameterTypes.length

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42803:


Assignee: (was: Apache Spark)

> Use getParameterCount function instead of getParameterTypes.length
> --
>
> Key: SPARK-42803
> URL: https://issues.apache.org/jira/browse/SPARK-42803
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, Spark Core, SQL
>Affects Versions: 3.3.3
>Reporter: Narek Karapetian
>Priority: Minor
> Fix For: 3.3.2
>
>
> Since JDK 1.8 the reflection API has an additional function, 
> {{getParameterCount}}. It is better to use that function instead of 
> {{getParameterTypes.length}} because {{getParameterTypes}} makes a 
> copy of the parameter types array on every invocation.
> This avoids creating redundant arrays.
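
The difference described in the ticket can be sketched in plain Java. This is a minimal illustration, not code from the Spark patch: the class `ParameterCountSketch`, its `add` method, and the helper names are hypothetical and exist only for this example. Both calls are part of `java.lang.reflect.Method`; the statement that `getParameterTypes` copies its array is taken from the ticket text.

```java
import java.lang.reflect.Method;

final class ParameterCountSketch {
    // Hypothetical target method; it exists only so there is something to reflect on.
    static int add(int a, int b) {
        return a + b;
    }

    // Looks up the hypothetical method and wraps the checked exception.
    static Method addMethod() {
        try {
            return ParameterCountSketch.class.getDeclaredMethod("add", int.class, int.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // Older pattern: fetches the parameter types array just to read its length.
    static int arityViaTypes(Method m) {
        return m.getParameterTypes().length;
    }

    // Pattern proposed in the ticket: asks for the count directly (Java 8+).
    static int arityViaCount(Method m) {
        return m.getParameterCount();
    }

    public static void main(String[] args) {
        Method m = addMethod();
        // Both variants report the same number of parameters.
        System.out.println(arityViaTypes(m) + " " + arityViaCount(m));
    }
}
```

Both helpers return the same value, so the change is purely about avoiding the intermediate array on hot reflection paths.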




[jira] [Commented] (SPARK-42803) Use getParameterCount function instead of getParameterTypes.length

2023-03-15 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700529#comment-17700529
 ] 

Apache Spark commented on SPARK-42803:
--

User 'NarekDW' has created a pull request for this issue:
https://github.com/apache/spark/pull/40422

> Use getParameterCount function instead of getParameterTypes.length
> --
>
> Key: SPARK-42803
> URL: https://issues.apache.org/jira/browse/SPARK-42803
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, Spark Core, SQL
>Affects Versions: 3.3.3
>Reporter: Narek Karapetian
>Priority: Minor
> Fix For: 3.3.2
>
>
> Since JDK 1.8 the reflection API has an additional function, 
> {{getParameterCount}}. It is better to use that function instead of 
> {{getParameterTypes.length}} because {{getParameterTypes}} makes a 
> copy of the parameter types array on every invocation.
> This avoids creating redundant arrays.




[jira] [Assigned] (SPARK-42803) Use getParameterCount function instead of getParameterTypes.length

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42803:


Assignee: Apache Spark

> Use getParameterCount function instead of getParameterTypes.length
> --
>
> Key: SPARK-42803
> URL: https://issues.apache.org/jira/browse/SPARK-42803
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, Spark Core, SQL
>Affects Versions: 3.3.3
>Reporter: Narek Karapetian
>Assignee: Apache Spark
>Priority: Minor
> Fix For: 3.3.2
>
>
> Since JDK 1.8 the reflection API has an additional function, 
> {{getParameterCount}}. It is better to use that function instead of 
> {{getParameterTypes.length}} because {{getParameterTypes}} makes a 
> copy of the parameter types array on every invocation.
> This avoids creating redundant arrays.




[jira] [Assigned] (SPARK-42801) Fix Flaky ClientE2ETestSuite

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42801:


Assignee: (was: Apache Spark)

> Fix Flaky ClientE2ETestSuite
> 
>
> Key: SPARK-42801
> URL: https://issues.apache.org/jira/browse/SPARK-42801
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>





[jira] [Assigned] (SPARK-42801) Fix Flaky ClientE2ETestSuite

2023-03-15 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42801:


Assignee: Apache Spark

> Fix Flaky ClientE2ETestSuite
> 
>
> Key: SPARK-42801
> URL: https://issues.apache.org/jira/browse/SPARK-42801
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>





[jira] [Commented] (SPARK-42801) Fix Flaky ClientE2ETestSuite

2023-03-15 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700513#comment-17700513
 ] 

Apache Spark commented on SPARK-42801:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40434

> Fix Flaky ClientE2ETestSuite
> 
>
> Key: SPARK-42801
> URL: https://issues.apache.org/jira/browse/SPARK-42801
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>





[jira] [Commented] (SPARK-42706) Document the Spark SQL error classes in user-facing documentation.

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700505#comment-17700505
 ] 

Apache Spark commented on SPARK-42706:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40433

> Document the Spark SQL error classes in user-facing documentation.
> --
>
> Key: SPARK-42706
> URL: https://issues.apache.org/jira/browse/SPARK-42706
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.5.0
>
>
> We need to add an error class list to the user-facing documentation.




[jira] [Assigned] (SPARK-42800) Implement ml function {array_to_vector, vector_to_array}

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42800:


Assignee: Apache Spark

> Implement ml function {array_to_vector, vector_to_array}
> 
>
> Key: SPARK-42800
> URL: https://issues.apache.org/jira/browse/SPARK-42800
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML, PySpark
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>





[jira] [Assigned] (SPARK-42800) Implement ml function {array_to_vector, vector_to_array}

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42800:


Assignee: (was: Apache Spark)

> Implement ml function {array_to_vector, vector_to_array}
> 
>
> Key: SPARK-42800
> URL: https://issues.apache.org/jira/browse/SPARK-42800
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML, PySpark
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-42800) Implement ml function {array_to_vector, vector_to_array}

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700496#comment-17700496
 ] 

Apache Spark commented on SPARK-42800:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40432

> Implement ml function {array_to_vector, vector_to_array}
> 
>
> Key: SPARK-42800
> URL: https://issues.apache.org/jira/browse/SPARK-42800
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML, PySpark
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>





[jira] [Assigned] (SPARK-42799) Update SBT build `xercesImpl` version to match with pom.xml

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42799:


Assignee: Apache Spark

> Update SBT build `xercesImpl` version to match with pom.xml
> ---
>
> Key: SPARK-42799
> URL: https://issues.apache.org/jira/browse/SPARK-42799
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.2
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>





[jira] [Assigned] (SPARK-42799) Update SBT build `xercesImpl` version to match with pom.xml

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42799:


Assignee: (was: Apache Spark)

> Update SBT build `xercesImpl` version to match with pom.xml
> ---
>
> Key: SPARK-42799
> URL: https://issues.apache.org/jira/browse/SPARK-42799
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.2
>Reporter: Dongjoon Hyun
>Priority: Minor
>





[jira] [Commented] (SPARK-42799) Update SBT build `xercesImpl` version to match with pom.xml

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700492#comment-17700492
 ] 

Apache Spark commented on SPARK-42799:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40431

> Update SBT build `xercesImpl` version to match with pom.xml
> ---
>
> Key: SPARK-42799
> URL: https://issues.apache.org/jira/browse/SPARK-42799
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.2
>Reporter: Dongjoon Hyun
>Priority: Minor
>





[jira] [Assigned] (SPARK-42798) Upgrade protobuf-java to 3.22.2

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42798:


Assignee: Apache Spark

> Upgrade protobuf-java to 3.22.2
> ---
>
> Key: SPARK-42798
> URL: https://issues.apache.org/jira/browse/SPARK-42798
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.1]
>  * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.2]




[jira] [Assigned] (SPARK-42798) Upgrade protobuf-java to 3.22.2

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42798:


Assignee: (was: Apache Spark)

> Upgrade protobuf-java to 3.22.2
> ---
>
> Key: SPARK-42798
> URL: https://issues.apache.org/jira/browse/SPARK-42798
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.1]
>  * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.2]




[jira] [Commented] (SPARK-42798) Upgrade protobuf-java to 3.22.2

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700484#comment-17700484
 ] 

Apache Spark commented on SPARK-42798:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40430

> Upgrade protobuf-java to 3.22.2
> ---
>
> Key: SPARK-42798
> URL: https://issues.apache.org/jira/browse/SPARK-42798
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.1]
>  * [https://github.com/protocolbuffers/protobuf/releases/tag/v22.2]




[jira] [Commented] (SPARK-42775) approx_percentile produces wrong results for large decimals.

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700448#comment-17700448
 ] 

Apache Spark commented on SPARK-42775:
--

User 'chenhao-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40429

> approx_percentile produces wrong results for large decimals.
> 
>
> Key: SPARK-42775
> URL: https://issues.apache.org/jira/browse/SPARK-42775
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, 
> 3.4.0
>Reporter: Chenhao Li
>Priority: Major
>
> In the {{approx_percentile}} expression, Spark casts decimal to double to 
> update the aggregation state 
> ([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181])
>  and casts the result double back to decimal 
> ([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]).
>  The precision loss in these casts can push the resulting decimal outside its 
> precision range. This can lead to the following counter-intuitive results:
> {code:sql}
> spark-sql> select approx_percentile(col, 0.5) from values 
> (999) as tab(col);
> NULL
> spark-sql> select approx_percentile(col, 0.5) is null from values 
> (999) as tab(col);
> false
> spark-sql> select cast(approx_percentile(col, 0.5) as string) from values 
> (999) as tab(col);
> 1000
> spark-sql> desc select approx_percentile(col, 0.5) from values 
> (999) as tab(col);
> approx_percentile(col, 0.5, 1)decimal(19,0) 
> {code}
> The result is actually not null, so the second query returns false. The first 
> query returns null because the result cannot fit into {{decimal(19, 0)}}.
> A suggested fix is to use {{Decimal.changePrecision}} here to ensure the 
> result fits, and to return a real null or throw an exception when the result 
> doesn't fit.
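
The round trip described in the ticket can be reproduced outside Spark. The sketch below is an illustration under stated assumptions, not Spark's implementation: it uses `java.math.BigDecimal` as a stand-in for Spark's `Decimal`, the class and helper names are hypothetical, and the 19-digit input is chosen for the example because the literals in the archived queries appear truncated.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class DecimalRoundTripSketch {
    // Mirrors the two casts from the ticket: decimal -> double -> decimal.
    static BigDecimal roundTrip(BigDecimal in) {
        double asDouble = in.doubleValue();   // lossy: a double carries about 15-17 significant digits
        return new BigDecimal(asDouble);      // exact decimal value of that double
    }

    // Hypothetical check: does the value fit a decimal(precision, scale) column?
    static boolean fits(BigDecimal value, int precision, int scale) {
        return value.setScale(scale, RoundingMode.HALF_UP).precision() <= precision;
    }

    public static void main(String[] args) {
        // Assumed input: the largest value that fits decimal(19, 0).
        BigDecimal in = new BigDecimal("9999999999999999999");
        BigDecimal out = roundTrip(in);
        System.out.println(in.precision() + " digits in, " + out.precision() + " digits out");
        System.out.println("fits decimal(19, 0) after round trip: " + fits(out, 19, 0));
    }
}
```

With this input the nearest double is exactly 10^19, so the value that comes back has 20 digits and no longer fits the declared type. A check such as `fits` is where a fix would decide between returning null and raising an error.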




[jira] [Assigned] (SPARK-42775) approx_percentile produces wrong results for large decimals.

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42775:


Assignee: (was: Apache Spark)

> approx_percentile produces wrong results for large decimals.
> 
>
> Key: SPARK-42775
> URL: https://issues.apache.org/jira/browse/SPARK-42775
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, 
> 3.4.0
>Reporter: Chenhao Li
>Priority: Major
>
> In the {{approx_percentile}} expression, Spark casts decimal to double to 
> update the aggregation state 
> ([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181])
>  and casts the result double back to decimal 
> ([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]).
>  The precision loss in these casts can push the resulting decimal outside its 
> precision range. This can lead to the following counter-intuitive results:
> {code:sql}
> spark-sql> select approx_percentile(col, 0.5) from values 
> (999) as tab(col);
> NULL
> spark-sql> select approx_percentile(col, 0.5) is null from values 
> (999) as tab(col);
> false
> spark-sql> select cast(approx_percentile(col, 0.5) as string) from values 
> (999) as tab(col);
> 1000
> spark-sql> desc select approx_percentile(col, 0.5) from values 
> (999) as tab(col);
> approx_percentile(col, 0.5, 1)decimal(19,0) 
> {code}
> The result is actually not null, so the second query returns false. The first 
> query returns null because the result cannot fit into {{decimal(19, 0)}}.
> A suggested fix is to use {{Decimal.changePrecision}} here to ensure the 
> result fits, and to return a real null or throw an exception when the result 
> doesn't fit.




[jira] [Assigned] (SPARK-42775) approx_percentile produces wrong results for large decimals.

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42775:


Assignee: Apache Spark

> approx_percentile produces wrong results for large decimals.
> 
>
> Key: SPARK-42775
> URL: https://issues.apache.org/jira/browse/SPARK-42775
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, 
> 3.4.0
>Reporter: Chenhao Li
>Assignee: Apache Spark
>Priority: Major
>
> In the {{approx_percentile}} expression, Spark casts decimal to double to 
> update the aggregation state 
> ([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181])
>  and casts the result double back to decimal 
> ([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]).
>  The precision loss in these casts can push the resulting decimal outside its 
> precision range. This can lead to the following counter-intuitive results:
> {code:sql}
> spark-sql> select approx_percentile(col, 0.5) from values 
> (999) as tab(col);
> NULL
> spark-sql> select approx_percentile(col, 0.5) is null from values 
> (999) as tab(col);
> false
> spark-sql> select cast(approx_percentile(col, 0.5) as string) from values 
> (999) as tab(col);
> 1000
> spark-sql> desc select approx_percentile(col, 0.5) from values 
> (999) as tab(col);
> approx_percentile(col, 0.5, 1)decimal(19,0) 
> {code}
> The result is actually not null, so the second query returns false. The first 
> query returns null because the result cannot fit into {{decimal(19, 0)}}.
> A suggested fix is to use {{Decimal.changePrecision}} here to ensure the 
> result fits, and to return a real null or throw an exception when the result 
> doesn't fit.




[jira] [Assigned] (SPARK-42797) Spark Connect - Grammatical improvements to Spark Overview and Spark Connect Overview doc pages

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42797:


Assignee: Apache Spark

> Spark Connect - Grammatical improvements to Spark Overview and Spark Connect 
> Overview doc pages
> ---
>
> Key: SPARK-42797
> URL: https://issues.apache.org/jira/browse/SPARK-42797
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Allan Folting
>Assignee: Apache Spark
>Priority: Major
>
> Grammatical improvements. This is a follow-up to this ticket:
> Introducing Spark Connect on the main page and adding Spark Connect Overview 
> page
> https://issues.apache.org/jira/browse/SPARK-42496




[jira] [Commented] (SPARK-42797) Spark Connect - Grammatical improvements to Spark Overview and Spark Connect Overview doc pages

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700444#comment-17700444
 ] 

Apache Spark commented on SPARK-42797:
--

User 'allanf-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40428

> Spark Connect - Grammatical improvements to Spark Overview and Spark Connect 
> Overview doc pages
> ---
>
> Key: SPARK-42797
> URL: https://issues.apache.org/jira/browse/SPARK-42797
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Allan Folting
>Priority: Major
>
> Grammatical improvements. This is a follow-up to this ticket:
> Introducing Spark Connect on the main page and adding Spark Connect Overview 
> page
> https://issues.apache.org/jira/browse/SPARK-42496




[jira] [Assigned] (SPARK-42797) Spark Connect - Grammatical improvements to Spark Overview and Spark Connect Overview doc pages

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42797:


Assignee: (was: Apache Spark)

> Spark Connect - Grammatical improvements to Spark Overview and Spark Connect 
> Overview doc pages
> ---
>
> Key: SPARK-42797
> URL: https://issues.apache.org/jira/browse/SPARK-42797
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Allan Folting
>Priority: Major
>
> Grammatical improvements. This is a follow-up to this ticket:
> Introducing Spark Connect on the main page and adding Spark Connect Overview 
> page
> https://issues.apache.org/jira/browse/SPARK-42496




[jira] [Commented] (SPARK-42792) Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700413#comment-17700413
 ] 

Apache Spark commented on SPARK-42792:
--

User 'anishshri-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40427

> Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
> 
>
> Key: SPARK-42792
> URL: https://issues.apache.org/jira/browse/SPARK-42792
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Priority: Major
>
> Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
>  
> It's useful to get this metric for bytes written during flush from RocksDB as 
> part of the DB custom metrics. We propose to add this to the existing metrics 
> that are collected. There is no additional overhead since we are just 
> querying the internal ticker gauge, similar to other metrics.
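
A minimal sketch of how such a metric might be read once it is collected, assuming it surfaces next to the other RocksDB custom metrics in the streaming progress JSON. The sample payload, its numbers, and the metric key `rocksdbTotalBytesWrittenByFlush` are assumptions for illustration and are not taken from this ticket.

```python
import json

# Hypothetical excerpt of a streaming progress report; all values are made up.
progress_json = """
{
  "batchId": 12,
  "stateOperators": [
    {"operatorName": "stateStoreSave",
     "customMetrics": {"rocksdbTotalBytesWrittenByFlush": 4096, "rocksdbGetCount": 10}},
    {"operatorName": "dedupe",
     "customMetrics": {"rocksdbTotalBytesWrittenByFlush": 1024}}
  ]
}
"""

# Assumed metric key; check the name your Spark version actually reports.
FLUSH_BYTES_KEY = "rocksdbTotalBytesWrittenByFlush"


def total_custom_metric(progress: dict, key: str) -> int:
    """Sum one custom metric across all stateful operators in a progress report."""
    return sum(
        op.get("customMetrics", {}).get(key, 0)
        for op in progress.get("stateOperators", [])
    )


progress = json.loads(progress_json)
flush_bytes = total_custom_metric(progress, FLUSH_BYTES_KEY)
print(flush_bytes)
```

Operators that do not report the key contribute zero, so the helper also works on progress reports from versions that lack the metric.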




[jira] [Assigned] (SPARK-42792) Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42792:


Assignee: (was: Apache Spark)

> Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
> 
>
> Key: SPARK-42792
> URL: https://issues.apache.org/jira/browse/SPARK-42792
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Priority: Major
>
> Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
>  
> It's useful to get this metric for bytes written during flush from RocksDB as 
> part of the DB custom metrics. We propose to add this to the existing metrics 
> that are collected. There is no additional overhead since we are just 
> querying the internal ticker gauge, similar to other metrics.




[jira] [Assigned] (SPARK-42792) Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42792:


Assignee: Apache Spark

> Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
> 
>
> Key: SPARK-42792
> URL: https://issues.apache.org/jira/browse/SPARK-42792
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Assignee: Apache Spark
>Priority: Major
>
> Add support to track FLUSH_WRITE_BYTES for RocksDB state store for streaming
>  
> It's useful to get this metric for bytes written during flush from RocksDB as 
> part of the DB custom metrics. We propose to add this to the existing metrics 
> that are collected. There is no additional overhead since we are just 
> querying the internal ticker gauge, similar to other metrics.




[jira] [Commented] (SPARK-42796) Support TimestampNTZ in Cached Batch

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700414#comment-17700414
 ] 

Apache Spark commented on SPARK-42796:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40426

> Support TimestampNTZ in Cached Batch
> 
>
> Key: SPARK-42796
> URL: https://issues.apache.org/jira/browse/SPARK-42796
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>





[jira] [Assigned] (SPARK-42796) Support TimestampNTZ in Cached Batch

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42796:


Assignee: Apache Spark  (was: Gengliang Wang)

> Support TimestampNTZ in Cached Batch
> 
>
> Key: SPARK-42796
> URL: https://issues.apache.org/jira/browse/SPARK-42796
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>





[jira] [Commented] (SPARK-42796) Support TimestampNTZ in Cached Batch

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700412#comment-17700412
 ] 

Apache Spark commented on SPARK-42796:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40426

> Support TimestampNTZ in Cached Batch
> 
>
> Key: SPARK-42796
> URL: https://issues.apache.org/jira/browse/SPARK-42796
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>





[jira] [Assigned] (SPARK-42796) Support TimestampNTZ in Cached Batch

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42796:


Assignee: Gengliang Wang  (was: Apache Spark)

> Support TimestampNTZ in Cached Batch
> 
>
> Key: SPARK-42796
> URL: https://issues.apache.org/jira/browse/SPARK-42796
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>





[jira] [Commented] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structure Streaming

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700402#comment-17700402
 ] 

Apache Spark commented on SPARK-42794:
--

User 'huanliwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40425

> Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in 
> Structure Streaming
> --
>
> Key: SPARK-42794
> URL: https://issues.apache.org/jira/browse/SPARK-42794
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Huanli Wang
>Priority: Minor
>
> We are seeing query failures caused by RocksDB lock acquisition failures 
> in retried tasks.
>  * at t1, we shrink the cluster to only one executor
> {code:java}
> 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
> app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned 
> because of kill request from HTTP endpoint (data migration disabled))
> 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
> app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned 
> because of kill request from HTTP endpoint (data migration disabled))
> {code}
>  
>  * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to 
> the alive executor
> {code:java}
> 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 
> 685) (10.166.225.249, executor 0, partition 7, ANY, {code}
>  
> It seems that task 7.0 gets past *{{dataRDD.iterator(partition, 
> ctxt)}}* and acquires the RocksDB lock, because we are seeing
> {code:java}
> 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] 
> as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
> 133.0, TID 685] after 60003 ms.
> 23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 
> 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in 
> stage 133.0, TID 685] after 60006 ms.
> 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] 
> as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
> 133.0, TID 685] after 60003 ms.
> {code}
>  
> We propose increasing *lockAcquireTimeoutMs* to 2 minutes so that 4 task 
> retries give us 8 minutes to acquire the lock, which is longer than the 
> connectionTimeout with retries (3 * 120s).
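
The arithmetic behind the proposal can be sketched as follows. The numbers come from the description above (4 task retries, a 2-minute lock timeout, 3 * 120s for the connection timeout, and the roughly 60-second waits visible in the log); the variable and function names are illustrative only.

```python
# Values taken from the description; names are hypothetical.
PROPOSED_LOCK_TIMEOUT_S = 120   # proposed lockAcquireTimeoutMs, expressed in seconds
OBSERVED_LOCK_TIMEOUT_S = 60    # the log shows waits of about 60003 ms
TASK_ATTEMPTS = 4               # task retries mentioned in the description
CONNECTION_TIMEOUT_S = 120      # per-try connection timeout
CONNECTION_TRIES = 3            # "3 * 120s"


def total_wait(per_attempt_s: int, attempts: int) -> int:
    """Worst-case time spent waiting across all attempts."""
    return per_attempt_s * attempts


proposed_lock_budget_s = total_wait(PROPOSED_LOCK_TIMEOUT_S, TASK_ATTEMPTS)
observed_lock_budget_s = total_wait(OBSERVED_LOCK_TIMEOUT_S, TASK_ATTEMPTS)
connection_budget_s = total_wait(CONNECTION_TIMEOUT_S, CONNECTION_TRIES)

print(proposed_lock_budget_s, observed_lock_budget_s, connection_budget_s)
```

With a 60-second timeout the four attempts are exhausted before the connection retries finish; with 120 seconds the lock budget outlasts them.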




[jira] [Assigned] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structure Streaming

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42794:


Assignee: (was: Apache Spark)

> Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in 
> Structure Streaming
> --
>
> Key: SPARK-42794
> URL: https://issues.apache.org/jira/browse/SPARK-42794
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Huanli Wang
>Priority: Minor
>
> We are seeing query failures caused by RocksDB lock acquisition failures 
> in retried tasks.
>  * at t1, we shrink the cluster to only one executor
> {code:java}
> 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
> app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned 
> because of kill request from HTTP endpoint (data migration disabled))
> 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
> app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned 
> because of kill request from HTTP endpoint (data migration disabled))
> {code}
>  
>  * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to 
> the alive executor
> {code:java}
> 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 
> 685) (10.166.225.249, executor 0, partition 7, ANY, {code}
>  
> It seems that task 7.0 gets past *{{dataRDD.iterator(partition, 
> ctxt)}}* and acquires the RocksDB lock, because we are seeing
> {code:java}
> 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] 
> as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
> 133.0, TID 685] after 60003 ms.
> 23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 
> 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in 
> stage 133.0, TID 685] after 60006 ms.
> 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] 
> as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
> 133.0, TID 685] after 60003 ms.
> {code}
>  
> We propose increasing *lockAcquireTimeoutMs* to 2 minutes so that 4 task 
> retries give us 8 minutes to acquire the lock, which is longer than the 
> connectionTimeout with retries (3 * 120s).
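
When triaging failures like the ones quoted above, it can help to pull the lock holder and the waiting task out of the exception text. The following is a rough sketch that assumes the message format shown in the log excerpt; the regular expression and field names are my own and are not part of Spark.

```python
import re

# One of the log entries from the description, wrapped as it appears in the email.
RAW = """23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700)
(10.166.225.249 executor 0): java.lang.IllegalStateException:
StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be
acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700]
as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage
133.0, TID 685] after 60003 ms."""

# Hypothetical pattern written against the message format seen above.
PATTERN = re.compile(
    r"could not be acquired by \[ThreadId: Some\((?P<waiter_thread>\d+)\), "
    r"task: partition (?P<waiter_task>[\d.]+) in stage (?P<waiter_stage>[\d.]+), "
    r"TID (?P<waiter_tid>\d+)\] as it was not released by "
    r"\[ThreadId: Some\((?P<holder_thread>\d+)\), "
    r"task: partition (?P<holder_task>[\d.]+) in stage (?P<holder_stage>[\d.]+), "
    r"TID (?P<holder_tid>\d+)\] after (?P<waited_ms>\d+) ms"
)


def parse_lock_failure(text: str):
    """Return the named fields of a lock-acquisition failure, or None if absent."""
    normalized = " ".join(text.split())  # undo line wrapping
    match = PATTERN.search(normalized)
    return match.groupdict() if match else None


info = parse_lock_failure(RAW)
print(info)
```

In this excerpt every failed retry points at the same holder (TID 685, task 7.0 in stage 133.0), which is the original attempt that never released the lock.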




[jira] [Assigned] (SPARK-42794) Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in Structure Streaming

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42794:


Assignee: Apache Spark

> Increase the lockAcquireTimeoutMs for acquiring the RocksDB state store in 
> Structure Streaming
> --
>
> Key: SPARK-42794
> URL: https://issues.apache.org/jira/browse/SPARK-42794
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.5.0
>Reporter: Huanli Wang
>Assignee: Apache Spark
>Priority: Minor
>
> We are seeing query failures caused by RocksDB lock acquisition failures 
> in retried tasks.
>  * at t1, we shrink the cluster to only one executor
> {code:java}
> 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
> app-20230305224215-/2 is now DECOMMISSIONED (worker decommissioned 
> because of kill request from HTTP endpoint (data migration disabled))
> 23/03/05 22:47:21 INFO StandaloneAppClient$ClientEndpoint: Executor updated: 
> app-20230305224215-/3 is now DECOMMISSIONED (worker decommissioned 
> because of kill request from HTTP endpoint (data migration disabled))
> {code}
>  
>  * at t1+2min, task 7 at its first attempt (i.e. task 7.0) is scheduled to 
> the alive executor
> {code:java}
> 23/03/05 22:49:58 INFO TaskSetManager: Starting task 7.0 in stage 133.0 (TID 
> 685) (10.166.225.249, executor 0, partition 7, ANY, {code}
>  
> It seems that task 7.0 gets past *{{dataRDD.iterator(partition, 
> ctxt)}}* and acquires the RocksDB lock, because we are seeing
> {code:java}
> 23/03/05 22:51:59 WARN TaskSetManager: Lost task 4.1 in stage 133.1 (TID 700) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(50), task: partition 7.1 in stage 133.1, TID 700] 
> as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
> 133.0, TID 685] after 60003 ms.
> 23/03/05 22:52:59 WARN TaskSetManager: Lost task 4.2 in stage 133.1 (TID 702) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(1495), task: partition 7.2 in stage 133.1, TID 
> 702] as it was not released by [ThreadId: Some(449), task: partition 7.0 in 
> stage 133.0, TID 685] after 60006 ms.
> 23/03/05 22:53:59 WARN TaskSetManager: Lost task 4.3 in stage 133.1 (TID 704) 
> (10.166.225.249 executor 0): java.lang.IllegalStateException: 
> StateStoreId(opId=0,partId=7,name=default): RocksDB instance could not be 
> acquired by [ThreadId: Some(46), task: partition 7.3 in stage 133.1, TID 704] 
> as it was not released by [ThreadId: Some(449), task: partition 7.0 in stage 
> 133.0, TID 685] after 60003 ms.
> {code}
>  
> We propose increasing *lockAcquireTimeoutMs* to 2 minutes so that 4 task 
> retries give us 8 minutes to acquire the lock, which is longer than the 
> connectionTimeout with retries (3 * 120s).




[jira] [Commented] (SPARK-42793) `connect` module requires `build_profile_flags`

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700398#comment-17700398
 ] 

Apache Spark commented on SPARK-42793:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40424

> `connect` module requires `build_profile_flags`
> ---
>
> Key: SPARK-42793
> URL: https://issues.apache.org/jira/browse/SPARK-42793
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>





[jira] [Assigned] (SPARK-42793) `connect` module requires `build_profile_flags`

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42793:


Assignee: (was: Apache Spark)

> `connect` module requires `build_profile_flags`
> ---
>
> Key: SPARK-42793
> URL: https://issues.apache.org/jira/browse/SPARK-42793
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>





[jira] [Assigned] (SPARK-42793) `connect` module requires `build_profile_flags`

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42793:


Assignee: Apache Spark

> `connect` module requires `build_profile_flags`
> ---
>
> Key: SPARK-42793
> URL: https://issues.apache.org/jira/browse/SPARK-42793
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>





[jira] [Commented] (SPARK-42793) `connect` module requires `build_profile_flags`

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700397#comment-17700397
 ] 

Apache Spark commented on SPARK-42793:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40424

> `connect` module requires `build_profile_flags`
> ---
>
> Key: SPARK-42793
> URL: https://issues.apache.org/jira/browse/SPARK-42793
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>





[jira] [Commented] (SPARK-41775) Implement training functions as input

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700332#comment-17700332
 ] 

Apache Spark commented on SPARK-41775:
--

User 'rithwik-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40423

> Implement training functions as input
> -
>
> Key: SPARK-41775
> URL: https://issues.apache.org/jira/browse/SPARK-41775
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Assignee: Rithwik Ediga Lakhamsani
>Priority: Major
> Fix For: 3.4.0
>
>
> Sidenote: make formatting updates described in 
> https://github.com/apache/spark/pull/39188
>  
> Currently, `Distributor().run(...)` takes only files as input. We will now 
> add the ability to take functions as well. This requires the following 
> process on each task on the executor nodes:
> 1. Take the input function and args and pickle them.
> 2. Create a temporary train.py file that looks like
> {code:python}
> import cloudpickle
> import os
> if __name__ == "__main__":
>     # {tempdir} is filled in when this file is generated
>     with open(f"{tempdir}/train_input.pkl", "rb") as f:
>         train, args = cloudpickle.load(f)
>     output = train(*args)
>     # RANK == "0" corresponds to partitionId == 0
>     if output and os.environ.get("RANK", "") == "0":
>         with open(f"{tempdir}/train_output.pkl", "wb") as f:
>             cloudpickle.dump(output, f)
> {code}
> 3. Run that train.py file with `torchrun`.
> 4. Check whether `train_output.pkl` has been created by the process with 
> partitionId == 0; if it has, deserialize it and return that output through 
> `.collect()`.
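
The four steps can be walked through in a single process as a rough sketch. It swaps in the standard-library `pickle` for `cloudpickle` and calls the "generated script" as a function instead of launching it with `torchrun`; `statistics.mean` stands in for a user training function, and `run_generated_script` is a hypothetical helper name. Because `pickle` stores functions by reference, the stand-in has to be importable, which is presumably why the ticket uses `cloudpickle` for arbitrary user functions.

```python
import os
import pickle
import statistics
import tempfile


def run_generated_script(tempdir: str) -> None:
    """Mimics what the generated train.py would do inside one worker process."""
    with open(os.path.join(tempdir, "train_input.pkl"), "rb") as f:
        train, args = pickle.load(f)
    output = train(*args)
    # Only the rank-0 process (partitionId == 0) writes the result.
    if output and os.environ.get("RANK", "") == "0":
        with open(os.path.join(tempdir, "train_output.pkl"), "wb") as f:
            pickle.dump(output, f)


with tempfile.TemporaryDirectory() as tempdir:
    # Step 1: serialize the function and its arguments.
    with open(os.path.join(tempdir, "train_input.pkl"), "wb") as f:
        pickle.dump((statistics.mean, ([0.5, 0.25, 0.75],)), f)

    # Steps 2-3: run the script body; here in-process rather than via torchrun.
    os.environ["RANK"] = "0"
    run_generated_script(tempdir)

    # Step 4: read the output back if the rank-0 process produced it.
    out_path = os.path.join(tempdir, "train_output.pkl")
    result = None
    if os.path.exists(out_path):
        with open(out_path, "rb") as f:
            result = pickle.load(f)

print(result)
```

Note that the `if output` guard means a falsy return value (for example `0` or an empty list) never produces an output file, so the caller sees no result in that case.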




[jira] [Commented] (SPARK-41775) Implement training functions as input

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700333#comment-17700333
 ] 

Apache Spark commented on SPARK-41775:
--

User 'rithwik-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40423

> Implement training functions as input
> -
>
> Key: SPARK-41775
> URL: https://issues.apache.org/jira/browse/SPARK-41775
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Assignee: Rithwik Ediga Lakhamsani
>Priority: Major
> Fix For: 3.4.0
>
>
> Sidenote: make formatting updates described in 
> https://github.com/apache/spark/pull/39188
>  
> Currently, `Distributor().run(...)` takes only files as input. We will now 
> add the ability to take functions as well. This requires the following 
> process on each task on the executor nodes:
> 1. Take the input function and args and pickle them.
> 2. Create a temporary train.py file that looks like
> {code:python}
> import cloudpickle
> import os
> if __name__ == "__main__":
>     # {tempdir} is filled in when this file is generated
>     with open(f"{tempdir}/train_input.pkl", "rb") as f:
>         train, args = cloudpickle.load(f)
>     output = train(*args)
>     # RANK == "0" corresponds to partitionId == 0
>     if output and os.environ.get("RANK", "") == "0":
>         with open(f"{tempdir}/train_output.pkl", "wb") as f:
>             cloudpickle.dump(output, f)
> {code}
> 3. Run that train.py file with `torchrun`.
> 4. Check whether `train_output.pkl` has been created by the process with 
> partitionId == 0; if it has, deserialize it and return that output through 
> `.collect()`.




[jira] [Assigned] (SPARK-42779) Allow V2 writes to indicate advisory partition size

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42779:


Assignee: Apache Spark

> Allow V2 writes to indicate advisory partition size
> ---
>
> Key: SPARK-42779
> URL: https://issues.apache.org/jira/browse/SPARK-42779
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Anton Okolnychyi
>Assignee: Apache Spark
>Priority: Major
>
> Data sources may request a particular distribution and ordering of data for 
> V2 writes. If AQE is enabled, the default session advisory partition size 
> (64MB) will be used as guidance. Unfortunately, this default value can still 
> lead to small files because the written data can be compressed nicely using 
> columnar file formats. Spark should allow data sources to indicate the 
> advisory shuffle partition size, just like it lets data sources request a 
> particular number of partitions.
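
A back-of-the-envelope sketch of why the 64MB default can yield small files. Only the 64MB default comes from the description; the shuffle volume, the compression ratio, and the target file size are hypothetical, and the model (partitions of exactly the advisory size, one file per partition) is a simplification of what AQE actually does.

```python
import math

MB = 1024 * 1024


def partition_count(shuffle_bytes: int, advisory_bytes: int) -> int:
    """Approximate number of shuffle partitions for a given advisory size."""
    return max(1, math.ceil(shuffle_bytes / advisory_bytes))


def estimated_file_bytes(advisory_bytes: int, compression_ratio: float) -> float:
    """Approximate on-disk size of one partition after columnar compression."""
    return advisory_bytes / compression_ratio


shuffle_bytes = 10 * 1024 * MB    # hypothetical: 10 GiB of shuffle data
default_advisory = 64 * MB        # session default mentioned in the description
compression_ratio = 8.0           # assumed ratio for a columnar format

default_parts = partition_count(shuffle_bytes, default_advisory)
default_file_mb = estimated_file_bytes(default_advisory, compression_ratio) / MB

# What a data source might request instead to reach a hypothetical 128MB file target.
target_file = 128 * MB
source_advisory = int(target_file * compression_ratio)
tuned_parts = partition_count(shuffle_bytes, source_advisory)

print(default_parts, default_file_mb, source_advisory // MB, tuned_parts)
```

Under these assumptions the default produces 160 files of about 8MB each, while a source-provided advisory size of 1024MB produces 10 files near the target size.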




[jira] [Assigned] (SPARK-42779) Allow V2 writes to indicate advisory partition size

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42779:


Assignee: (was: Apache Spark)

> Allow V2 writes to indicate advisory partition size
> ---
>
> Key: SPARK-42779
> URL: https://issues.apache.org/jira/browse/SPARK-42779
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Anton Okolnychyi
>Priority: Major
>
> Data sources may request a particular distribution and ordering of data for 
> V2 writes. If AQE is enabled, the default session advisory partition size 
> (64MB) will be used as guidance. Unfortunately, this default value can still 
> lead to small files because the written data can be compressed nicely using 
> columnar file formats. Spark should allow data sources to indicate the 
> advisory shuffle partition size, just like it lets data sources request a 
> particular number of partitions.




[jira] [Commented] (SPARK-42779) Allow V2 writes to indicate advisory partition size

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700316#comment-17700316
 ] 

Apache Spark commented on SPARK-42779:
--

User 'aokolnychyi' has created a pull request for this issue:
https://github.com/apache/spark/pull/40421

> Allow V2 writes to indicate advisory partition size
> ---
>
> Key: SPARK-42779
> URL: https://issues.apache.org/jira/browse/SPARK-42779
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Anton Okolnychyi
>Priority: Major
>
> Data sources may request a particular distribution and ordering of data for 
> V2 writes. If AQE is enabled, the default session advisory partition size 
> (64MB) will be used as guidance. Unfortunately, this default value can still 
> lead to small files because the written data can be compressed nicely using 
> columnar file formats. Spark should allow data sources to indicate the 
> advisory shuffle partition size, just like it lets data sources request a 
> particular number of partitions.




[jira] [Assigned] (SPARK-42617) Support `isocalendar`

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42617:


Assignee: (was: Apache Spark)

> Support `isocalendar`
> -
>
> Key: SPARK-42617
> URL: https://issues.apache.org/jira/browse/SPARK-42617
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should support `isocalendar` to match pandas behavior 
> (https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.dt.isocalendar.html)
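
For reference, the values in question are the ISO 8601 year, week number, and weekday; pandas' `Series.dt.isocalendar()` returns them as `year`, `week`, and `day` columns. The sketch below uses only the standard library's `datetime.date.isocalendar()` to show the expected numbers for two dates; `iso_parts` is a hypothetical helper and says nothing about how the pandas API on Spark would implement this.

```python
from datetime import date


def iso_parts(d: date) -> dict:
    """Return the ISO year, ISO week number, and ISO weekday (Monday == 1)."""
    year, week, day = d.isocalendar()
    return {"year": year, "week": week, "day": day}


mid_march = iso_parts(date(2023, 3, 14))   # a Tuesday
year_edge = iso_parts(date(2021, 1, 3))    # a Sunday that belongs to ISO year 2020

print(mid_march)
print(year_edge)
```

The second date illustrates the edge case any implementation has to match: the ISO year can differ from the calendar year around January 1.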




[jira] [Assigned] (SPARK-42617) Support `isocalendar`

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42617:


Assignee: Apache Spark

> Support `isocalendar`
> -
>
> Key: SPARK-42617
> URL: https://issues.apache.org/jira/browse/SPARK-42617
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> We should support `isocalendar` to match pandas behavior 
> (https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.dt.isocalendar.html)




[jira] [Commented] (SPARK-42617) Support `isocalendar`

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700282#comment-17700282
 ] 

Apache Spark commented on SPARK-42617:
--

User 'dzhigimont' has created a pull request for this issue:
https://github.com/apache/spark/pull/40420

> Support `isocalendar`
> -
>
> Key: SPARK-42617
> URL: https://issues.apache.org/jira/browse/SPARK-42617
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should support `isocalendar` to match pandas behavior 
> (https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.dt.isocalendar.html)




[jira] [Commented] (SPARK-42789) rewrites multiple GetJsonObjects to a JsonTuple if their json expression is the same

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700241#comment-17700241
 ] 

Apache Spark commented on SPARK-42789:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/40419

> rewrites multiple GetJsonObjects to a JsonTuple if their json expression is 
> the same
> 
>
> Key: SPARK-42789
> URL: https://issues.apache.org/jira/browse/SPARK-42789
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>
> Benchmark result:
> {noformat}
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 2
>   Stopped after 2 iterations, 80787 ms
>   Running case: Rewrite: 2
>   Stopped after 2 iterations, 48900 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
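> Benchmark rewrite GetJsonObjects:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -----------------------------------------------------------------------------------------------------------------
> Default: 2                                  39026          40394        1935         0.2        5397.8       1.0X
> Rewrite: 2                                  24354          24450         137         0.3        3368.4       1.6X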
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 3
>   Stopped after 2 iterations, 115055 ms
>   Running case: Rewrite: 3
>   Stopped after 2 iterations, 62297 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
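> Benchmark rewrite GetJsonObjects:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -----------------------------------------------------------------------------------------------------------------
> Default: 3                                  54652          57528         NaN         0.1        7559.1       1.0X
> Rewrite: 3                                  30702          31149         631         0.2        4246.6       1.8X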
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 4
>   Stopped after 2 iterations, 155392 ms
>   Running case: Rewrite: 4
>   Stopped after 2 iterations, 54776 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
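> Benchmark rewrite GetJsonObjects:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -----------------------------------------------------------------------------------------------------------------
> Default: 4                                  75503          77696         NaN         0.1       10443.1       1.0X
> Rewrite: 4                                  26962          27388         602         0.3        3729.3       2.8X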
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 5
>   Stopped after 2 iterations, 192836 ms
>   Running case: Rewrite: 5
>   Stopped after 2 iterations, 51967 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
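> Benchmark rewrite GetJsonObjects:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -----------------------------------------------------------------------------------------------------------------
> Default: 5                                  94923          96418        2115         0.1       13129.1       1.0X
> Rewrite: 5                                  25362          25984         880         0.3        3507.8       3.7X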
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 10
>   Stopped after 2 iterations, 317246 ms
>   Running case: Rewrite: 10
>   Stopped after 2 iterations, 56734 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
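> Benchmark rewrite GetJsonObjects:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -----------------------------------------------------------------------------------------------------------------
> Default: 10                                157458         158623        1648         0.0       21778.6       1.0X
> Rewrite: 10                                 28296          28367         100         0.3        3913.8       5.6X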
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 20
>

[jira] [Assigned] (SPARK-42789) rewrites multiple GetJsonObjects to a JsonTuple if their json expression is the same

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42789:


Assignee: Apache Spark

> rewrites multiple GetJsonObjects to a JsonTuple if their json expression is 
> the same
> 
>
> Key: SPARK-42789
> URL: https://issues.apache.org/jira/browse/SPARK-42789
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>
> Benchmark result:
> {noformat}
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 2
>   Stopped after 2 iterations, 80787 ms
>   Running case: Rewrite: 2
>   Stopped after 2 iterations, 48900 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
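> Benchmark rewrite GetJsonObjects:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -----------------------------------------------------------------------------------------------------------------
> Default: 2                                  39026          40394        1935         0.2        5397.8       1.0X
> Rewrite: 2                                  24354          24450         137         0.3        3368.4       1.6X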
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 3
>   Stopped after 2 iterations, 115055 ms
>   Running case: Rewrite: 3
>   Stopped after 2 iterations, 62297 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
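> Benchmark rewrite GetJsonObjects:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -----------------------------------------------------------------------------------------------------------------
> Default: 3                                  54652          57528         NaN         0.1        7559.1       1.0X
> Rewrite: 3                                  30702          31149         631         0.2        4246.6       1.8X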
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 4
>   Stopped after 2 iterations, 155392 ms
>   Running case: Rewrite: 4
>   Stopped after 2 iterations, 54776 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
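> Benchmark rewrite GetJsonObjects:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -----------------------------------------------------------------------------------------------------------------
> Default: 4                                  75503          77696         NaN         0.1       10443.1       1.0X
> Rewrite: 4                                  26962          27388         602         0.3        3729.3       2.8X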
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 5
>   Stopped after 2 iterations, 192836 ms
>   Running case: Rewrite: 5
>   Stopped after 2 iterations, 51967 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
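> Benchmark rewrite GetJsonObjects:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -----------------------------------------------------------------------------------------------------------------
> Default: 5                                  94923          96418        2115         0.1       13129.1       1.0X
> Rewrite: 5                                  25362          25984         880         0.3        3507.8       3.7X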
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 10
>   Stopped after 2 iterations, 317246 ms
>   Running case: Rewrite: 10
>   Stopped after 2 iterations, 56734 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
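> Benchmark rewrite GetJsonObjects:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -----------------------------------------------------------------------------------------------------------------
> Default: 10                                157458         158623        1648         0.0       21778.6       1.0X
> Rewrite: 10                                 28296          28367         100         0.3        3913.8       5.6X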
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 20
>   Stopped after 2 iterations, 618089 ms
>   Running case: Rewrite: 20
>

[jira] [Assigned] (SPARK-42789) rewrites multiple GetJsonObjects to a JsonTuple if their json expression is the same

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42789:


Assignee: (was: Apache Spark)

> rewrites multiple GetJsonObjects to a JsonTuple if their json expression is 
> the same
> 
>
> Key: SPARK-42789
> URL: https://issues.apache.org/jira/browse/SPARK-42789
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>
> Benchmark result:
> {noformat}
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 2
>   Stopped after 2 iterations, 80787 ms
>   Running case: Rewrite: 2
>   Stopped after 2 iterations, 48900 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
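> Benchmark rewrite GetJsonObjects:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -----------------------------------------------------------------------------------------------------------------
> Default: 2                                  39026          40394        1935         0.2        5397.8       1.0X
> Rewrite: 2                                  24354          24450         137         0.3        3368.4       1.6X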
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 3
>   Stopped after 2 iterations, 115055 ms
>   Running case: Rewrite: 3
>   Stopped after 2 iterations, 62297 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
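> Benchmark rewrite GetJsonObjects:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -----------------------------------------------------------------------------------------------------------------
> Default: 3                                  54652          57528         NaN         0.1        7559.1       1.0X
> Rewrite: 3                                  30702          31149         631         0.2        4246.6       1.8X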
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 4
>   Stopped after 2 iterations, 155392 ms
>   Running case: Rewrite: 4
>   Stopped after 2 iterations, 54776 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
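> Benchmark rewrite GetJsonObjects:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -----------------------------------------------------------------------------------------------------------------
> Default: 4                                  75503          77696         NaN         0.1       10443.1       1.0X
> Rewrite: 4                                  26962          27388         602         0.3        3729.3       2.8X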
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 5
>   Stopped after 2 iterations, 192836 ms
>   Running case: Rewrite: 5
>   Stopped after 2 iterations, 51967 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
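> Benchmark rewrite GetJsonObjects:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -----------------------------------------------------------------------------------------------------------------
> Default: 5                                  94923          96418        2115         0.1       13129.1       1.0X
> Rewrite: 5                                  25362          25984         880         0.3        3507.8       3.7X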
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 10
>   Stopped after 2 iterations, 317246 ms
>   Running case: Rewrite: 10
>   Stopped after 2 iterations, 56734 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
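> Benchmark rewrite GetJsonObjects:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> -----------------------------------------------------------------------------------------------------------------
> Default: 10                                157458         158623        1648         0.0       21778.6       1.0X
> Rewrite: 10                                 28296          28367         100         0.3        3913.8       5.6X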
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 20
>   Stopped after 2 iterations, 618089 ms
>   Running case: Rewrite: 20
>   Stopped after 2 iterations,

[jira] [Assigned] (SPARK-42790) Abstract the excluded method for better test for JDBC docker tests.

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42790:


Assignee: Apache Spark

> Abstract the excluded method for better test for JDBC docker tests.
> ---
>
> Key: SPARK-42790
> URL: https://issues.apache.org/jira/browse/SPARK-42790
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>





[jira] [Assigned] (SPARK-42790) Abstract the excluded method for better test for JDBC docker tests.

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42790:


Assignee: (was: Apache Spark)

> Abstract the excluded method for better test for JDBC docker tests.
> ---
>
> Key: SPARK-42790
> URL: https://issues.apache.org/jira/browse/SPARK-42790
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>





[jira] [Commented] (SPARK-42790) Abstract the excluded method for better test for JDBC docker tests.

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700220#comment-17700220
 ] 

Apache Spark commented on SPARK-42790:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40418

> Abstract the excluded method for better test for JDBC docker tests.
> ---
>
> Key: SPARK-42790
> URL: https://issues.apache.org/jira/browse/SPARK-42790
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>





[jira] [Commented] (SPARK-42778) QueryStageExec should respect supportsRowBased

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700214#comment-17700214
 ] 

Apache Spark commented on SPARK-42778:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/40417

> QueryStageExec should respect supportsRowBased
> --
>
> Key: SPARK-42778
> URL: https://issues.apache.org/jira/browse/SPARK-42778
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Priority: Major
> Fix For: 3.5.0
>
>





[jira] [Assigned] (SPARK-42786) Impl typed select in Dataset

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42786:


Assignee: Apache Spark

> Impl typed select in Dataset
> 
>
> Key: SPARK-42786
> URL: https://issues.apache.org/jira/browse/SPARK-42786
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Assignee: Apache Spark
>Priority: Major
>





[jira] [Commented] (SPARK-42786) Impl typed select in Dataset

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700106#comment-17700106
 ] 

Apache Spark commented on SPARK-42786:
--

User 'zhenlineo' has created a pull request for this issue:
https://github.com/apache/spark/pull/40413

> Impl typed select in Dataset
> 
>
> Key: SPARK-42786
> URL: https://issues.apache.org/jira/browse/SPARK-42786
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>





[jira] [Assigned] (SPARK-42786) Impl typed select in Dataset

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42786:


Assignee: (was: Apache Spark)

> Impl typed select in Dataset
> 
>
> Key: SPARK-42786
> URL: https://issues.apache.org/jira/browse/SPARK-42786
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>





[jira] [Commented] (SPARK-42731) Update Spark Configuration

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700101#comment-17700101
 ] 

Apache Spark commented on SPARK-42731:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/40416

> Update Spark Configuration
> --
>
> Key: SPARK-42731
> URL: https://issues.apache.org/jira/browse/SPARK-42731
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Documentation
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> https://spark.apache.org/docs/latest/configuration.html
> Add a section for Spark Connect configurations




[jira] [Assigned] (SPARK-42731) Update Spark Configuration

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42731:


Assignee: Apache Spark

> Update Spark Configuration
> --
>
> Key: SPARK-42731
> URL: https://issues.apache.org/jira/browse/SPARK-42731
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Documentation
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> https://spark.apache.org/docs/latest/configuration.html
> Add a section for Spark Connect configurations




[jira] [Assigned] (SPARK-42731) Update Spark Configuration

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42731:


Assignee: (was: Apache Spark)

> Update Spark Configuration
> --
>
> Key: SPARK-42731
> URL: https://issues.apache.org/jira/browse/SPARK-42731
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Documentation
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> https://spark.apache.org/docs/latest/configuration.html
> Add a section for Spark Connect configurations




[jira] [Commented] (SPARK-42785) [K8S][Core] When spark submit without --deploy-mode, will face NPE in Kubernetes Case

2023-03-14 Thread Apache Spark (Jira)



[ 
https://issues.apache.org/jira/browse/SPARK-42785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17700096#comment-17700096
 ] 

Apache Spark commented on SPARK-42785:
--

User 'zwangsheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40414

> [K8S][Core] When spark submit without --deploy-mode, will face NPE in 
> Kubernetes Case
> -
>
> Key: SPARK-42785
> URL: https://issues.apache.org/jira/browse/SPARK-42785
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.2
>Reporter: binjie yang
>Priority: Major
>
> According to this PR 
> [https://github.com/apache/spark/pull/37880#issuecomment-134890,] a user who 
> runs spark-submit without `--deploy-mode XXX` or `--conf 
> spark.submit.deployMode=` may hit an NPE in this code:
>  
> args.deployMode.equals("client")
>  
>  
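The failure mode quoted above can be reproduced outside Spark: calling `.equals` on a null receiver throws, while putting the constant on the left does not. The following is a minimal Java sketch rather than Spark's Scala source. `DeployModeCheck`, `isClientMode`, and `resolveDeployMode` are hypothetical names, and falling back to `client` (the documented spark-submit default) is an assumption here, not necessarily what the linked pull request does.

```java
import java.util.Objects;

final class DeployModeCheck {
    // Null-safe comparison: the string constant is the receiver, so a null
    // deployMode yields false instead of a NullPointerException.
    static boolean isClientMode(String deployMode) {
        return "client".equals(deployMode);
    }

    // Hypothetical alternative: substitute a default before any comparison.
    static String resolveDeployMode(String deployMode) {
        return Objects.requireNonNullElse(deployMode, "client");
    }

    public static void main(String[] args) {
        String unset = null; // simulates a submit with no --deploy-mode given

        try {
            unset.equals("client"); // same shape as args.deployMode.equals("client")
        } catch (NullPointerException e) {
            System.out.println("NPE: receiver was null");
        }

        System.out.println(isClientMode(unset));      // false, no exception
        System.out.println(resolveDeployMode(unset)); // client
    }
}
```

Which of the two behaviours is right depends on whether an unset deploy mode should be treated as "not client" or as the default; the sketch shows both so the difference is visible.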




[jira] [Assigned] (SPARK-42785) [K8S][Core] When spark submit without --deploy-mode, will face NPE in Kubernetes Case

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42785:


Assignee: (was: Apache Spark)

> [K8S][Core] When spark submit without --deploy-mode, will face NPE in 
> Kubernetes Case
> -
>
> Key: SPARK-42785
> URL: https://issues.apache.org/jira/browse/SPARK-42785
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.2
>Reporter: binjie yang
>Priority: Major
>
> According to this PR 
> [https://github.com/apache/spark/pull/37880#issuecomment-134890,] a user who 
> runs spark-submit without `--deploy-mode XXX` or `--conf 
> spark.submit.deployMode=` may hit an NPE in this code:
>  
> args.deployMode.equals("client")
>  
>  




[jira] [Assigned] (SPARK-42785) [K8S][Core] When spark submit without --deploy-mode, will face NPE in Kubernetes Case

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42785:


Assignee: Apache Spark

> [K8S][Core] When spark submit without --deploy-mode, will face NPE in 
> Kubernetes Case
> -
>
> Key: SPARK-42785
> URL: https://issues.apache.org/jira/browse/SPARK-42785
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.2
>Reporter: binjie yang
>Assignee: Apache Spark
>Priority: Major
>
> According to this PR 
> [https://github.com/apache/spark/pull/37880#issuecomment-134890,] a user who 
> runs spark-submit without `--deploy-mode XXX` or `--conf 
> spark.submit.deployMode=` may hit an NPE in this code:
>  
> args.deployMode.equals("client")
>  
>  




[jira] [Assigned] (SPARK-42784) Fix the problem of incomplete creation of subdirectories in push merged localDir

2023-03-14 Thread Apache Spark (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-42784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42784:


Assignee: (was: Apache Spark)

> Fix the problem of incomplete creation of subdirectories in push merged 
> localDir
> 
>
> Key: SPARK-42784
> URL: https://issues.apache.org/jira/browse/SPARK-42784
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 3.3.2
>Reporter: Fencheng Mei
>Priority: Major
>
> After we enabled push-based shuffle widely in our production environment, 
> we found warning messages appearing in the server-side logs.
> The warning log looks like this:
> ShuffleBlockPusher: Pushing block shufflePush_3_0_5352_935 to 
> BlockManagerId(shuffle-push-merger, zw06-data-hdp-dn08251.mt, 7337, None) 
> failed.
> java.lang.RuntimeException: java.lang.RuntimeException: Cannot initialize 
> merged shuffle partition for appId application_1671244879475_44020960 
> shuffleId 3 shuffleMergeId 0 reduceId 935.
> After investigation, we identified what triggers the bug.
> The driver requested two different containers on the same physical machine. 
> While the 'push-merged' directory was being created in the first container 
> (container_1), the mergeDir was created first, and then the sub-directories 
> were created based on the value of the "spark.diskStore.subDirectories" 
> parameter. However, the resources of container_1 were preempted while the 
> sub-directories were being created, so only some of them were created. 
> Because the mergeDir still existed, the second container (container_2) did 
> not create the remaining sub-directories, as it assumed that all 
> directories had already been created.
>  
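The race described above comes from treating "mergeDir exists" as proof that every sub-directory exists. One way to avoid that assumption, sketched here in plain Java and not taken from the Spark code base, is to check each sub-directory individually and create only the missing ones. `MergeDirSetup` and `ensureSubDirs` are hypothetical names, and the two-digit hex naming of sub-directories is an assumption made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class MergeDirSetup {
    // Ensures mergeDir contains subDirCount sub-directories and returns how many
    // this call had to create. Safe to call when mergeDir already exists but is
    // only partially populated, e.g. after an earlier creator was preempted.
    static int ensureSubDirs(Path mergeDir, int subDirCount) throws IOException {
        int created = 0;
        for (int i = 0; i < subDirCount; i++) {
            Path sub = mergeDir.resolve(String.format("%02x", i)); // assumed naming
            if (!Files.isDirectory(sub)) {
                // createDirectories also creates mergeDir if needed and does not
                // fail if another process creates the same directory first.
                Files.createDirectories(sub);
                created++;
            }
        }
        return created;
    }

    public static void main(String[] args) throws IOException {
        Path mergeDir = Files.createTempDirectory("push-merged-demo").resolve("merge_manager");

        // Simulate container_1 being preempted after creating 3 of 8 sub-directories.
        for (int i = 0; i < 3; i++) {
            Files.createDirectories(mergeDir.resolve(String.format("%02x", i)));
        }

        // container_2 fills in the rest instead of trusting that mergeDir is complete.
        System.out.println(ensureSubDirs(mergeDir, 8)); // 5
        System.out.println(ensureSubDirs(mergeDir, 8)); // 0 (nothing left to create)
    }
}
```

Because the check is per sub-directory, running it from several containers on the same host converges on a complete directory tree regardless of which one started first.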



