[jira] [Assigned] (SPARK-42491) Upgrade jetty to 9.4.51.v20230217
[ https://issues.apache.org/jira/browse/SPARK-42491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42491: Assignee: (was: Apache Spark) > Upgrade jetty to 9.4.51.v20230217 > -- > > Key: SPARK-42491 > URL: https://issues.apache.org/jira/browse/SPARK-42491 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > > https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.51.v20230217 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42491) Upgrade jetty to 9.4.51.v20230217
[ https://issues.apache.org/jira/browse/SPARK-42491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42491: Assignee: Apache Spark > Upgrade jetty to 9.4.51.v20230217 > -- > > Key: SPARK-42491 > URL: https://issues.apache.org/jira/browse/SPARK-42491 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.51.v20230217 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42491) Upgrade jetty to 9.4.51.v20230217
[ https://issues.apache.org/jira/browse/SPARK-42491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694393#comment-17694393 ] Apache Spark commented on SPARK-42491: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40214 > Upgrade jetty to 9.4.51.v20230217 > -- > > Key: SPARK-42491 > URL: https://issues.apache.org/jira/browse/SPARK-42491 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > > https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.51.v20230217 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42617) Support `isocalendar`
Haejoon Lee created SPARK-42617: --- Summary: Support `isocalendar` Key: SPARK-42617 URL: https://issues.apache.org/jira/browse/SPARK-42617 Project: Spark Issue Type: New Feature Components: Pandas API on Spark Affects Versions: 3.5.0 Reporter: Haejoon Lee We should support `isocalendar` to match pandas behavior (https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.dt.isocalendar.html) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42548) Add ReferenceAllColumns to skip rewriting attributes
[ https://issues.apache.org/jira/browse/SPARK-42548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-42548: --- Assignee: XiDuo You > Add ReferenceAllColumns to skip rewriting attributes > > > Key: SPARK-42548 > URL: https://issues.apache.org/jira/browse/SPARK-42548 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42548) Add ReferenceAllColumns to skip rewriting attributes
[ https://issues.apache.org/jira/browse/SPARK-42548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-42548. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 40154 [https://github.com/apache/spark/pull/40154] > Add ReferenceAllColumns to skip rewriting attributes > > > Key: SPARK-42548 > URL: https://issues.apache.org/jira/browse/SPARK-42548 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: XiDuo You >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42599) Make `CompatibilitySuite` as a tool like `dev/mima`
[ https://issues.apache.org/jira/browse/SPARK-42599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694378#comment-17694378 ] Apache Spark commented on SPARK-42599: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40213 > Make `CompatibilitySuite` as a tool like `dev/mima` > --- > > Key: SPARK-42599 > URL: https://issues.apache.org/jira/browse/SPARK-42599 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Major > > Testing `CompatibilitySuite` with maven requires some pre-work (the sql and > connect-client-jvm modules need to be built with maven before the test), so when > we run `mvn package test`, the following errors occur: > > {code:java} > CompatibilitySuite: > - compatibility MiMa tests *** FAILED *** > java.lang.AssertionError: assertion failed: Failed to find the jar inside > folder: /home/bjorn/spark-3.4.0/connector/connect/client/jvm/target > at scala.Predef$.assert(Predef.scala:223) > at > org.apache.spark.sql.connect.client.util.IntegrationTestUtils$.findJar(IntegrationTestUtils.scala:67) > at > org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar$lzycompute(CompatibilitySuite.scala:57) > at > org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar(CompatibilitySuite.scala:53) > at > org.apache.spark.sql.connect.client.CompatibilitySuite.$anonfun$new$1(CompatibilitySuite.scala:69) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > ... > - compatibility API tests: Dataset *** FAILED *** > java.lang.AssertionError: assertion failed: Failed to find the jar inside > folder: /home/bjorn/spark-3.4.0/connector/connect/client/jvm/target > at scala.Predef$.assert(Predef.scala:223) > at > org.apache.spark.sql.connect.client.util.IntegrationTestUtils$.findJar(IntegrationTestUtils.scala:67) > at > org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar$lzycompute(CompatibilitySuite.scala:57) > at > org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar(CompatibilitySuite.scala:53) > at > org.apache.spark.sql.connect.client.CompatibilitySuite.$anonfun$new$7(CompatibilitySuite.scala:110) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41171) Push down filter through window when partitionSpec is empty
[ https://issues.apache.org/jira/browse/SPARK-41171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-41171. - Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40142 [https://github.com/apache/spark/pull/40142] > Push down filter through window when partitionSpec is empty > --- > > Key: SPARK-41171 > URL: https://issues.apache.org/jira/browse/SPARK-41171 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.5.0 > > > Sometimes, a filter compares a rank-like window function with a number. > {code:sql} > SELECT *, ROW_NUMBER() OVER(ORDER BY a) AS rn FROM Tab1 WHERE rn <= 5 > {code} > We can create a Limit(5) and push it down as the child of the Window. > {code:sql} > SELECT *, ROW_NUMBER() OVER(ORDER BY a) AS rn FROM (SELECT * FROM Tab1 ORDER > BY a LIMIT 5) t > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
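[Editor's note] To make the rewrite concrete, here is a hedged Scala sketch against a toy table (the session and data setup are assumptions for the example; the actual rewrite happens inside the optimizer, not in user code). The first query expresses the filter through a subquery, since SQL cannot reference the alias `rn` directly in `WHERE`:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.range(100).toDF("a").createOrReplaceTempView("Tab1")

// Original shape: the filter on the rank-like window column keeps only 5 rows,
// but the window function is computed over the whole table.
val original = spark.sql(
  """SELECT * FROM (
    |  SELECT *, ROW_NUMBER() OVER (ORDER BY a) AS rn FROM Tab1
    |) t WHERE rn <= 5""".stripMargin)

// Rewritten shape: a Limit sits below the Window, so ROW_NUMBER only sees the
// 5 smallest rows instead of the full table.
val rewritten = spark.sql(
  """SELECT *, ROW_NUMBER() OVER (ORDER BY a) AS rn
    |FROM (SELECT * FROM Tab1 ORDER BY a LIMIT 5) t""".stripMargin)

// Comparing the optimized plans shows whether the limit was pushed down.
original.explain()
rewritten.explain()
{code}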
[jira] [Assigned] (SPARK-41171) Push down filter through window when partitionSpec is empty
[ https://issues.apache.org/jira/browse/SPARK-41171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-41171: --- Assignee: jiaan.geng > Push down filter through window when partitionSpec is empty > --- > > Key: SPARK-41171 > URL: https://issues.apache.org/jira/browse/SPARK-41171 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > > Sometimes, a filter compares a rank-like window function with a number. > {code:sql} > SELECT *, ROW_NUMBER() OVER(ORDER BY a) AS rn FROM Tab1 WHERE rn <= 5 > {code} > We can create a Limit(5) and push it down as the child of the Window. > {code:sql} > SELECT *, ROW_NUMBER() OVER(ORDER BY a) AS rn FROM (SELECT * FROM Tab1 ORDER > BY a LIMIT 5) t > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42616) SparkSQLCLIDriver shall only close started hive sessionState
[ https://issues.apache.org/jira/browse/SPARK-42616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42616: Assignee: Apache Spark > SparkSQLCLIDriver shall only close started hive sessionState > > > Key: SPARK-42616 > URL: https://issues.apache.org/jira/browse/SPARK-42616 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42613) PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor cores by default
[ https://issues.apache.org/jira/browse/SPARK-42613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42613: Assignee: Apache Spark > PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor > cores by default > - > > Key: SPARK-42613 > URL: https://issues.apache.org/jira/browse/SPARK-42613 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 3.3.0 >Reporter: John Zhuge >Assignee: Apache Spark >Priority: Major > > Follow up from > [https://github.com/apache/spark/pull/40199#discussion_r1119453996] > If OMP_NUM_THREADS is not set explicitly, we should set it to > `spark.task.cpus` instead of `spark.executor.cores` as described in [PR > #38699|https://github.com/apache/spark/pull/38699]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
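[Editor's note] The change itself is small; the sketch below is a hypothetical helper (the name `envVars` and the exact location inside PythonRunner are assumptions, not the actual patch) illustrating the proposed default:

{code:scala}
import org.apache.spark.SparkConf
import scala.collection.mutable

// Hedged sketch: only default OMP_NUM_THREADS when the user has not set it,
// and derive it from spark.task.cpus (default 1) rather than
// spark.executor.cores, so a large executor no longer lets OpenMP-backed
// native libraries oversubscribe a single-cpu task.
def defaultOmpNumThreads(conf: SparkConf, envVars: mutable.Map[String, String]): Unit = {
  if (!envVars.contains("OMP_NUM_THREADS")) {
    envVars.put("OMP_NUM_THREADS", conf.get("spark.task.cpus", "1"))
  }
}
{code}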
[jira] [Commented] (SPARK-42613) PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor cores by default
[ https://issues.apache.org/jira/browse/SPARK-42613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694324#comment-17694324 ] Apache Spark commented on SPARK-42613: -- User 'jzhuge' has created a pull request for this issue: https://github.com/apache/spark/pull/40212 > PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor > cores by default > - > > Key: SPARK-42613 > URL: https://issues.apache.org/jira/browse/SPARK-42613 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 3.3.0 >Reporter: John Zhuge >Priority: Major > > Follow up from > [https://github.com/apache/spark/pull/40199#discussion_r1119453996] > If OMP_NUM_THREADS is not set explicitly, we should set it to > `spark.task.cpus` instead of `spark.executor.cores` as described in [PR > #38699|https://github.com/apache/spark/pull/38699]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42616) SparkSQLCLIDriver shall only close started hive sessionState
[ https://issues.apache.org/jira/browse/SPARK-42616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42616: Assignee: (was: Apache Spark) > SparkSQLCLIDriver shall only close started hive sessionState > > > Key: SPARK-42616 > URL: https://issues.apache.org/jira/browse/SPARK-42616 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2 >Reporter: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42613) PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor cores by default
[ https://issues.apache.org/jira/browse/SPARK-42613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42613: Assignee: (was: Apache Spark) > PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor > cores by default > - > > Key: SPARK-42613 > URL: https://issues.apache.org/jira/browse/SPARK-42613 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 3.3.0 >Reporter: John Zhuge >Priority: Major > > Follow up from > [https://github.com/apache/spark/pull/40199#discussion_r1119453996] > If OMP_NUM_THREADS is not set explicitly, we should set it to > `spark.task.cpus` instead of `spark.executor.cores` as described in [PR > #38699|https://github.com/apache/spark/pull/38699]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42616) SparkSQLCLIDriver shall only close started hive sessionState
[ https://issues.apache.org/jira/browse/SPARK-42616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694323#comment-17694323 ] Apache Spark commented on SPARK-42616: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/40211 > SparkSQLCLIDriver shall only close started hive sessionState > > > Key: SPARK-42616 > URL: https://issues.apache.org/jira/browse/SPARK-42616 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2 >Reporter: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42613) PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor cores by default
[ https://issues.apache.org/jira/browse/SPARK-42613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Zhuge updated SPARK-42613: --- Description: Follow up from [https://github.com/apache/spark/pull/40199#discussion_r1119453996] If OMP_NUM_THREADS is not set explicitly, we should set it to `spark.task.cpus` instead of `spark.executor.cores` as described in [PR #38699|https://github.com/apache/spark/pull/38699]. was: Follow up from [https://github.com/apache/spark/pull/40199#discussion_r1119453996] If OMP_NUM_THREADS is not set explicitly, we should set it to `spark.task.cpus` instead of `spark.executor.cores`. Otherwise, we will still have issues when executor cores is set to a very large number but task cpus is 1. > PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor > cores by default > - > > Key: SPARK-42613 > URL: https://issues.apache.org/jira/browse/SPARK-42613 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 3.3.0 >Reporter: John Zhuge >Priority: Major > > Follow up from > [https://github.com/apache/spark/pull/40199#discussion_r1119453996] > If OMP_NUM_THREADS is not set explicitly, we should set it to > `spark.task.cpus` instead of `spark.executor.cores` as described in [PR > #38699|https://github.com/apache/spark/pull/38699]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42583) Remove outer join if all aggregate functions are distinct
[ https://issues.apache.org/jira/browse/SPARK-42583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42583: -- Affects Version/s: 3.5.0 (was: 3.4.0) > Remove outer join if all aggregate functions are distinct > - > > Key: SPARK-42583 > URL: https://issues.apache.org/jira/browse/SPARK-42583 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.5.0 > > > To support more cases: > https://github.com/pingcap/tidb/blob/master/planner/core/rule_join_elimination.go#L159 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42583) Remove outer join if all aggregate functions are distinct
[ https://issues.apache.org/jira/browse/SPARK-42583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42583. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40177 [https://github.com/apache/spark/pull/40177] > Remove outer join if all aggregate functions are distinct > - > > Key: SPARK-42583 > URL: https://issues.apache.org/jira/browse/SPARK-42583 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.5.0 > > > To support more cases: > https://github.com/pingcap/tidb/blob/master/planner/core/rule_join_elimination.go#L159 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42583) Remove outer join if all aggregate functions are distinct
[ https://issues.apache.org/jira/browse/SPARK-42583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42583: - Assignee: Yuming Wang > Remove outer join if all aggregate functions are distinct > - > > Key: SPARK-42583 > URL: https://issues.apache.org/jira/browse/SPARK-42583 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > To support more cases: > https://github.com/pingcap/tidb/blob/master/planner/core/rule_join_elimination.go#L159 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42610) Add implicit encoders to SQLImplicits
[ https://issues.apache.org/jira/browse/SPARK-42610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42610. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 40205 [https://github.com/apache/spark/pull/40205] > Add implicit encoders to SQLImplicits > - > > Key: SPARK-42610 > URL: https://issues.apache.org/jira/browse/SPARK-42610 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
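[Editor's note] The ticket carries no description, but the feature mirrors the `SQLImplicits` behavior of the classic API: importing `spark.implicits._` brings `Encoder` instances into scope so typed Dataset operations resolve. A hedged sketch of the usage this enables (the session setup is an assumption; the Connect client builds its session differently):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._  // implicit encoders now in scope

case class Person(name: String, age: Long)
val people = Seq(Person("a", 1L), Person("b", 2L)).toDS() // Dataset[Person]
val ids = spark.range(3).as[Long]                         // Dataset[Long]
{code}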
[jira] [Created] (SPARK-42616) SparkSQLCLIDriver shall only close started hive sessionState
Kent Yao created SPARK-42616: Summary: SparkSQLCLIDriver shall only close started hive sessionState Key: SPARK-42616 URL: https://issues.apache.org/jira/browse/SPARK-42616 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.2 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42613) PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor cores by default
[ https://issues.apache.org/jira/browse/SPARK-42613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Zhuge updated SPARK-42613: --- Summary: PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor cores by default (was: PythonRunner should set OMP_NUM_THREADS to task cpus times executor cores by default) > PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor > cores by default > - > > Key: SPARK-42613 > URL: https://issues.apache.org/jira/browse/SPARK-42613 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 3.3.0 >Reporter: John Zhuge >Priority: Major > > Follow up from > [https://github.com/apache/spark/pull/40199#discussion_r1119453996] > > If OMP_NUM_THREADS is not set explicitly, we should set it to > `spark.task.cpus x spark.executor.cores`. Otherwise, we will still have > issues when executor cores is set to a very large number but task cpus is 1. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42602) Provide more details in TaskEndReason s for tasks killed by TaskScheduler.cancelTasks
[ https://issues.apache.org/jira/browse/SPARK-42602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42602: - Assignee: Bo Zhang > Provide more details in TaskEndReason s for tasks killed by > TaskScheduler.cancelTasks > - > > Key: SPARK-42602 > URL: https://issues.apache.org/jira/browse/SPARK-42602 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Bo Zhang >Assignee: Bo Zhang >Priority: Major > > Currently tasks killed by `TaskScheduler.cancelTasks` will have a > `TaskEndReason` "TaskKilled (Stage cancelled)". We should do better at > differentiating reasons for stage cancellations (e.g. user-initiated or > caused by task failures in the stage). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
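[Editor's note] `TaskKilled` already carries a free-form `reason` string, so the improvement amounts to threading the cause of the cancellation through to that field. A hedged illustration (the messages are assumptions, not the wording the PR adopted):

{code:scala}
import org.apache.spark.TaskKilled

// Differentiated reasons instead of the fixed "Stage cancelled" text.
val byUser = TaskKilled(reason = "Stage cancelled because it was cancelled by the user")
val byTaskFailure = TaskKilled(reason = "Stage cancelled because a task in the stage failed")
println(byUser.toErrorString)        // TaskKilled (Stage cancelled because ...)
println(byTaskFailure.toErrorString)
{code}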
[jira] [Updated] (SPARK-42613) PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor cores by default
[ https://issues.apache.org/jira/browse/SPARK-42613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Zhuge updated SPARK-42613: --- Description: Follow up from [https://github.com/apache/spark/pull/40199#discussion_r1119453996] If OMP_NUM_THREADS is not set explicitly, we should set it to `spark.task.cpus` instead of `spark.executor.cores`. Otherwise, we will still have issues when executor cores is set to a very large number but task cpus is 1. was: Follow up from [https://github.com/apache/spark/pull/40199#discussion_r1119453996] If OMP_NUM_THREADS is not set explicitly, we should set it to `spark.task.cpus x spark.executor.cores`. Otherwise, we will still have issues when executor cores is set to a very large number but task cpus is 1. > PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor > cores by default > - > > Key: SPARK-42613 > URL: https://issues.apache.org/jira/browse/SPARK-42613 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 3.3.0 >Reporter: John Zhuge >Priority: Major > > Follow up from > [https://github.com/apache/spark/pull/40199#discussion_r1119453996] > > If OMP_NUM_THREADS is not set explicitly, we should set it to > `spark.task.cpus` instead of `spark.executor.cores`. Otherwise, we will still > have issues when executor cores is set to a very large number but task cpus is > 1. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42602) Provide more details in TaskEndReason s for tasks killed by TaskScheduler.cancelTasks
[ https://issues.apache.org/jira/browse/SPARK-42602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42602. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40194 [https://github.com/apache/spark/pull/40194] > Provide more details in TaskEndReason s for tasks killed by > TaskScheduler.cancelTasks > - > > Key: SPARK-42602 > URL: https://issues.apache.org/jira/browse/SPARK-42602 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Bo Zhang >Assignee: Bo Zhang >Priority: Major > Fix For: 3.5.0 > > > Currently tasks killed by `TaskScheduler.cancelTasks` will have a > `TaskEndReason` "TaskKilled (Stage cancelled)". We should do better at > differentiating reasons for stage cancellations (e.g. user-initiated or > caused by task failures in the stage). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42613) PythonRunner should set OMP_NUM_THREADS to task cpus times executor cores by default
[ https://issues.apache.org/jira/browse/SPARK-42613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Zhuge updated SPARK-42613: --- Description: Follow up from [https://github.com/apache/spark/pull/40199#discussion_r1119453996] If OMP_NUM_THREADS is not set explicitly, we should set it to `spark.task.cpus x spark.executor.cores`. Otherwise, we will still have issues when executor cores is set to a very large number but task cpus is 1. was: Follow up from [https://github.com/apache/spark/pull/40199#discussion_r1119453996] If OMP_NUM_THREADS is not set explicitly, we should set it to `spark.task.cpus` instead of `spark.executor.cores`. Otherwise, we will still have issues when executor cores is set to a very large number but task cpus is 1. > PythonRunner should set OMP_NUM_THREADS to task cpus times executor cores by > default > > > Key: SPARK-42613 > URL: https://issues.apache.org/jira/browse/SPARK-42613 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 3.3.0 >Reporter: John Zhuge >Priority: Major > > Follow up from > [https://github.com/apache/spark/pull/40199#discussion_r1119453996] > > If OMP_NUM_THREADS is not set explicitly, we should set it to > `spark.task.cpus x spark.executor.cores`. Otherwise, we will still have > issues when executor cores is set to a very large number but task cpus is 1. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42613) PythonRunner should set OMP_NUM_THREADS to task cpus times executor cores by default
[ https://issues.apache.org/jira/browse/SPARK-42613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Zhuge updated SPARK-42613: --- Summary: PythonRunner should set OMP_NUM_THREADS to task cpus times executor cores by default (was: PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor cores by default) > PythonRunner should set OMP_NUM_THREADS to task cpus times executor cores by > default > > > Key: SPARK-42613 > URL: https://issues.apache.org/jira/browse/SPARK-42613 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 3.3.0 >Reporter: John Zhuge >Priority: Major > > Follow up from > [https://github.com/apache/spark/pull/40199#discussion_r1119453996] > > If OMP_NUM_THREADS is not set explicitly, we should set it to > `spark.task.cpus` instead of `spark.executor.cores`. Otherwise, we will still > have issues when executor cores is set to a very large number but task cpus is > 1. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42615) Refactor the AnalyzePlan RPC and add `session.version`
[ https://issues.apache.org/jira/browse/SPARK-42615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694313#comment-17694313 ] Apache Spark commented on SPARK-42615: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/40210 > Refactor the AnalyzePlan RPC and add `session.version` > -- > > Key: SPARK-42615 > URL: https://issues.apache.org/jira/browse/SPARK-42615 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42615) Refactor the AnalyzePlan RPC and add `session.version`
[ https://issues.apache.org/jira/browse/SPARK-42615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42615: Assignee: (was: Apache Spark) > Refactor the AnalyzePlan RPC and add `session.version` > -- > > Key: SPARK-42615 > URL: https://issues.apache.org/jira/browse/SPARK-42615 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42615) Refactor the AnalyzePlan RPC and add `session.version`
[ https://issues.apache.org/jira/browse/SPARK-42615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42615: Assignee: Apache Spark > Refactor the AnalyzePlan RPC and add `session.version` > -- > > Key: SPARK-42615 > URL: https://issues.apache.org/jira/browse/SPARK-42615 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42615) Refactor the AnalyzePlan RPC and add `session.version`
Ruifeng Zheng created SPARK-42615: - Summary: Refactor the AnalyzePlan RPC and add `session.version` Key: SPARK-42615 URL: https://issues.apache.org/jira/browse/SPARK-42615 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42406) [PROTOBUF] Recursive field handling is incompatible with delta
[ https://issues.apache.org/jira/browse/SPARK-42406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-42406. Resolution: Fixed Issue resolved by pull request 40141 [https://github.com/apache/spark/pull/40141] > [PROTOBUF] Recursive field handling is incompatible with delta > -- > > Key: SPARK-42406 > URL: https://issues.apache.org/jira/browse/SPARK-42406 > Project: Spark > Issue Type: Bug > Components: Protobuf >Affects Versions: 3.4.0 >Reporter: Raghu Angadi >Assignee: Raghu Angadi >Priority: Major > Fix For: 3.4.0 > > > The Protobuf deserializer (the `from_protobuf()` function) optionally supports > recursive fields by limiting the depth to a certain level. See the example below. > It assigns a 'NullType' to such a field when the allowed depth is reached. > This causes a few issues. E.g. a repeated field as in the following example > results in an Array field with 'NullType'. Delta does not support null type in > a complex type. > Actually `Array[NullType]` is not really useful anyway. > How about this fix: drop the recursive field when the limit is reached rather > than using a NullType. > The example below makes it clear: > Consider a recursive Protobuf: > > {code} > message TreeNode { > string value = 1; > repeated TreeNode children = 2; > } > {code} > Allow a depth of 2: > > {code:python} > df.select( > from_protobuf( > 'proto', > messageName = 'TreeNode', > options = { ... "recursive.fields.max.depth" : "2" } > ) > ).printSchema() > {code} > The schema looks like this: > {noformat} > root > |-- from_protobuf(proto): struct (nullable = true) > | |-- value: string (nullable = true) > | |-- children: array (nullable = false) > | | |-- element: struct (containsNull = false) > | | | |-- value: string (nullable = true) > | | | |-- children: array (nullable = false) > | | | | |-- element: struct (containsNull = false) > | | | | | |-- value: string (nullable = true) > | | | | | |-- children: array (nullable = false). [ === Proposed fix: Drop > this field === ] > | | | | | | |-- element: void (containsNull = false) [ === NOTICE 'void' HERE > === ] > {noformat} > When we try to write this to a delta table, we get an error: > {noformat} > AnalysisException: Found nested NullType in column > from_protobuf(proto).children which is of ArrayType. Delta doesn't support > writing NullType in complex types. > {noformat} > > We could just drop the field 'element' when the recursion depth is reached. It is > simpler and does not need to deal with NullType. We are ignoring the value > anyway. There is no use in keeping the field. > Another issue is the setting 'recursive.fields.max.depth': it is not enforced > correctly, and '0' does not make sense. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42427) Conv should return an error if the internal conversion overflows
[ https://issues.apache.org/jira/browse/SPARK-42427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694299#comment-17694299 ] Apache Spark commented on SPARK-42427: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/40209 > Conv should return an error if the internal conversion overflows > > > Key: SPARK-42427 > URL: https://issues.apache.org/jira/browse/SPARK-42427 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"
[ https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42539: Assignee: Apache Spark > User-provided JARs can override Spark's Hive metadata client JARs when using > "builtin" > -- > > Key: SPARK-42539 > URL: https://issues.apache.org/jira/browse/SPARK-42539 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.3, 3.3.2 >Reporter: Erik Krogen >Assignee: Apache Spark >Priority: Major > > Recently we observed that on version 3.2.0 and Java 8, it is possible for > user-provided Hive JARs to break the ability for Spark, via the Hive metadata > client / {{IsolatedClientLoader}}, to communicate with Hive Metastore, when > using the default behavior of the "builtin" Hive version. After SPARK-35321, > when Spark is compiled against Hive >= 2.3.9 and the "builtin" Hive client > version is used, we will call the method {{Hive.getWithoutRegisterFns()}} > (from HIVE-21563) instead of {{Hive.get()}}. If the user has included, for > example, {{hive-exec-2.3.8.jar}} on their classpath, the client will break > with a {{NoSuchMethodError}}. This particular failure mode was resolved in > 3.2.1 by SPARK-37446, but while investigating, we found a general issue that > it's possible for user JARs to override Spark's own JARs -- but only inside > of the IsolatedClientLoader when using "builtin". This happens because even > when Spark is configured to use the "builtin" Hive classes, it still creates > a separate URLClassLoader for the HiveClientImpl used for HMS communication. > To get the set of JAR URLs to use for this classloader, Spark [collects all > of the JARs used by the user classloader (and its parent, and that > classloader's parent, and so > on)|https://github.com/apache/spark/blob/87e3d5625e76bb734b8dd753bfb25002822c8585/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L412-L438]. > Thus the newly created classloader will have all of the same JARs as the > user classloader, but the ordering has been reversed! User JARs get > prioritized ahead of system JARs, because the classloader hierarchy is > traversed from bottom-to-top. For example let's say we have user JARs > "foo.jar" and "hive-exec-2.3.8.jar". The user classloader will look like this: > {code} > MutableURLClassLoader > -- foo.jar > -- hive-exec-2.3.8.jar > -- parent: URLClassLoader > - spark-core_2.12-3.2.0.jar > - ... > - hive-exec-2.3.9.jar > - ... > {code} > This setup provides the expected behavior within the user classloader; it > will first check the parent, so hive-exec-2.3.9.jar takes precedence, and the > MutableURLClassLoader is only checked if the class doesn't exist in the > parent. But when a JAR list is constructed for the IsolatedClientLoader, it > traverses the URLs from MutableURLClassLoader first, then its parent, so the > final list looks like (in order): > {code} > URLClassLoader [IsolatedClientLoader] > -- foo.jar > -- hive-exec-2.3.8.jar > -- spark-core_2.12-3.2.0.jar > -- ... > -- hive-exec-2.3.9.jar > -- ... > -- parent: boot classloader (JVM classes) > {code} > Now when a lookup happens, all of the JARs are within the same > URLClassLoader, and the user JARs are in front of the Spark ones, so the user > JARs get prioritized. This is the opposite of the expected behavior when > using the default user/application classloader in Spark, which has > parent-first behavior, prioritizing the Spark/system classes over the user > classes.
(Note that this behavior is correct when using the > {{ChildFirstURLClassLoader}}.) > After SPARK-37446, the NoSuchMethodError is no longer an issue, but this > still breaks assumptions about how user JARs should be treated vs. system > JARs, and presents the ability for the client to break in other ways. For > example in SPARK-37446 it describes a scenario whereby Hive 2.3.8 JARs have > been included; the changes in Hive 2.3.9 were needed to improve compatibility > with older HMS, so if a user were to accidentally include these older JARs, > it could break the ability of Spark to communicate with HMS 1.x > I see two solutions to this: > *(A) Remove the separate classloader entirely when using "builtin"* > Starting from 3.0.0, due to SPARK-26839, when using Java 9+, we don't even > create a new classloader when using "builtin". This makes sense, as [called > out in this > comment|https://github.com/apache/spark/pull/24057#discussion_r265142878], > since the point of "builtin" is to use the existing JARs on the classpath > anyway. This proposes simply extending the changes from SPARK-26839 to all > Java versions, instead of restricting to Java 9+ only. > *(B) Reverse the
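[Editor's note] The bottom-up flattening described above is easy to reproduce in isolation. This is an illustrative sketch with placeholder JAR URLs, not the actual HiveUtils code:

{code:scala}
import java.net.{URL, URLClassLoader}

// Collect URLs child-first, the way the issue describes: the flat list ends up
// with user JARs ahead of Spark's JARs, reversing the parent-first precedence
// the classloader hierarchy normally provides.
def flattenUrls(cl: ClassLoader): Seq[URL] = cl match {
  case null => Seq.empty
  case u: URLClassLoader => u.getURLs.toSeq ++ flattenUrls(u.getParent)
  case other => flattenUrls(other.getParent)
}

val sparkLoader = new URLClassLoader(Array(new URL("file:/spark/jars/hive-exec-2.3.9.jar")), null)
val userLoader = new URLClassLoader(Array(new URL("file:/user/jars/hive-exec-2.3.8.jar")), sparkLoader)

// Lookup through userLoader is parent-first, so hive-exec-2.3.9.jar wins; a
// single URLClassLoader built from this flattened list would consult
// hive-exec-2.3.8.jar first.
flattenUrls(userLoader).foreach(println)
{code}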
[jira] [Assigned] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"
[ https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42539: Assignee: (was: Apache Spark) > User-provided JARs can override Spark's Hive metadata client JARs when using > "builtin" > -- > > Key: SPARK-42539 > URL: https://issues.apache.org/jira/browse/SPARK-42539 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.3, 3.3.2 >Reporter: Erik Krogen >Priority: Major > > Recently we observed that on version 3.2.0 and Java 8, it is possible for > user-provided Hive JARs to break the ability for Spark, via the Hive metadata > client / {{IsolatedClientLoader}}, to communicate with Hive Metastore, when > using the default behavior of the "builtin" Hive version. After SPARK-35321, > when Spark is compiled against Hive >= 2.3.9 and the "builtin" Hive client > version is used, we will call the method {{Hive.getWithoutRegisterFns()}} > (from HIVE-21563) instead of {{Hive.get()}}. If the user has included, for > example, {{hive-exec-2.3.8.jar}} on their classpath, the client will break > with a {{NoSuchMethodError}}. This particular failure mode was resolved in > 3.2.1 by SPARK-37446, but while investigating, we found a general issue that > it's possible for user JARs to override Spark's own JARs -- but only inside > of the IsolatedClientLoader when using "builtin". This happens because even > when Spark is configured to use the "builtin" Hive classes, it still creates > a separate URLClassLoader for the HiveClientImpl used for HMS communication. > To get the set of JAR URLs to use for this classloader, Spark [collects all > of the JARs used by the user classloader (and its parent, and that > classloader's parent, and so > on)|https://github.com/apache/spark/blob/87e3d5625e76bb734b8dd753bfb25002822c8585/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L412-L438]. > Thus the newly created classloader will have all of the same JARs as the > user classloader, but the ordering has been reversed! User JARs get > prioritized ahead of system JARs, because the classloader hierarchy is > traversed from bottom-to-top. For example let's say we have user JARs > "foo.jar" and "hive-exec-2.3.8.jar". The user classloader will look like this: > {code} > MutableURLClassLoader > -- foo.jar > -- hive-exec-2.3.8.jar > -- parent: URLClassLoader > - spark-core_2.12-3.2.0.jar > - ... > - hive-exec-2.3.9.jar > - ... > {code} > This setup provides the expected behavior within the user classloader; it > will first check the parent, so hive-exec-2.3.9.jar takes precedence, and the > MutableURLClassLoader is only checked if the class doesn't exist in the > parent. But when a JAR list is constructed for the IsolatedClientLoader, it > traverses the URLs from MutableURLClassLoader first, then its parent, so the > final list looks like (in order): > {code} > URLClassLoader [IsolatedClientLoader] > -- foo.jar > -- hive-exec-2.3.8.jar > -- spark-core_2.12-3.2.0.jar > -- ... > -- hive-exec-2.3.9.jar > -- ... > -- parent: boot classloader (JVM classes) > {code} > Now when a lookup happens, all of the JARs are within the same > URLClassLoader, and the user JARs are in front of the Spark ones, so the user > JARs get prioritized. This is the opposite of the expected behavior when > using the default user/application classloader in Spark, which has > parent-first behavior, prioritizing the Spark/system classes over the user > classes.
(Note that this behavior is correct when using the > {{ChildFirstURLClassLoader}}.) > After SPARK-37446, the NoSuchMethodError is no longer an issue, but this > still breaks assumptions about how user JARs should be treated vs. system > JARs, and presents the ability for the client to break in other ways. For > example in SPARK-37446 it describes a scenario whereby Hive 2.3.8 JARs have > been included; the changes in Hive 2.3.9 were needed to improve compatibility > with older HMS, so if a user were to accidentally include these older JARs, > it could break the ability of Spark to communicate with HMS 1.x > I see two solutions to this: > *(A) Remove the separate classloader entirely when using "builtin"* > Starting from 3.0.0, due to SPARK-26839, when using Java 9+, we don't even > create a new classloader when using "builtin". This makes sense, as [called > out in this > comment|https://github.com/apache/spark/pull/24057#discussion_r265142878], > since the point of "builtin" is to use the existing JARs on the classpath > anyway. This proposes simply extending the changes from SPARK-26839 to all > Java versions, instead of restricting to Java 9+ only. > *(B) Reverse the ordering of parent/child
[jira] [Reopened] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"
[ https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-42539: -- Assignee: (was: Erik Krogen) Reverted in https://github.com/apache/spark/commit/5627ceeddb45f2796fb8ad08b9f1c8a163823b2b and https://github.com/apache/spark/commit/26009d47c1f80897d65445fe48d8d5f2edcf848c > User-provided JARs can override Spark's Hive metadata client JARs when using > "builtin" > -- > > Key: SPARK-42539 > URL: https://issues.apache.org/jira/browse/SPARK-42539 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.3, 3.3.2 >Reporter: Erik Krogen >Priority: Major > > Recently we observed that on version 3.2.0 and Java 8, it is possible for > user-provided Hive JARs to break the ability for Spark, via the Hive metadata > client / {{IsolatedClientLoader}}, to communicate with Hive Metastore, when > using the default behavior of the "builtin" Hive version. After SPARK-35321, > when Spark is compiled against Hive >= 2.3.9 and the "builtin" Hive client > version is used, we will call the method {{Hive.getWithoutRegisterFns()}} > (from HIVE-21563) instead of {{Hive.get()}}. If the user has included, for > example, {{hive-exec-2.3.8.jar}} on their classpath, the client will break > with a {{NoSuchMethodError}}. This particular failure mode was resolved in > 3.2.1 by SPARK-37446, but while investigating, we found a general issue that > it's possible for user JARs to override Spark's own JARs -- but only inside > of the IsolatedClientLoader when using "builtin". This happens because even > when Spark is configured to use the "builtin" Hive classes, it still creates > a separate URLClassLoader for the HiveClientImpl used for HMS communication. > To get the set of JAR URLs to use for this classloader, Spark [collects all > of the JARs used by the user classloader (and its parent, and that > classloader's parent, and so > on)|https://github.com/apache/spark/blob/87e3d5625e76bb734b8dd753bfb25002822c8585/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L412-L438]. > Thus the newly created classloader will have all of the same JARs as the > user classloader, but the ordering has been reversed! User JARs get > prioritized ahead of system JARs, because the classloader hierarchy is > traversed from bottom-to-top. For example let's say we have user JARs > "foo.jar" and "hive-exec-2.3.8.jar". The user classloader will look like this: > {code} > MutableURLClassLoader > -- foo.jar > -- hive-exec-2.3.8.jar > -- parent: URLClassLoader > - spark-core_2.12-3.2.0.jar > - ... > - hive-exec-2.3.9.jar > - ... > {code} > This setup provides the expected behavior within the user classloader; it > will first check the parent, so hive-exec-2.3.9.jar takes precedence, and the > MutableURLClassLoader is only checked if the class doesn't exist in the > parent. But when a JAR list is constructed for the IsolatedClientLoader, it > traverses the URLs from MutableURLClassLoader first, then its parent, so the > final list looks like (in order): > {code} > URLClassLoader [IsolatedClientLoader] > -- foo.jar > -- hive-exec-2.3.8.jar > -- spark-core_2.12-3.2.0.jar > -- ... > -- hive-exec-2.3.9.jar > -- ... > -- parent: boot classloader (JVM classes) > {code} > Now when a lookup happens, all of the JARs are within the same > URLClassLoader, and the user JARs are in front of the Spark ones, so the user > JARs get prioritized.
This is the opposite of the expected behavior when > using the default user/application classloader in Spark, which has > parent-first behavior, prioritizing the Spark/system classes over the user > classes. (Note that this behavior is correct when using the > {{ChildFirstURLClassLoader}}.) > After SPARK-37446, the NoSuchMethodError is no longer an issue, but this > still breaks assumptions about how user JARs should be treated vs. system > JARs, and presents the ability for the client to break in other ways. For > example in SPARK-37446 it describes a scenario whereby Hive 2.3.8 JARs have > been included; the changes in Hive 2.3.9 were needed to improve compatibility > with older HMS, so if a user were to accidentally include these older JARs, > it could break the ability of Spark to communicate with HMS 1.x > I see two solutions to this: > *(A) Remove the separate classloader entirely when using "builtin"* > Starting from 3.0.0, due to SPARK-26839, when using Java 9+, we don't even > create a new classloader when using "builtin". This makes sense, as [called > out in this > comment|https://github.com/apache/spark/pull/24057#discussion_r265142878], > since the point of "builtin" is to use the existing JARs on the classpath >
[jira] [Updated] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"
[ https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-42539: - Fix Version/s: (was: 3.4.0) (was: 3.5.0) > User-provided JARs can override Spark's Hive metadata client JARs when using > "builtin" > -- > > Key: SPARK-42539 > URL: https://issues.apache.org/jira/browse/SPARK-42539 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.3, 3.3.2 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > > Recently we observed that on version 3.2.0 and Java 8, it is possible for > user-provided Hive JARs to break the ability for Spark, via the Hive metadata > client / {{IsolatedClientLoader}}, to communicate with Hive Metastore, when > using the default behavior of the "builtin" Hive version. After SPARK-35321, > when Spark is compiled against Hive >= 2.3.9 and the "builtin" Hive client > version is used, we will call the method {{Hive.getWithoutRegisterFns()}} > (from HIVE-21563) instead of {{Hive.get()}}. If the user has included, for > example, {{hive-exec-2.3.8.jar}} on their classpath, the client will break > with a {{NoSuchMethodError}}. This particular failure mode was resolved in > 3.2.1 by SPARK-37446, but while investigating, we found a general issue that > it's possible for user JARs to override Spark's own JARs -- but only inside > of the IsolatedClientLoader when using "builtin". This happens because even > when Spark is configured to use the "builtin" Hive classes, it still creates > a separate URLClassLoader for the HiveClientImpl used for HMS communication. > To get the set of JAR URLs to use for this classloader, Spark [collects all > of the JARs used by the user classloader (and its parent, and that > classloader's parent, and so > on)|https://github.com/apache/spark/blob/87e3d5625e76bb734b8dd753bfb25002822c8585/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L412-L438]. > Thus the newly created classloader will have all of the same JARs as the > user classloader, but the ordering has been reversed! User JARs get > prioritized ahead of system JARs, because the classloader hierarchy is > traversed from bottom-to-top. For example let's say we have user JARs > "foo.jar" and "hive-exec-2.3.8.jar". The user classloader will look like this: > {code} > MutableURLClassLoader > -- foo.jar > -- hive-exec-2.3.8.jar > -- parent: URLClassLoader > - spark-core_2.12-3.2.0.jar > - ... > - hive-exec-2.3.9.jar > - ... > {code} > This setup provides the expected behavior within the user classloader; it > will first check the parent, so hive-exec-2.3.9.jar takes precedence, and the > MutableURLClassLoader is only checked if the class doesn't exist in the > parent. But when a JAR list is constructed for the IsolatedClientLoader, it > traverses the URLs from MutableURLClassLoader first, then its parent, so the > final list looks like (in order): > {code} > URLClassLoader [IsolatedClientLoader] > -- foo.jar > -- hive-exec-2.3.8.jar > -- spark-core_2.12-3.2.0.jar > -- ... > -- hive-exec-2.3.9.jar > -- ... > -- parent: boot classloader (JVM classes) > {code} > Now when a lookup happens, all of the JARs are within the same > URLClassLoader, and the user JARs are in front of the Spark ones, so the user > JARs get prioritized. This is the opposite of the expected behavior when > using the default user/application classloader in Spark, which has > parent-first behavior, prioritizing the Spark/system classes over the user > classes.
(Note that this behavior is correct when using the > {{ChildFirstURLClassLoader}}.) > After SPARK-37446, the NoSuchMethodError is no longer an issue, but this > still breaks assumptions about how user JARs should be treated vs. system > JARs, and presents the ability for the client to break in other ways. For > example in SPARK-37446 it describes a scenario whereby Hive 2.3.8 JARs have > been included; the changes in Hive 2.3.9 were needed to improve compatibility > with older HMS, so if a user were to accidentally include these older JARs, > it could break the ability of Spark to communicate with HMS 1.x > I see two solutions to this: > *(A) Remove the separate classloader entirely when using "builtin"* > Starting from 3.0.0, due to SPARK-26839, when using Java 9+, we don't even > create a new classloader when using "builtin". This makes sense, as [called > out in this > comment|https://github.com/apache/spark/pull/24057#discussion_r265142878], > since the point of "builtin" is to use the existing JARs on the classpath > anyway. This proposes simply extending the changes from SPARK-26839 to all > Java versions, instead of restricting
[jira] [Resolved] (SPARK-42612) Enable more parity tests related to functions
[ https://issues.apache.org/jira/browse/SPARK-42612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42612. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 40203 [https://github.com/apache/spark/pull/40203] > Enable more parity tests related to functions > - > > Key: SPARK-42612 > URL: https://issues.apache.org/jira/browse/SPARK-42612 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42612) Enable more parity tests related to functions
[ https://issues.apache.org/jira/browse/SPARK-42612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42612: Assignee: Takuya Ueshin > Enable more parity tests related to functions > - > > Key: SPARK-42612 > URL: https://issues.apache.org/jira/browse/SPARK-42612 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40776) Add documentation (similar to Avro functions).
[ https://issues.apache.org/jira/browse/SPARK-40776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40776: Assignee: Sandish Kumar HN (was: Raghu Angadi) > Add documentation (similar to Avro functions). > -- > > Key: SPARK-40776 > URL: https://issues.apache.org/jira/browse/SPARK-40776 > Project: Spark > Issue Type: Improvement > Components: Protobuf, Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Assignee: Sandish Kumar HN >Priority: Major > Fix For: 3.4.0 > > > Build documentation for protobufs similar to > [https://spark.apache.org/docs/latest/sql-data-sources-avro.html] > This is a follow up from Protobuf PR > https://github.com/apache/spark/pull/37972 > cc: [~sanysand...@gmail.com] . -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40776) Add documentation (similar to Avro functions).
[ https://issues.apache.org/jira/browse/SPARK-40776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40776. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39039 [https://github.com/apache/spark/pull/39039] > Add documentation (similar to Avro functions). > -- > > Key: SPARK-40776 > URL: https://issues.apache.org/jira/browse/SPARK-40776 > Project: Spark > Issue Type: Improvement > Components: Protobuf, Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Assignee: Raghu Angadi >Priority: Major > Fix For: 3.4.0 > > > Build documentation for protobufs similar to > [https://spark.apache.org/docs/latest/sql-data-sources-avro.html] > This is a follow up from Protobuf PR > https://github.com/apache/spark/pull/37972 > cc: [~sanysand...@gmail.com] . -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40776) Add documentation (similar to Avro functions).
[ https://issues.apache.org/jira/browse/SPARK-40776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40776: Assignee: Raghu Angadi > Add documentation (similar to Avro functions). > -- > > Key: SPARK-40776 > URL: https://issues.apache.org/jira/browse/SPARK-40776 > Project: Spark > Issue Type: Improvement > Components: Protobuf, Structured Streaming >Affects Versions: 3.3.0 >Reporter: Raghu Angadi >Assignee: Raghu Angadi >Priority: Major > > Build documentation for protobufs similar to > [https://spark.apache.org/docs/latest/sql-data-sources-avro.html] > This is a follow up from Protobuf PR > https://github.com/apache/spark/pull/37972 > cc: [~sanysand...@gmail.com] . -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42515) ClientE2ETestSuite local test failed
[ https://issues.apache.org/jira/browse/SPARK-42515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42515: Assignee: Yang Jie > ClientE2ETestSuite local test failed > > > Key: SPARK-42515 > URL: https://issues.apache.org/jira/browse/SPARK-42515 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > > > local run `build/sbt clean "connect-client-jvm/test"`, > `ClientE2ETestSuite#write table` failed, GA not failed. > > {code:java} > [info] - rite table *** FAILED *** (41 milliseconds) > [info] io.grpc.StatusRuntimeException: UNKNOWN: > org/apache/parquet/hadoop/api/ReadSupport > [info] at io.grpc.Status.asRuntimeException(Status.java:535) > [info] at > io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660) > [info] at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943) > [info] at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > [info] at org.apache.spark.sql.SparkSession.execute(SparkSession.scala:169) > [info] at > org.apache.spark.sql.DataFrameWriter.executeWriteOperation(DataFrameWriter.scala:255) > [info] at > org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:338) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$12(ClientE2ETestSuite.scala:145) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at org.scalatest.TestSuite.withFixture(TestSuite.scala:196) > [info] at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195) > [info] at > org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > [info] at org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > [info] at > org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) > [info] at org.scalatest.Suite.run(Suite.scala:1114) > [info] at 
org.scalatest.Suite.run$(Suite.scala:1096) > [info] at > org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) > [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:535) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.org$scalatest$BeforeAndAfterAll$$super$run(ClientE2ETestSuite.scala:33) > [info] at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) > [info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > [info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.run(ClientE2ETestSuite.scala:33) > [info] at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321) > [info] at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517) > [info] at
[jira] [Resolved] (SPARK-42515) ClientE2ETestSuite local test failed
[ https://issues.apache.org/jira/browse/SPARK-42515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42515. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 40136 [https://github.com/apache/spark/pull/40136] > ClientE2ETestSuite local test failed > > > Key: SPARK-42515 > URL: https://issues.apache.org/jira/browse/SPARK-42515 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.4.0 > > > > local run `build/sbt clean "connect-client-jvm/test"`, > `ClientE2ETestSuite#write table` failed, GA not failed. > > {code:java} > [info] - rite table *** FAILED *** (41 milliseconds) > [info] io.grpc.StatusRuntimeException: UNKNOWN: > org/apache/parquet/hadoop/api/ReadSupport > [info] at io.grpc.Status.asRuntimeException(Status.java:535) > [info] at > io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660) > [info] at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943) > [info] at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > [info] at org.apache.spark.sql.SparkSession.execute(SparkSession.scala:169) > [info] at > org.apache.spark.sql.DataFrameWriter.executeWriteOperation(DataFrameWriter.scala:255) > [info] at > org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:338) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$12(ClientE2ETestSuite.scala:145) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at org.scalatest.TestSuite.withFixture(TestSuite.scala:196) > [info] at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195) > [info] at > org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > [info] at org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > [info] at > 
org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) > [info] at org.scalatest.Suite.run(Suite.scala:1114) > [info] at org.scalatest.Suite.run$(Suite.scala:1096) > [info] at > org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) > [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:535) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.org$scalatest$BeforeAndAfterAll$$super$run(ClientE2ETestSuite.scala:33) > [info] at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) > [info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > [info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.run(ClientE2ETestSuite.scala:33) > [info] at >
[jira] [Resolved] (SPARK-42367) DataFrame.drop should handle duplicated columns properly
[ https://issues.apache.org/jira/browse/SPARK-42367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-42367. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 40013 [https://github.com/apache/spark/pull/40013] > DataFrame.drop should handle duplicated columns properly > > > Key: SPARK-42367 > URL: https://issues.apache.org/jira/browse/SPARK-42367 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > > {code:java} > >>> df.join(df2, df.name == df2.name, 'inner').show() > +---++--++ > |age|name|height|name| > +---++--++ > | 16| Bob|85| Bob| > | 14| Tom|80| Tom| > +---++--++ > >>> df.join(df2, df.name == df2.name, 'inner').drop('name').show() > +---+--+ > |age|height| > +---+--+ > | 16|85| > | 14|80| > +---+--+ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
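A minimal Scala analogue of the PySpark repro in SPARK-42367 above may be useful; the data is made up and an active SparkSession named `spark` is assumed. Dropping by column name is expected to remove every column with that name, matching the ticket's expected output.

{code:scala}
import spark.implicits._  // assumes an active SparkSession named `spark`

val df = Seq((14, "Tom"), (16, "Bob")).toDF("age", "name")
val df2 = Seq(("Tom", 80), ("Bob", 85)).toDF("name", "height")

// The join result carries two "name" columns; drop("name") by string name
// should remove both, leaving only (age, height) as in the ticket's output.
df.join(df2, df("name") === df2("name"), "inner").drop("name").show()
{code}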
[jira] [Assigned] (SPARK-42367) DataFrame.drop should handle duplicated columns properly
[ https://issues.apache.org/jira/browse/SPARK-42367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-42367: - Assignee: Ruifeng Zheng > DataFrame.drop should handle duplicated columns properly > > > Key: SPARK-42367 > URL: https://issues.apache.org/jira/browse/SPARK-42367 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > > {code:java} > >>> df.join(df2, df.name == df2.name, 'inner').show() > +---++--++ > |age|name|height|name| > +---++--++ > | 16| Bob|85| Bob| > | 14| Tom|80| Tom| > +---++--++ > >>> df.join(df2, df.name == df2.name, 'inner').drop('name').show() > +---+--+ > |age|height| > +---+--+ > | 16|85| > | 14|80| > +---+--+ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42572) Logic error for StateStore.validateStateRowFormat
[ https://issues.apache.org/jira/browse/SPARK-42572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-42572. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40187 [https://github.com/apache/spark/pull/40187] > Logic error for StateStore.validateStateRowFormat > - > > Key: SPARK-42572 > URL: https://issues.apache.org/jira/browse/SPARK-42572 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Fix For: 3.5.0 > > > SPARK-42484 changed the logic of whether to check the state store format in > StateStore.validateStateRowFormat. Revert it and add a unit test to make sure > this won't happen again. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
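As a rough illustration of the kind of guard SPARK-42572 concerns, here is a hypothetical sketch of a config-gated format check. The function body and config plumbing are assumptions, not Spark's actual StateStore code; the config key `spark.sql.streaming.stateStore.formatValidation.enabled` does exist in Spark, but its wiring is simplified here.

{code:scala}
// Hypothetical sketch: validation must run whenever the flag is enabled.
// The bug class being reverted is an inverted or over-broad condition
// around exactly this kind of check.
def validateStateRowFormat(row: Array[Byte], conf: Map[String, String]): Unit = {
  val enabled = conf
    .getOrElse("spark.sql.streaming.stateStore.formatValidation.enabled", "true")
    .toBoolean
  if (enabled) {
    require(row.nonEmpty, "state row failed format validation")
  }
}
{code}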
[jira] [Assigned] (SPARK-42572) Logic error for StateStore.validateStateRowFormat
[ https://issues.apache.org/jira/browse/SPARK-42572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-42572: Assignee: Wei Liu > Logic error for StateStore.validateStateRowFormat > - > > Key: SPARK-42572 > URL: https://issues.apache.org/jira/browse/SPARK-42572 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > > SPARK-42484 changed the logic of whether to check the state store format in > StateStore.validateStateRowFormat. Revert it and add a unit test to make sure > this won't happen again. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42592) Document SS guide doc for supporting multiple stateful operators (especially chained aggregations)
[ https://issues.apache.org/jira/browse/SPARK-42592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694281#comment-17694281 ] Apache Spark commented on SPARK-42592: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/40208 > Document SS guide doc for supporting multiple stateful operators (especially > chained aggregations) > -- > > Key: SPARK-42592 > URL: https://issues.apache.org/jira/browse/SPARK-42592 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.4.1, 3.5.0 > > > We made a change on the guide doc for SPARK-40925 via SPARK-42105, but from > SPARK-42105 we only removed the section of "limitation of global watermark". > That said, we haven't provided any example of new functionality, especially > that users need to know about the change of SQL function (window) in chained > time window aggregations. > In this ticket, we will add the example of chained time window aggregations, > with introducing new functionality of SQL function. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
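For readers following the SPARK-42592 guide change, a hedged sketch of the chained time-window pattern being documented: `events` is an assumed streaming DataFrame with an `eventTime` column, and the second aggregation relies on the 3.4+ behavior where `window()` accepts the window column produced by an earlier aggregation.

{code:scala}
import org.apache.spark.sql.functions._

// `events` is an assumed streaming DataFrame with an eventTime column.
// First level: 5-minute counts with a watermark.
val fiveMin = events
  .withWatermark("eventTime", "10 minutes")
  .groupBy(window(col("eventTime"), "5 minutes"))
  .agg(count("*").as("cnt"))

// Second level: re-window the prior window column directly into hourly sums.
val hourly = fiveMin
  .groupBy(window(col("window"), "1 hour"))
  .agg(sum("cnt").as("cnt"))
{code}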
[jira] [Commented] (SPARK-42614) Make all constructors private[sql]
[ https://issues.apache.org/jira/browse/SPARK-42614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694279#comment-17694279 ] Apache Spark commented on SPARK-42614: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/40207 > Make all constructors private[sql] > -- > > Key: SPARK-42614 > URL: https://issues.apache.org/jira/browse/SPARK-42614 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
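For context on SPARK-42614, `private[sql]` in Scala restricts a constructor to the `org.apache.spark.sql` package; a minimal illustration follows (not the actual Connect class definitions).

{code:scala}
package org.apache.spark.sql

// Only code inside org.apache.spark.sql can call `new Example(...)`;
// external callers must go through whatever factory methods the API exposes.
class Example private[sql] (val value: Int)
{code}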
[jira] [Assigned] (SPARK-42614) Make all constructors private[sql]
[ https://issues.apache.org/jira/browse/SPARK-42614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42614: Assignee: Herman van Hövell (was: Apache Spark) > Make all constructors private[sql] > -- > > Key: SPARK-42614 > URL: https://issues.apache.org/jira/browse/SPARK-42614 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42614) Make all constructors private[sql]
[ https://issues.apache.org/jira/browse/SPARK-42614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42614: Assignee: Apache Spark (was: Herman van Hövell) > Make all constructors private[sql] > -- > > Key: SPARK-42614 > URL: https://issues.apache.org/jira/browse/SPARK-42614 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42614) Make all constructors private[sql]
[ https://issues.apache.org/jira/browse/SPARK-42614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694278#comment-17694278 ] Apache Spark commented on SPARK-42614: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/40207 > Make all constructors private[sql] > -- > > Key: SPARK-42614 > URL: https://issues.apache.org/jira/browse/SPARK-42614 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42611) Insert char/varchar length checks for inner fields during resolution
[ https://issues.apache.org/jira/browse/SPARK-42611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anton Okolnychyi updated SPARK-42611: - Affects Version/s: 3.3.2 3.3.1 3.3.0 3.3.3 > Insert char/varchar length checks for inner fields during resolution > > > Key: SPARK-42611 > URL: https://issues.apache.org/jira/browse/SPARK-42611 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.3.1, 3.3.3, 3.3.2, 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > In SPARK-36498, we added support for reordering inner fields in structs > during resolution. Unfortunately, we don't add any length validation for > nested char/varchar columns in that path. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
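A hedged repro sketch of the gap SPARK-42611 describes; the table name is illustrative and the exact resolution path triggering the reordering is an assumption, but the point is that reordering inner struct fields by name must not skip nested VARCHAR length checks.

{code:scala}
// Illustrative only: the struct fields arrive in a different order than the
// table schema, so resolution reorders them by name; per the ticket, that
// path missed the length validation on the inner VARCHAR(2) field.
spark.sql("CREATE TABLE t (s STRUCT<a: INT, b: VARCHAR(2)>) USING parquet")
spark.sql("INSERT INTO t SELECT named_struct('b', 'abc', 'a', 1)")
// Expected: a length-check error because 'abc' exceeds VARCHAR(2).
{code}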
[jira] [Created] (SPARK-42614) Make all constructors private[sql]
Herman van Hövell created SPARK-42614: - Summary: Make all constructors private[sql] Key: SPARK-42614 URL: https://issues.apache.org/jira/browse/SPARK-42614 Project: Spark Issue Type: New Feature Components: Connect Affects Versions: 3.4.0 Reporter: Herman van Hövell Assignee: Herman van Hövell -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42611) Insert char/varchar length checks for inner fields during resolution
[ https://issues.apache.org/jira/browse/SPARK-42611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42611: Assignee: (was: Apache Spark) > Insert char/varchar length checks for inner fields during resolution > > > Key: SPARK-42611 > URL: https://issues.apache.org/jira/browse/SPARK-42611 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > In SPARK-36498, we added support for reordering inner fields in structs > during resolution. Unfortunately, we don't add any length validation for > nested char/varchar columns in that path. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-42550) table directory will be lost on hdfs when `INSERT OVERWRITE` fails
[ https://issues.apache.org/jira/browse/SPARK-42550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kevinshin closed SPARK-42550. - > table directory will be lost on hdfs when `INSERT OVERWRITE` fails > --- > > Key: SPARK-42550 > URL: https://issues.apache.org/jira/browse/SPARK-42550 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.3 > Environment: spark 3.2.3 / HDP 3.1.4 >Reporter: kevinshin >Priority: Critical > Attachments: image-2023-02-24-15-21-55-273.png, > image-2023-02-24-15-23-32-977.png, image-2023-02-24-15-25-57-770.png > > > {color:#4c9aff}when an `{*}INSERT{*} OVERWRITE *TABLE`* statement fails during > execution, the table's directory will be deleted. This does not happen in > spark 3.2.1.{color} > {color:#4c9aff}for example: {color} > *CREATE* *TABLE* *IF* *NOT* *EXISTS* test.spark32_overwrite(amt1 {*}int{*}) > STORED *AS* ORC; > *INSERT* OVERWRITE *TABLE* test.spark32_overwrite *select* 128; > *CREATE* *TABLE* *IF* *NOT* *EXISTS* test.spark32_overwrite2(amt1 long) > STORED *AS* ORC; > *INSERT* OVERWRITE *TABLE* test.spark32_overwrite *select* 644164; > *INSERT* OVERWRITE *TABLE* test.spark32_overwrite *select* amt1 *from* > test.spark32_overwrite2; – {color:#de350b}*this will get a Casting overflow > exception*{color} > {color:#4c9aff}and then:{color} > *select* * *from* test.spark32_overwrite; > {color:#4c9aff}will get an error:{color} > {color:#172b4d}java.io.FileNotFoundException{color} > {color:#172b4d}!image-2023-02-24-15-21-55-273.png!{color} > {color:#172b4d}the table's directory is lost. Use the `hdfs dfs -ls` cmd to > check:{color} > !image-2023-02-24-15-23-32-977.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42611) Insert char/varchar length checks for inner fields during resolution
[ https://issues.apache.org/jira/browse/SPARK-42611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694272#comment-17694272 ] Apache Spark commented on SPARK-42611: -- User 'aokolnychyi' has created a pull request for this issue: https://github.com/apache/spark/pull/40206 > Insert char/varchar length checks for inner fields during resolution > > > Key: SPARK-42611 > URL: https://issues.apache.org/jira/browse/SPARK-42611 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > In SPARK-36498, we added support for reordering inner fields in structs > during resolution. Unfortunately, we don't add any length validation for > nested char/varchar columns in that path. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42611) Insert char/varchar length checks for inner fields during resolution
[ https://issues.apache.org/jira/browse/SPARK-42611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42611: Assignee: Apache Spark > Insert char/varchar length checks for inner fields during resolution > > > Key: SPARK-42611 > URL: https://issues.apache.org/jira/browse/SPARK-42611 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Assignee: Apache Spark >Priority: Major > > In SPARK-36498, we added support for reordering inner fields in structs > during resolution. Unfortunately, we don't add any length validation for > nested char/varchar columns in that path. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42611) Insert char/varchar length checks for inner fields during resolution
[ https://issues.apache.org/jira/browse/SPARK-42611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694271#comment-17694271 ] Apache Spark commented on SPARK-42611: -- User 'aokolnychyi' has created a pull request for this issue: https://github.com/apache/spark/pull/40206 > Insert char/varchar length checks for inner fields during resolution > > > Key: SPARK-42611 > URL: https://issues.apache.org/jira/browse/SPARK-42611 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > In SPARK-36498, we added support for reordering inner fields in structs > during resolution. Unfortunately, we don't add any length validation for > nested char/varchar columns in that path. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42550) table directory will be lost on hdfs when `INSERT OVERWRITE` fails
[ https://issues.apache.org/jira/browse/SPARK-42550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694270#comment-17694270 ] kevinshin commented on SPARK-42550: --- this is not Spark's issue > table directory will be lost on hdfs when `INSERT OVERWRITE` fails > --- > > Key: SPARK-42550 > URL: https://issues.apache.org/jira/browse/SPARK-42550 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.3 > Environment: spark 3.2.3 / HDP 3.1.4 >Reporter: kevinshin >Priority: Critical > Attachments: image-2023-02-24-15-21-55-273.png, > image-2023-02-24-15-23-32-977.png, image-2023-02-24-15-25-57-770.png > > > {color:#4c9aff}when an `{*}INSERT{*} OVERWRITE *TABLE`* statement fails during > execution, the table's directory will be deleted. This does not happen in > spark 3.2.1.{color} > {color:#4c9aff}for example: {color} > *CREATE* *TABLE* *IF* *NOT* *EXISTS* test.spark32_overwrite(amt1 {*}int{*}) > STORED *AS* ORC; > *INSERT* OVERWRITE *TABLE* test.spark32_overwrite *select* 128; > *CREATE* *TABLE* *IF* *NOT* *EXISTS* test.spark32_overwrite2(amt1 long) > STORED *AS* ORC; > *INSERT* OVERWRITE *TABLE* test.spark32_overwrite *select* 644164; > *INSERT* OVERWRITE *TABLE* test.spark32_overwrite *select* amt1 *from* > test.spark32_overwrite2; – {color:#de350b}*this will get a Casting overflow > exception*{color} > {color:#4c9aff}and then:{color} > *select* * *from* test.spark32_overwrite; > {color:#4c9aff}will get an error:{color} > {color:#172b4d}java.io.FileNotFoundException{color} > {color:#172b4d}!image-2023-02-24-15-21-55-273.png!{color} > {color:#172b4d}the table's directory is lost. Use the `hdfs dfs -ls` cmd to > check:{color} > !image-2023-02-24-15-23-32-977.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42596) [YARN] OMP_NUM_THREADS not set to number of executor cores by default
[ https://issues.apache.org/jira/browse/SPARK-42596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42596. -- Fix Version/s: 3.3.3 3.2.4 3.4.0 Resolution: Fixed Issue resolved by pull request 40199 [https://github.com/apache/spark/pull/40199] > [YARN] OMP_NUM_THREADS not set to number of executor cores by default > - > > Key: SPARK-42596 > URL: https://issues.apache.org/jira/browse/SPARK-42596 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 3.3.2 >Reporter: John Zhuge >Assignee: John Zhuge >Priority: Major > Fix For: 3.3.3, 3.2.4, 3.4.0 > > > Run this PySpark script with `spark.executor.cores=1` > {code:python} > import os > from pyspark.sql import SparkSession > from pyspark.sql.functions import udf > spark = SparkSession.builder.getOrCreate() > var_name = 'OMP_NUM_THREADS' > def get_env_var(): > return os.getenv(var_name) > udf_get_env_var = udf(get_env_var) > spark.range(1).toDF("id").withColumn(f"env_{var_name}", > udf_get_env_var()).show(truncate=False) > {code} > Output with release `3.3.2`: > {noformat} > +---+---+ > |id |env_OMP_NUM_THREADS| > +---+---+ > |0 |null | > +---+---+ > {noformat} > Output with release `3.3.0`: > {noformat} > +---+---+ > |id |env_OMP_NUM_THREADS| > +---+---+ > |0 |1 | > +---+---+ > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
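Until the default is restored, one workaround consistent with the SPARK-42596 report is to pin the variable explicitly through executor environment configuration; `spark.executorEnv.<NAME>` is the standard mechanism for this. A sketch:

{code:scala}
import org.apache.spark.sql.SparkSession

// Explicitly set OMP_NUM_THREADS on executors rather than relying on the
// (currently missing) default derived from executor cores.
val spark = SparkSession.builder()
  .config("spark.executorEnv.OMP_NUM_THREADS", "1")
  .getOrCreate()
{code}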
[jira] [Assigned] (SPARK-42596) [YARN] OMP_NUM_THREADS not set to number of executor cores by default
[ https://issues.apache.org/jira/browse/SPARK-42596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42596: Assignee: John Zhuge > [YARN] OMP_NUM_THREADS not set to number of executor cores by default > - > > Key: SPARK-42596 > URL: https://issues.apache.org/jira/browse/SPARK-42596 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 3.3.2 >Reporter: John Zhuge >Assignee: John Zhuge >Priority: Major > > Run this PySpark script with `spark.executor.cores=1` > {code:python} > import os > from pyspark.sql import SparkSession > from pyspark.sql.functions import udf > spark = SparkSession.builder.getOrCreate() > var_name = 'OMP_NUM_THREADS' > def get_env_var(): > return os.getenv(var_name) > udf_get_env_var = udf(get_env_var) > spark.range(1).toDF("id").withColumn(f"env_{var_name}", > udf_get_env_var()).show(truncate=False) > {code} > Output with release `3.3.2`: > {noformat} > +---+---+ > |id |env_OMP_NUM_THREADS| > +---+---+ > |0 |null | > +---+---+ > {noformat} > Output with release `3.3.0`: > {noformat} > +---+---+ > |id |env_OMP_NUM_THREADS| > +---+---+ > |0 |1 | > +---+---+ > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42610) Add implicit encoders to SQLImplicits
[ https://issues.apache.org/jira/browse/SPARK-42610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694265#comment-17694265 ] Apache Spark commented on SPARK-42610: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/40205 > Add implicit encoders to SQLImplicits > - > > Key: SPARK-42610 > URL: https://issues.apache.org/jira/browse/SPARK-42610 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42601) New physical type Decimal128 for DecimalType
[ https://issues.apache.org/jira/browse/SPARK-42601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694259#comment-17694259 ] Apache Spark commented on SPARK-42601: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/40204 > New physical type Decimal128 for DecimalType > > > Key: SPARK-42601 > URL: https://issues.apache.org/jira/browse/SPARK-42601 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42601) New physical type Decimal128 for DecimalType
[ https://issues.apache.org/jira/browse/SPARK-42601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42601: Assignee: (was: Apache Spark) > New physical type Decimal128 for DecimalType > > > Key: SPARK-42601 > URL: https://issues.apache.org/jira/browse/SPARK-42601 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42601) New physical type Decimal128 for DecimalType
[ https://issues.apache.org/jira/browse/SPARK-42601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42601: Assignee: Apache Spark > New physical type Decimal128 for DecimalType > > > Key: SPARK-42601 > URL: https://issues.apache.org/jira/browse/SPARK-42601 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42601) New physical type Decimal128 for DecimalType
[ https://issues.apache.org/jira/browse/SPARK-42601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694258#comment-17694258 ] Apache Spark commented on SPARK-42601: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/40204 > New physical type Decimal128 for DecimalType > > > Key: SPARK-42601 > URL: https://issues.apache.org/jira/browse/SPARK-42601 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42613) PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor cores by default
[ https://issues.apache.org/jira/browse/SPARK-42613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Zhuge updated SPARK-42613: --- Description: Follow up from [https://github.com/apache/spark/pull/40199#discussion_r1119453996] If OMP_NUM_THREADS is not set explicitly, we should set it to `spark.task.cpus` instead of `spark.executor.cores`. Otherwise, we will still have issues when executor cores are set to a very large number but task cpus is 1. was: Coming from [https://github.com/apache/spark/pull/40199#discussion_r1119453996] If OMP_NUM_THREADS is not set explicitly, we should set it to `spark.task.cpus` instead of `spark.executor.cores`. Otherwise, we will still have issues when executor cores are set to a very large number but task cpus is 1. > PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor > cores by default > - > > Key: SPARK-42613 > URL: https://issues.apache.org/jira/browse/SPARK-42613 > Project: Spark > Issue Type: Bug > Components: PySpark, YARN >Affects Versions: 3.3.0 >Reporter: John Zhuge >Priority: Major > > Follow up from > [https://github.com/apache/spark/pull/40199#discussion_r1119453996] > > If OMP_NUM_THREADS is not set explicitly, we should set it to > `spark.task.cpus` instead of `spark.executor.cores`. Otherwise, we will still > have issues when executor cores are set to a very large number but task cpus is > 1. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42613) PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor cores by default
John Zhuge created SPARK-42613: -- Summary: PythonRunner should set OMP_NUM_THREADS to task cpus instead of executor cores by default Key: SPARK-42613 URL: https://issues.apache.org/jira/browse/SPARK-42613 Project: Spark Issue Type: Bug Components: PySpark, YARN Affects Versions: 3.3.0 Reporter: John Zhuge Coming from [https://github.com/apache/spark/pull/40199#discussion_r1119453996] If OMP_NUM_THREADS is not set explicitly, we should set it to `spark.task.cpus` instead of `spark.executor.cores`. Otherwise, we will still have issues when executor cores are set to a very large number but task cpus is 1. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
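A hedged sketch of the default SPARK-42613 proposes, written as a hypothetical helper rather than the actual PythonRunner change: prefer an explicit OMP_NUM_THREADS, otherwise fall back to `spark.task.cpus` instead of `spark.executor.cores`.

{code:scala}
import org.apache.spark.SparkConf

// Hypothetical helper: default to spark.task.cpus (1 if unset) so a large
// executor-cores setting no longer inflates the per-task thread count.
def defaultOmpNumThreads(conf: SparkConf): String =
  sys.env.getOrElse("OMP_NUM_THREADS", conf.get("spark.task.cpus", "1"))
{code}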
[jira] [Assigned] (SPARK-42600) currentDatabase Shall use NamespaceHelper instead of MultipartIdentifierHelper
[ https://issues.apache.org/jira/browse/SPARK-42600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-42600: --- Assignee: Kent Yao > currentDatabase Shall use NamespaceHelper instead of > MultipartIdentifierHelper > --- > > Key: SPARK-42600 > URL: https://issues.apache.org/jira/browse/SPARK-42600 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42600) currentDatabase Shall use NamespaceHelper instead of MultipartIdentifierHelper
[ https://issues.apache.org/jira/browse/SPARK-42600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-42600. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 40192 [https://github.com/apache/spark/pull/40192] > currentDatabase Shall use NamespaceHelper instead of > MultipartIdentifierHelper > --- > > Key: SPARK-42600 > URL: https://issues.apache.org/jira/browse/SPARK-42600 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2 >Reporter: Kent Yao >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41956) Refetch shuffle blocks when executor is decommissioned
[ https://issues.apache.org/jira/browse/SPARK-41956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhongwei Zhu updated SPARK-41956: - Summary: Refetch shuffle blocks when executor is decommissioned (was: Shuffle output location refetch in ShuffleBlockFetcherIterator) > Refetch shuffle blocks when executor is decommissioned > -- > > Key: SPARK-41956 > URL: https://issues.apache.org/jira/browse/SPARK-41956 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.1 >Reporter: Zhongwei Zhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42612) Enable more parity tests related to functions
[ https://issues.apache.org/jira/browse/SPARK-42612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42612: Assignee: (was: Apache Spark) > Enable more parity tests related to functions > - > > Key: SPARK-42612 > URL: https://issues.apache.org/jira/browse/SPARK-42612 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42612) Enable more parity tests related to functions
[ https://issues.apache.org/jira/browse/SPARK-42612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42612: Assignee: Apache Spark > Enable more parity tests related to functions > - > > Key: SPARK-42612 > URL: https://issues.apache.org/jira/browse/SPARK-42612 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42612) Enable more parity tests related to functions
[ https://issues.apache.org/jira/browse/SPARK-42612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694244#comment-17694244 ] Apache Spark commented on SPARK-42612: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40203 > Enable more parity tests related to functions > - > > Key: SPARK-42612 > URL: https://issues.apache.org/jira/browse/SPARK-42612 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42612) Enable more parity tests related to functions
[ https://issues.apache.org/jira/browse/SPARK-42612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694245#comment-17694245 ] Apache Spark commented on SPARK-42612: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40203 > Enable more parity tests related to functions > - > > Key: SPARK-42612 > URL: https://issues.apache.org/jira/browse/SPARK-42612 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42612) Enable more parity tests related to functions
Takuya Ueshin created SPARK-42612: - Summary: Enable more parity tests related to functions Key: SPARK-42612 URL: https://issues.apache.org/jira/browse/SPARK-42612 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42608) Use full column names for inner fields in resolution errors
[ https://issues.apache.org/jira/browse/SPARK-42608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694242#comment-17694242 ] Apache Spark commented on SPARK-42608: -- User 'aokolnychyi' has created a pull request for this issue: https://github.com/apache/spark/pull/40202 > Use full column names for inner fields in resolution errors > --- > > Key: SPARK-42608 > URL: https://issues.apache.org/jira/browse/SPARK-42608 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > If there are multiple inner columns with the same name, resolution errors may > be confusing as we only use field names, not full column names. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
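To see why SPARK-42608 wants full column names, consider a made-up schema in which two structs each contain a field named `c`; an error that prints only `c` cannot say which struct is at fault, while `x.c` versus `y.c` can. An active SparkSession named `spark` is assumed.

{code:scala}
// Two structs, both with an inner field named "c". A resolution error that
// mentions only the field name ("c") is ambiguous here; printing the full
// column name ("x.c" vs "y.c") pinpoints the offending inner field.
val df = spark.sql("SELECT named_struct('c', 1) AS x, named_struct('c', 'one') AS y")
{code}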
[jira] [Assigned] (SPARK-42608) Use full column names for inner fields in resolution errors
[ https://issues.apache.org/jira/browse/SPARK-42608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42608: Assignee: Apache Spark > Use full column names for inner fields in resolution errors > --- > > Key: SPARK-42608 > URL: https://issues.apache.org/jira/browse/SPARK-42608 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Assignee: Apache Spark >Priority: Major > > If there are multiple inner columns with the same name, resolution errors may > be confusing as we only use field names, not full column names. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42608) Use full column names for inner fields in resolution errors
[ https://issues.apache.org/jira/browse/SPARK-42608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42608: Assignee: (was: Apache Spark) > Use full column names for inner fields in resolution errors > --- > > Key: SPARK-42608 > URL: https://issues.apache.org/jira/browse/SPARK-42608 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > If there are multiple inner columns with the same name, resolution errors may > be confusing as we only use field names, not full column names. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42121) Add built-in table-valued functions posexplode and posexplode_outer
[ https://issues.apache.org/jira/browse/SPARK-42121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-42121. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 40151 [https://github.com/apache/spark/pull/40151] > Add built-in table-valued functions posexplode and posexplode_outer > --- > > Key: SPARK-42121 > URL: https://issues.apache.org/jira/browse/SPARK-42121 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 3.4.0 > > > Add `posexplode` and `posexplode_outer` to the built-in table function > registry. > Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
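With the SPARK-42121 change, `posexplode` can appear directly in a FROM clause as a table-valued function; an illustrative 3.4+ query (output columns are `pos` and `col`):

{code:scala}
// posexplode as a table-valued function.
spark.sql("SELECT * FROM posexplode(array(10, 20))").show()
// +---+---+
// |pos|col|
// +---+---+
// |  0| 10|
// |  1| 20|
// +---+---+
{code}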
[jira] [Assigned] (SPARK-42121) Add built-in table-valued functions posexplode and posexplode_outer
[ https://issues.apache.org/jira/browse/SPARK-42121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-42121: --- Assignee: Allison Wang > Add built-in table-valued functions posexplode and posexplode_outer > --- > > Key: SPARK-42121 > URL: https://issues.apache.org/jira/browse/SPARK-42121 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > > Add `posexplode` and `posexplode_outer` to the built-in table function > registry. > Add new SQL tests in `table-valued-functions.sql` and `join-lateral.sql`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42611) Insert char/varchar length checks for inner fields during resolution
Anton Okolnychyi created SPARK-42611: Summary: Insert char/varchar length checks for inner fields during resolution Key: SPARK-42611 URL: https://issues.apache.org/jira/browse/SPARK-42611 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0 Reporter: Anton Okolnychyi In SPARK-36498, we added support for reordering inner fields in structs during resolution. Unfortunately, we don't add any length validation for nested char/varchar columns in that path. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42592) Document SS guide doc for supporting multiple stateful operators (especially chained aggregations)
[ https://issues.apache.org/jira/browse/SPARK-42592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-42592: - Fix Version/s: 3.4.1 > Document SS guide doc for supporting multiple stateful operators (especially > chained aggregations) > -- > > Key: SPARK-42592 > URL: https://issues.apache.org/jira/browse/SPARK-42592 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.4.1, 3.5.0 > > > We made a change on the guide doc for SPARK-40925 via SPARK-42105, but from > SPARK-42105 we only removed the section of "limitation of global watermark". > That said, we haven't provided any example of new functionality, especially > that users need to know about the change of SQL function (window) in chained > time window aggregations. > In this ticket, we will add the example of chained time window aggregations, > with introducing new functionality of SQL function. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42592) Document SS guide doc for supporting multiple stateful operators (especially chained aggregations)
[ https://issues.apache.org/jira/browse/SPARK-42592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-42592: Assignee: Jungtaek Lim > Document SS guide doc for supporting multiple stateful operators (especially > chained aggregations) > -- > > Key: SPARK-42592 > URL: https://issues.apache.org/jira/browse/SPARK-42592 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > > We made a change on the guide doc for SPARK-40925 via SPARK-42105, but from > SPARK-42105 we only removed the section of "limitation of global watermark". > That said, we haven't provided any example of new functionality, especially > that users need to know about the change of SQL function (window) in chained > time window aggregations. > In this ticket, we will add the example of chained time window aggregations, > with introducing new functionality of SQL function. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42592) Document SS guide doc for supporting multiple stateful operators (especially chained aggregations)
[ https://issues.apache.org/jira/browse/SPARK-42592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-42592. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40188 [https://github.com/apache/spark/pull/40188] > Document SS guide doc for supporting multiple stateful operators (especially > chained aggregations) > -- > > Key: SPARK-42592 > URL: https://issues.apache.org/jira/browse/SPARK-42592 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.5.0 > > > We made a change on the guide doc for SPARK-40925 via SPARK-42105, but from > SPARK-42105 we only removed the section of "limitation of global watermark". > That said, we haven't provided any example of new functionality, especially > that users need to know about the change of SQL function (window) in chained > time window aggregations. > In this ticket, we will add the example of chained time window aggregations, > with introducing new functionality of SQL function. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"
[ https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated SPARK-42539: - Fix Version/s: 3.4.0 > User-provided JARs can override Spark's Hive metadata client JARs when using > "builtin" > -- > > Key: SPARK-42539 > URL: https://issues.apache.org/jira/browse/SPARK-42539 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.3, 3.3.2 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.4.0, 3.5.0 > > > Recently we observed that on version 3.2.0 and Java 8, it is possible for > user-provided Hive JARs to break the ability for Spark, via the Hive metadata > client / {{IsolatedClientLoader}}, to communicate with Hive Metastore, when > using the default behavior of the "builtin" Hive version. After SPARK-35321, > when Spark is compiled against Hive >= 2.3.9 and the "builtin" Hive client > version is used, we will call the method {{Hive.getWithoutRegisterFns()}} > (from HIVE-21563) instead of {{Hive.get()}}. If the user has included, for > example, {{hive-exec-2.3.8.jar}} on their classpath, the client will break > with a {{NoSuchMethodError}}. This particular failure mode was resolved in > 3.2.1 by SPARK-37446, but while investigating, we found a general issue that > it's possible for user JARs to override Spark's own JARs -- but only inside > of the IsolatedClientLoader when using "builtin". This happens because even > when Spark is configured to use the "builtin" Hive classes, it still creates > a separate URLClassLoader for the HiveClientImpl used for HMS communication. > To get the set of JAR URLs to use for this classloader, Spark [collects all > of the JARs used by the user classloader (and its parent, and that > classloader's parent, and so > on)|https://github.com/apache/spark/blob/87e3d5625e76bb734b8dd753bfb25002822c8585/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L412-L438]. > Thus the newly created classloader will have all of the same JARs as the > user classloader, but the ordering has been reversed! User JARs get > prioritized ahead of system JARs, because the classloader hierarchy is > traversed from bottom-to-top. For example, let's say we have user JARs > "foo.jar" and "hive-exec-2.3.8.jar". The user classloader will look like this: > {code} > MutableURLClassLoader > -- foo.jar > -- hive-exec-2.3.8.jar > -- parent: URLClassLoader > - spark-core_2.12-3.2.0.jar > - ... > - hive-exec-2.3.9.jar > - ... > {code} > This setup provides the expected behavior within the user classloader; it > will first check the parent, so hive-exec-2.3.9.jar takes precedence, and the > MutableURLClassLoader is only checked if the class doesn't exist in the > parent. But when a JAR list is constructed for the IsolatedClientLoader, it > traverses the URLs from MutableURLClassLoader first, then its parent, so the > final list looks like (in order): > {code} > URLClassLoader [IsolatedClientLoader] > -- foo.jar > -- hive-exec-2.3.8.jar > -- spark-core_2.12-3.2.0.jar > -- ... > -- hive-exec-2.3.9.jar > -- ... > -- parent: boot classloader (JVM classes) > {code} > Now when a lookup happens, all of the JARs are within the same > URLClassLoader, and the user JARs are in front of the Spark ones, so the user > JARs get prioritized. This is the opposite of the expected behavior when > using the default user/application classloader in Spark, which has > parent-first behavior, prioritizing the Spark/system classes over the user > classes. 
(Note that this behavior is correct when using the > {{ChildFirstURLClassLoader}}.) > After SPARK-37446, the NoSuchMethodError is no longer an issue, but this > still breaks assumptions about how user JARs should be treated vs. system > JARs, and presents the ability for the client to break in other ways. For > example in SPARK-37446 it describes a scenario whereby Hive 2.3.8 JARs have > been included; the changes in Hive 2.3.9 were needed to improve compatibility > with older HMS, so if a user were to accidentally include these older JARs, > it could break the ability of Spark to communicate with HMS 1.x > I see two solutions to this: > *(A) Remove the separate classloader entirely when using "builtin"* > Starting from 3.0.0, due to SPARK-26839, when using Java 9+, we don't even > create a new classloader when using "builtin". This makes sense, as [called > out in this > comment|https://github.com/apache/spark/pull/24057#discussion_r265142878], > since the point of "builtin" is to use the existing JARs on the classpath > anyway. This proposes simply extending the changes from SPARK-26839 to all > Java versions, instead of restricting to Java 9+ only. >
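To see the inversion in isolation, here is a minimal, self-contained Scala sketch of the bottom-to-top traversal the description refers to. The JAR paths are hypothetical stand-ins and this is not the actual HiveUtils code; it only demonstrates why flattening a parent-first hierarchy child-first reverses precedence.
{code}
import java.net.{URL, URLClassLoader}

object FlattenSketch {
  // Walk a classloader chain child-first, collecting its URLs -- the same
  // bottom-to-top traversal described above.
  def allJars(cl: ClassLoader): Seq[URL] = cl match {
    case null              => Seq.empty
    case u: URLClassLoader => u.getURLs.toSeq ++ allJars(u.getParent)
    case other             => allJars(other.getParent)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical stand-ins for Spark's system JARs and a user JAR.
    val system = new URLClassLoader(
      Array(new URL("file:/opt/spark/jars/hive-exec-2.3.9.jar")), null)
    val user = new URLClassLoader(
      Array(new URL("file:/tmp/user/hive-exec-2.3.8.jar")), system)

    // Class lookup through `user` is parent-first, so 2.3.9 wins there.
    // The flattened URL list, however, puts the user's JAR first:
    allJars(user).foreach(println)
    // prints:
    //   file:/tmp/user/hive-exec-2.3.8.jar
    //   file:/opt/spark/jars/hive-exec-2.3.9.jar
  }
}
{code}
A new URLClassLoader built from that flattened list resolves classes from hive-exec-2.3.8.jar before hive-exec-2.3.9.jar, which is exactly the user-over-system inversion the issue describes.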
[jira] [Resolved] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"
[ https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-42539. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40144 [https://github.com/apache/spark/pull/40144] > User-provided JARs can override Spark's Hive metadata client JARs when using > "builtin" > -- > > Key: SPARK-42539 > URL: https://issues.apache.org/jira/browse/SPARK-42539 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.3, 3.3.2 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.5.0 > > > Recently we observed that on version 3.2.0 and Java 8, it is possible for > user-provided Hive JARs to break the ability of Spark, via the Hive metadata > client / {{IsolatedClientLoader}}, to communicate with the Hive Metastore when > using the default behavior of the "builtin" Hive version. After SPARK-35321, > when Spark is compiled against Hive >= 2.3.9 and the "builtin" Hive client > version is used, we call the method {{Hive.getWithoutRegisterFns()}} > (from HIVE-21563) instead of {{Hive.get()}}. If the user has included, for > example, {{hive-exec-2.3.8.jar}} on their classpath, the client will break > with a {{NoSuchMethodError}}. This particular failure mode was resolved in > 3.2.1 by SPARK-37446, but while investigating, we found a general issue: it is > possible for user JARs to override Spark's own JARs -- but only inside > the IsolatedClientLoader when using "builtin". This happens because even > when Spark is configured to use the "builtin" Hive classes, it still creates > a separate URLClassLoader for the HiveClientImpl used for HMS communication. > To get the set of JAR URLs to use for this classloader, Spark [collects all > of the JARs used by the user classloader (and its parent, and that > classloader's parent, and so > on)|https://github.com/apache/spark/blob/87e3d5625e76bb734b8dd753bfb25002822c8585/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L412-L438]. > Thus the newly created classloader has all of the same JARs as the > user classloader, but the ordering has been reversed! User JARs get > prioritized ahead of system JARs, because the classloader hierarchy is > traversed from bottom to top. For example, let's say we have user JARs > "foo.jar" and "hive-exec-2.3.8.jar". The user classloader will look like this: > {code} > MutableURLClassLoader > -- foo.jar > -- hive-exec-2.3.8.jar > -- parent: URLClassLoader > - spark-core_2.12-3.2.0.jar > - ... > - hive-exec-2.3.9.jar > - ... > {code} > This setup provides the expected behavior within the user classloader; it > will first check the parent, so hive-exec-2.3.9.jar takes precedence, and the > MutableURLClassLoader is only checked if the class doesn't exist in the > parent. But when the JAR list is constructed for the IsolatedClientLoader, it > traverses the URLs from the MutableURLClassLoader first, then its parent, so the > final list looks like this (in order): > {code} > URLClassLoader [IsolatedClientLoader] > -- foo.jar > -- hive-exec-2.3.8.jar > -- spark-core_2.12-3.2.0.jar > -- ... > -- hive-exec-2.3.9.jar > -- ... > -- parent: boot classloader (JVM classes) > {code} > Now when a lookup happens, all of the JARs are within the same > URLClassLoader, and the user JARs are in front of the Spark ones, so the user > JARs get prioritized.
This is the opposite of the expected behavior of > the default user/application classloader in Spark, which is parent-first, > prioritizing the Spark/system classes over the user > classes. (Note that this behavior is correct when using the > {{ChildFirstURLClassLoader}}.) > After SPARK-37446, the NoSuchMethodError is no longer an issue, but this > still breaks assumptions about how user JARs should be treated vs. system > JARs, and leaves open the possibility of the client breaking in other ways. For > example, SPARK-37446 describes a scenario in which Hive 2.3.8 JARs were > included; the changes in Hive 2.3.9 were needed to improve compatibility > with older HMS versions, so if a user were to accidentally include these older JARs, > it could break the ability of Spark to communicate with HMS 1.x. > I see two solutions to this: > *(A) Remove the separate classloader entirely when using "builtin"* > Starting from 3.0.0, due to SPARK-26839, when using Java 9+, we don't even > create a new classloader when using "builtin". This makes sense, as [called > out in this > comment|https://github.com/apache/spark/pull/24057#discussion_r265142878], > since the point of "builtin" is to use the existing JARs on the classpath > anyway. This proposes simply extending the changes from SPARK-26839 to all > Java versions, instead of restricting to Java 9+ only. >
[jira] [Assigned] (SPARK-42539) User-provided JARs can override Spark's Hive metadata client JARs when using "builtin"
[ https://issues.apache.org/jira/browse/SPARK-42539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-42539: Assignee: Erik Krogen > User-provided JARs can override Spark's Hive metadata client JARs when using > "builtin" > -- > > Key: SPARK-42539 > URL: https://issues.apache.org/jira/browse/SPARK-42539 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.2.3, 3.3.2 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > > Recently we observed that on version 3.2.0 and Java 8, it is possible for > user-provided Hive JARs to break the ability of Spark, via the Hive metadata > client / {{IsolatedClientLoader}}, to communicate with the Hive Metastore when > using the default behavior of the "builtin" Hive version. After SPARK-35321, > when Spark is compiled against Hive >= 2.3.9 and the "builtin" Hive client > version is used, we call the method {{Hive.getWithoutRegisterFns()}} > (from HIVE-21563) instead of {{Hive.get()}}. If the user has included, for > example, {{hive-exec-2.3.8.jar}} on their classpath, the client will break > with a {{NoSuchMethodError}}. This particular failure mode was resolved in > 3.2.1 by SPARK-37446, but while investigating, we found a general issue: it is > possible for user JARs to override Spark's own JARs -- but only inside > the IsolatedClientLoader when using "builtin". This happens because even > when Spark is configured to use the "builtin" Hive classes, it still creates > a separate URLClassLoader for the HiveClientImpl used for HMS communication. > To get the set of JAR URLs to use for this classloader, Spark [collects all > of the JARs used by the user classloader (and its parent, and that > classloader's parent, and so > on)|https://github.com/apache/spark/blob/87e3d5625e76bb734b8dd753bfb25002822c8585/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L412-L438]. > Thus the newly created classloader has all of the same JARs as the > user classloader, but the ordering has been reversed! User JARs get > prioritized ahead of system JARs, because the classloader hierarchy is > traversed from bottom to top. For example, let's say we have user JARs > "foo.jar" and "hive-exec-2.3.8.jar". The user classloader will look like this: > {code} > MutableURLClassLoader > -- foo.jar > -- hive-exec-2.3.8.jar > -- parent: URLClassLoader > - spark-core_2.12-3.2.0.jar > - ... > - hive-exec-2.3.9.jar > - ... > {code} > This setup provides the expected behavior within the user classloader; it > will first check the parent, so hive-exec-2.3.9.jar takes precedence, and the > MutableURLClassLoader is only checked if the class doesn't exist in the > parent. But when the JAR list is constructed for the IsolatedClientLoader, it > traverses the URLs from the MutableURLClassLoader first, then its parent, so the > final list looks like this (in order): > {code} > URLClassLoader [IsolatedClientLoader] > -- foo.jar > -- hive-exec-2.3.8.jar > -- spark-core_2.12-3.2.0.jar > -- ... > -- hive-exec-2.3.9.jar > -- ... > -- parent: boot classloader (JVM classes) > {code} > Now when a lookup happens, all of the JARs are within the same > URLClassLoader, and the user JARs are in front of the Spark ones, so the user > JARs get prioritized. This is the opposite of the expected behavior of > the default user/application classloader in Spark, which is parent-first, > prioritizing the Spark/system classes over the user > classes.
(Note that this behavior is correct when using the > {{ChildFirstURLClassLoader}}.) > After SPARK-37446, the NoSuchMethodError is no longer an issue, but this > still breaks assumptions about how user JARs should be treated vs. system > JARs, and leaves open the possibility of the client breaking in other ways. For > example, SPARK-37446 describes a scenario in which Hive 2.3.8 JARs were > included; the changes in Hive 2.3.9 were needed to improve compatibility > with older HMS versions, so if a user were to accidentally include these older JARs, > it could break the ability of Spark to communicate with HMS 1.x. > I see two solutions to this: > *(A) Remove the separate classloader entirely when using "builtin"* > Starting from 3.0.0, due to SPARK-26839, when using Java 9+, we don't even > create a new classloader when using "builtin". This makes sense, as [called > out in this > comment|https://github.com/apache/spark/pull/24057#discussion_r265142878], > since the point of "builtin" is to use the existing JARs on the classpath > anyway. This proposes simply extending the changes from SPARK-26839 to all > Java versions, instead of restricting to Java 9+ only. > *(B) Reverse the ordering of
[jira] [Assigned] (SPARK-42610) Add implicit encoders to SQLImplicits
[ https://issues.apache.org/jira/browse/SPARK-42610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell reassigned SPARK-42610: - Assignee: Herman van Hövell > Add implicit encoders to SQLImplicits > - > > Key: SPARK-42610 > URL: https://issues.apache.org/jira/browse/SPARK-42610 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42467) Spark Connect Scala Client: GroupBy and Aggregation
[ https://issues.apache.org/jira/browse/SPARK-42467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell updated SPARK-42467: -- Epic Link: SPARK-42554 > Spark Connect Scala Client: GroupBy and Aggregation > --- > > Key: SPARK-42467 > URL: https://issues.apache.org/jira/browse/SPARK-42467 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
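For scope, this task covers the groupBy/agg surface of the existing Dataset API. A representative call the Connect Scala client has to support is shown below; this is the classic Spark API with a local session, given purely as an illustration, not the Connect client's own session setup.
{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, sum}

object GroupByExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("key", "value")

    // GroupBy with multiple aggregations -- the API shape the
    // Connect Scala client needs to reproduce.
    df.groupBy($"key")
      .agg(sum($"value").as("total"), avg($"value").as("mean"))
      .show()
  }
}
{code}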
[jira] [Resolved] (SPARK-42542) Support Pivot without providing pivot column values
[ https://issues.apache.org/jira/browse/SPARK-42542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-42542. --- Fix Version/s: 3.4.1 Resolution: Fixed > Support Pivot without providing pivot column values > --- > > Key: SPARK-42542 > URL: https://issues.apache.org/jira/browse/SPARK-42542 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
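The resolved behavior matches the existing DataFrame pivot overload in which Spark infers the pivot values by computing the distinct values of the pivot column. A small illustration against the classic API, with hypothetical data:
{code}
import org.apache.spark.sql.SparkSession

object PivotExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    import spark.implicits._

    val sales = Seq((2022, "Java", 100), (2022, "Scala", 150), (2023, "Scala", 200))
      .toDF("year", "course", "earnings")

    // No value list is passed to pivot(): Spark first computes the
    // distinct values of `course`, then pivots on them.
    sales.groupBy($"year")
      .pivot($"course")
      .sum("earnings")
      .show()
  }
}
{code}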
[jira] [Assigned] (SPARK-41725) Remove the workaround of sql(...).collect back in PySpark tests
[ https://issues.apache.org/jira/browse/SPARK-41725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41725: Assignee: (was: Apache Spark) > Remove the workaround of sql(...).collect back in PySpark tests > --- > > Key: SPARK-41725 > URL: https://issues.apache.org/jira/browse/SPARK-41725 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > See https://github.com/apache/spark/pull/39224/files#r1057436437 > We don't have to `collect` for every `sql`, but Spark Connect requires it. We > should remove these workarounds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41725) Remove the workaround of sql(...).collect back in PySpark tests
[ https://issues.apache.org/jira/browse/SPARK-41725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694208#comment-17694208 ] Apache Spark commented on SPARK-41725: -- User 'grundprinzip' has created a pull request for this issue: https://github.com/apache/spark/pull/40160 > Remove the workaround of sql(...).collect back in PySpark tests > --- > > Key: SPARK-41725 > URL: https://issues.apache.org/jira/browse/SPARK-41725 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > See https://github.com/apache/spark/pull/39224/files#r1057436437 > We don't have to `collect` for every `sql`, but Spark Connect requires it. We > should remove these workarounds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41725) Remove the workaround of sql(...).collect back in PySpark tests
[ https://issues.apache.org/jira/browse/SPARK-41725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41725: Assignee: Apache Spark > Remove the workaround of sql(...).collect back in PySpark tests > --- > > Key: SPARK-41725 > URL: https://issues.apache.org/jira/browse/SPARK-41725 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > See https://github.com/apache/spark/pull/39224/files#r1057436437 > We don't have to `collect` for every `sql`, but Spark Connect requires it. We > should remove these workarounds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42510) Implement `DataFrame.mapInPandas`
[ https://issues.apache.org/jira/browse/SPARK-42510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17694162#comment-17694162 ] Apache Spark commented on SPARK-42510: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40201 > Implement `DataFrame.mapInPandas` > - > > Key: SPARK-42510 > URL: https://issues.apache.org/jira/browse/SPARK-42510 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.4.0 > > > Implement `DataFrame.mapInPandas` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42510) Implement `DataFrame.mapInPandas`
[ https://issues.apache.org/jira/browse/SPARK-42510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-42510. --- Fix Version/s: 3.4.0 Assignee: Xinrong Meng Resolution: Fixed Issue resolved by pull request 40104 https://github.com/apache/spark/pull/40104 > Implement `DataFrame.mapInPandas` > - > > Key: SPARK-42510 > URL: https://issues.apache.org/jira/browse/SPARK-42510 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.4.0 > > > Implement `DataFrame.mapInPandas` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42610) Add implicit encoders to SQLImplicits
Herman van Hövell created SPARK-42610: - Summary: Add implicit encoders to SQLImplicits Key: SPARK-42610 URL: https://issues.apache.org/jira/browse/SPARK-42610 Project: Spark Issue Type: New Feature Components: Connect Affects Versions: 3.4.0 Reporter: Herman van Hövell -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
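SQLImplicits is the trait behind `import spark.implicits._`. A sketch of the kind of implicit encoder definitions the ticket refers to follows; the trait name here is illustrative, but the {{Encoders}} factory methods are the existing public API.
{code}
import org.apache.spark.sql.{Encoder, Encoders}
import scala.reflect.runtime.universe.TypeTag

// Illustrative trait (not the real SQLImplicits) showing the shape of
// implicit encoders that let spark.createDataset(Seq(1, 2, 3)) resolve
// an Encoder[Int] without the caller supplying one explicitly.
trait ExampleImplicits {
  implicit def intEncoder: Encoder[Int] = Encoders.scalaInt
  implicit def longEncoder: Encoder[Long] = Encoders.scalaLong
  implicit def stringEncoder: Encoder[String] = Encoders.STRING
  implicit def productEncoder[T <: Product : TypeTag]: Encoder[T] =
    Encoders.product[T]
}
{code}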
[jira] [Created] (SPARK-42609) Support Grouping/Grouping_Set
Rui Wang created SPARK-42609: Summary: Support Grouping/Grouping_Set Key: SPARK-42609 URL: https://issues.apache.org/jira/browse/SPARK-42609 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Rui Wang Assignee: Rui Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
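For reference, the classic DataFrame API already exposes {{grouping}} and {{grouping_id}} in org.apache.spark.sql.functions; this sub-task presumably tracks exposing the same through Connect. Classic usage looks like the following, with hypothetical data:
{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{grouping, grouping_id, sum}

object GroupingExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(("eng", "sf", 10), ("eng", "nyc", 20), ("hr", "sf", 5))
      .toDF("dept", "city", "cnt")

    // grouping(col) is 1 on rows where `col` was aggregated away by the
    // cube; grouping_id() packs those indicator bits into one integer.
    df.cube($"dept", $"city")
      .agg(sum($"cnt"), grouping($"dept"), grouping($"city"), grouping_id())
      .show()
  }
}
{code}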