[jira] [Commented] (SPARK-40852) Implement `DataFrame.summary`
[ https://issues.apache.org/jira/browse/SPARK-40852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644199#comment-17644199 ] Apache Spark commented on SPARK-40852: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/38962 > Implement `DataFrame.summary` > - > > Key: SPARK-40852 > URL: https://issues.apache.org/jira/browse/SPARK-40852 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41381) Implement count_distinct and sum_distinct functions
[ https://issues.apache.org/jira/browse/SPARK-41381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-41381: - Assignee: Ruifeng Zheng > Implement count_distinct and sum_distinct functions > --- > > Key: SPARK-41381 > URL: https://issues.apache.org/jira/browse/SPARK-41381 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark > Affects Versions: 3.4.0 > Reporter: Ruifeng Zheng > Assignee: Ruifeng Zheng > Priority: Major
[jira] [Resolved] (SPARK-41381) Implement count_distinct and sum_distinct functions
[ https://issues.apache.org/jira/browse/SPARK-41381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-41381. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38914 [https://github.com/apache/spark/pull/38914]
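The distinct-aggregate semantics this ticket implements can be sketched in plain Python. This is only an analogy for the intended behavior, not the Connect implementation:

```python
# count_distinct / sum_distinct aggregate over the distinct values only.
values = [1, 2, 2, 3, 3, 3]

count_distinct = len(set(values))  # number of distinct values
sum_distinct = sum(set(values))    # sum over distinct values: 1 + 2 + 3

print(count_distinct, sum_distinct)
```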
[jira] [Commented] (SPARK-41437) Do not optimize the input query twice for v1 write fallback
[ https://issues.apache.org/jira/browse/SPARK-41437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644185#comment-17644185 ] Apache Spark commented on SPARK-41437: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/38942 > Do not optimize the input query twice for v1 write fallback > --- > > Key: SPARK-41437 > URL: https://issues.apache.org/jira/browse/SPARK-41437 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.4.0 > Reporter: Wenchen Fan > Priority: Major
[jira] [Assigned] (SPARK-41437) Do not optimize the input query twice for v1 write fallback
[ https://issues.apache.org/jira/browse/SPARK-41437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41437: Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-41437) Do not optimize the input query twice for v1 write fallback
[ https://issues.apache.org/jira/browse/SPARK-41437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41437: Assignee: Apache Spark
[jira] [Created] (SPARK-41437) Do not optimize the input query twice for v1 write fallback
Wenchen Fan created SPARK-41437: --- Summary: Do not optimize the input query twice for v1 write fallback Key: SPARK-41437 URL: https://issues.apache.org/jira/browse/SPARK-41437 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0 Reporter: Wenchen Fan
[jira] [Updated] (SPARK-41283) Feature parity: Functions API in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-41283: -- Summary: Feature parity: Functions API in Spark Connect (was: Feature parity: functions API in Spark Connect) > Feature parity: Functions API in Spark Connect > -- > > Key: SPARK-41283 > URL: https://issues.apache.org/jira/browse/SPARK-41283 > Project: Spark > Issue Type: Umbrella > Components: Connect > Affects Versions: 3.4.0 > Reporter: Hyukjin Kwon > Assignee: Xinrong Meng > Priority: Critical > > Implement functions API.
[jira] [Commented] (SPARK-41386) There are some small files when using rebalance(column)
[ https://issues.apache.org/jira/browse/SPARK-41386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644175#comment-17644175 ] Zhe Dong commented on SPARK-41386: -- Hi [~podongfeng], that was my mistake. I removed it. Sorry for that.

> There are some small files when using rebalance(column)
>
> Key: SPARK-41386
> URL: https://issues.apache.org/jira/browse/SPARK-41386
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.1
> Reporter: Zhe Dong
> Priority: Minor
>
> *Problem (REBALANCE(column))*:
> SparkSession config:
> {noformat}
> config("spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled", "true")
> config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "20m")
> config("spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor", "0.5"){noformat}
> So we expect each file to be at least 20m * 0.5 = 10m. But in fact, we got some small files like the following:
> {noformat}
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-0-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-1-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-2-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-3-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 9.1 M 2022-12-07 13:13 .../part-4-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet
> -rw-r--r-- 1 jp28948 staff 3.0 M 2022-12-07 13:13 .../part-5-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet{noformat}
> 9.1 M and 3.0 M are smaller than 10 M; we have to handle these small files in another way.
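The report's expectation can be sketched as plain arithmetic, assuming the floor is simply advisoryPartitionSizeInBytes multiplied by rebalancePartitionsSmallPartitionFactor (the ticket's reading of the configs, not a statement about Spark internals):

```python
# Minimum expected file size implied by the configs in the ticket.
MB = 1024 * 1024
advisory_size = 20 * MB        # spark.sql.adaptive.advisoryPartitionSizeInBytes
small_factor = 0.5             # ...rebalancePartitionsSmallPartitionFactor
min_expected = advisory_size * small_factor  # 10 MB floor

# File sizes observed in the report, in MB.
observed_mb = [12.1, 12.1, 12.1, 12.1, 9.1, 3.0]
small = [s for s in observed_mb if s * MB < min_expected]
print(small)  # files below the 10 MB floor
```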
[jira] [Assigned] (SPARK-41436) Implement `collection` functions: A~C
[ https://issues.apache.org/jira/browse/SPARK-41436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41436: Assignee: (was: Apache Spark) > Implement `collection` functions: A~C > - > > Key: SPARK-41436 > URL: https://issues.apache.org/jira/browse/SPARK-41436 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark > Affects Versions: 3.4.0 > Reporter: Ruifeng Zheng > Priority: Major
[jira] [Assigned] (SPARK-41436) Implement `collection` functions: A~C
[ https://issues.apache.org/jira/browse/SPARK-41436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41436: Assignee: Apache Spark
[jira] [Commented] (SPARK-41436) Implement `collection` functions: A~C
[ https://issues.apache.org/jira/browse/SPARK-41436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644174#comment-17644174 ] Apache Spark commented on SPARK-41436: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/38961
[jira] [Updated] (SPARK-41386) There are some small files when using rebalance(column)
[ https://issues.apache.org/jira/browse/SPARK-41386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Dong updated SPARK-41386: - Epic Link: (was: SPARK-39375)
[jira] [Created] (SPARK-41436) Implement `collection` functions: A~C
Ruifeng Zheng created SPARK-41436: - Summary: Implement `collection` functions: A~C Key: SPARK-41436 URL: https://issues.apache.org/jira/browse/SPARK-41436 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng
[jira] [Commented] (SPARK-41435) Make `curdate()` throw `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043` when args is not null
[ https://issues.apache.org/jira/browse/SPARK-41435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644173#comment-17644173 ] Apache Spark commented on SPARK-41435: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38960 > Make `curdate()` throw `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043` when args is not null > - > > Key: SPARK-41435 > URL: https://issues.apache.org/jira/browse/SPARK-41435 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.4.0 > Reporter: Yang Jie > Priority: Minor
[jira] [Assigned] (SPARK-41435) Make `curdate()` throw `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043` when args is not null
[ https://issues.apache.org/jira/browse/SPARK-41435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41435: Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-41435) Make `curdate()` throw `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043` when args is not null
[ https://issues.apache.org/jira/browse/SPARK-41435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41435: Assignee: Apache Spark
[jira] [Created] (SPARK-41435) Make `curdate()` throw `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043` when args is not null
Yang Jie created SPARK-41435: Summary: Make `curdate()` throw `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043` when args is not null Key: SPARK-41435 URL: https://issues.apache.org/jira/browse/SPARK-41435 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Yang Jie
[jira] [Commented] (SPARK-41298) Getting Count on data frame is giving the performance issue
[ https://issues.apache.org/jira/browse/SPARK-41298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644164#comment-17644164 ] Ramakrishna commented on SPARK-41298: - Can someone please check this behavior and update me as soon as possible?

> Getting Count on data frame is giving the performance issue
>
> Key: SPARK-41298
> URL: https://issues.apache.org/jira/browse/SPARK-41298
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.4
> Reporter: Ramakrishna
> Priority: Major
>
> We are running the query below against Teradata:
> 1) Dataframe df = spark.format("jdbc"). . . load();
> 2) int count = df.count();
> When we executed df.count(), Spark internally issued the query below against Teradata, which wastes a lot of CPU on Teradata, and the DBAs are raising concerns after seeing this query.
>
> Query: SELECT 1 FROM ()SPARK_SUB_TAB
> Response:
> 1
> 1
> 1
> ..
> 1
>
> Is this expected behavior from Spark, or is it a bug?
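The reporter's concern can be illustrated outside Spark. The sketch below uses sqlite3 as a stand-in for the JDBC source (the table name and data are made up) to contrast counting rows client-side over a `SELECT 1 FROM ...` result with pushing the aggregate down as `COUNT(*)`:

```python
import sqlite3

# In-memory database standing in for the remote JDBC source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

# What the ticket observes: one row per record is shipped back and counted.
shipped = conn.execute("SELECT 1 FROM t").fetchall()
client_side_count = len(shipped)

# The pushed-down form: the database returns a single row.
(pushed_count,) = conn.execute("SELECT COUNT(*) FROM t").fetchone()

# Same answer, but the second form transfers one row instead of 1000.
assert client_side_count == pushed_count
```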
[jira] [Assigned] (SPARK-41415) SASL Request Retries
[ https://issues.apache.org/jira/browse/SPARK-41415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41415: Assignee: Apache Spark > SASL Request Retries > > > Key: SPARK-41415 > URL: https://issues.apache.org/jira/browse/SPARK-41415 > Project: Spark > Issue Type: Task > Components: Shuffle > Affects Versions: 3.2.4 > Reporter: Aravind Patnam > Assignee: Apache Spark > Priority: Major
[jira] [Assigned] (SPARK-41415) SASL Request Retries
[ https://issues.apache.org/jira/browse/SPARK-41415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41415: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-41415) SASL Request Retries
[ https://issues.apache.org/jira/browse/SPARK-41415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644152#comment-17644152 ] Apache Spark commented on SPARK-41415: -- User 'akpatnam25' has created a pull request for this issue: https://github.com/apache/spark/pull/38959
[jira] [Updated] (SPARK-41386) There are some small files when using rebalance(column)
[ https://issues.apache.org/jira/browse/SPARK-41386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Dong updated SPARK-41386: - Affects Version/s: 3.3.1 (was: 3.4.0)
[jira] [Created] (SPARK-41434) Support LambdaFunction expression
Ruifeng Zheng created SPARK-41434: - Summary: Support LambdaFunction expression Key: SPARK-41434 URL: https://issues.apache.org/jira/browse/SPARK-41434 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng
[jira] [Assigned] (SPARK-41433) Make Max Arrow BatchSize configurable
[ https://issues.apache.org/jira/browse/SPARK-41433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41433: Assignee: (was: Apache Spark) > Make Max Arrow BatchSize configurable > - > > Key: SPARK-41433 > URL: https://issues.apache.org/jira/browse/SPARK-41433 > Project: Spark > Issue Type: Sub-task > Components: Connect > Affects Versions: 3.4.0 > Reporter: Ruifeng Zheng > Priority: Major
[jira] [Commented] (SPARK-41433) Make Max Arrow BatchSize configurable
[ https://issues.apache.org/jira/browse/SPARK-41433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644145#comment-17644145 ] Apache Spark commented on SPARK-41433: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/38958
[jira] [Assigned] (SPARK-41433) Make Max Arrow BatchSize configurable
[ https://issues.apache.org/jira/browse/SPARK-41433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41433: Assignee: Apache Spark
[jira] [Created] (SPARK-41433) Make Max Arrow BatchSize configurable
Ruifeng Zheng created SPARK-41433: - Summary: Make Max Arrow BatchSize configurable Key: SPARK-41433 URL: https://issues.apache.org/jira/browse/SPARK-41433 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Ruifeng Zheng
[jira] [Commented] (SPARK-41386) There are some small files when using rebalance(column)
[ https://issues.apache.org/jira/browse/SPARK-41386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644144#comment-17644144 ] Ruifeng Zheng commented on SPARK-41386: --- [~dongz] I think this ticket is irrelevant to Spark-Connect?
[jira] [Commented] (SPARK-41053) Better Spark UI scalability and Driver stability for large applications
[ https://issues.apache.org/jira/browse/SPARK-41053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644143#comment-17644143 ] Gengliang Wang commented on SPARK-41053: [~beliefer] [~yangjie01] [~panbingkun] If you are interested in this project, feel free to take some tasks from the list.

> Better Spark UI scalability and Driver stability for large applications
>
> Key: SPARK-41053
> URL: https://issues.apache.org/jira/browse/SPARK-41053
> Project: Spark
> Issue Type: Umbrella
> Components: Spark Core, Web UI
> Affects Versions: 3.4.0
> Reporter: Gengliang Wang
> Priority: Major
> Attachments: Better Spark UI scalability and Driver stability for large applications.pdf
>
> After SPARK-18085, the Spark history server (SHS) became more scalable for processing large applications by supporting a persistent KV store (LevelDB/RocksDB) as the storage layer.
> As for the live Spark UI, all the data is still stored in memory, which can put memory pressure on the Spark driver for large applications.
> For better Spark UI scalability and driver stability, I propose to:
> * *Support storing all the UI data in a persistent KV store.* RocksDB/LevelDB provides low memory overhead. Their write/read performance is fast enough to serve the write/read workload for the live UI. SHS can leverage the persistent KV store to speed up its startup.
> * *Support a new Protobuf serializer for all the UI data.* The new serializer is supposed to be faster, according to benchmarks. It will be the default serializer for the persistent KV store of the live UI. For event logs, it is optional. The current serializer for UI data is JSON. When writing to the persistent KV store, there is GZip compression. Since there is compression support in RocksDB/LevelDB, the new serializer won't compress the output before writing to the persistent KV store.
> Here is a benchmark of writing/reading 100,000 SQLExecutionUIData to/from RocksDB:
>
> |*Serializer*|*Avg Write time(μs)*|*Avg Read time(μs)*|*RocksDB File Total Size(MB)*|*Result total size in memory(MB)*|
> |*Spark's KV Serializer (JSON+gzip)*|352.2|119.26|837|868|
> |*Protobuf*|109.9|34.3|858|2105|
>
> I am also proposing to support RocksDB instead of both LevelDB & RocksDB in the live UI.
> SPIP: [https://docs.google.com/document/d/1cuKnFwlTodyVhUQPMuakq2YDaLH05jaY9FRu_aD1zMo/edit?usp=sharing]
> SPIP vote: https://lists.apache.org/thread/lom4zcob6237q6nnj46jylkzwmmsxvgj
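A quick back-of-envelope check of the speedups implied by the benchmark numbers quoted above (values copied from the ticket; microseconds per operation over 100,000 SQLExecutionUIData):

```python
# Average write/read times in microseconds, from the ticket's table.
json_write, json_read = 352.2, 119.26  # Spark's KV Serializer (JSON+gzip)
pb_write, pb_read = 109.9, 34.3        # Protobuf

write_speedup = json_write / pb_write  # roughly 3.2x faster writes
read_speedup = json_read / pb_read     # roughly 3.5x faster reads
print(round(write_speedup, 1), round(read_speedup, 1))
```

Note the trade-off also visible in the table: Protobuf is faster on disk I/O but the deserialized result occupies more memory (2105 MB vs 868 MB).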
[jira] [Created] (SPARK-41432) Protobuf serializer for SparkPlanGraphWrapper
Gengliang Wang created SPARK-41432: -- Summary: Protobuf serializer for SparkPlanGraphWrapper Key: SPARK-41432 URL: https://issues.apache.org/jira/browse/SPARK-41432 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 3.4.0 Reporter: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41431) Protobuf serializer for SQLExecutionUIData
Gengliang Wang created SPARK-41431: -- Summary: Protobuf serializer for SQLExecutionUIData Key: SPARK-41431 URL: https://issues.apache.org/jira/browse/SPARK-41431 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 3.4.0 Reporter: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41430) Protobuf serializer for ProcessSummaryWrapper
Gengliang Wang created SPARK-41430: -- Summary: Protobuf serializer for ProcessSummaryWrapper Key: SPARK-41430 URL: https://issues.apache.org/jira/browse/SPARK-41430 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 3.4.0 Reporter: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41429) Protobuf serializer for RDDOperationGraphWrapper
Gengliang Wang created SPARK-41429: -- Summary: Protobuf serializer for RDDOperationGraphWrapper Key: SPARK-41429 URL: https://issues.apache.org/jira/browse/SPARK-41429 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 3.4.0 Reporter: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41386) There are some small files when using rebalance(column)
[ https://issues.apache.org/jira/browse/SPARK-41386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644141#comment-17644141 ] Zhe Dong commented on SPARK-41386: --
{noformat}
if (mapStats.isEmpty ||
    mapStats.get.bytesByPartitionId.forall(bytes =>
      bytes <= advisorySize && bytes >= advisorySize * smallPartitionFactor)) {
  return shuffle
}
if (bytes > targetSize) {
  ...
} else if (bytes < targetSize * smallPartitionFactor) {
  CoalescedPartitionSpec(reduceIndex, reduceIndex + 1, bytes) :: Nil
} else {
  return shuffle // dummy
}
{noformat}
> There are some small files when using rebalance(column) > --- > > Key: SPARK-41386 > URL: https://issues.apache.org/jira/browse/SPARK-41386 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Zhe Dong >Priority: Minor > > *Problem (REBALANCE(column))*: > SparkSession config: > {noformat} > config("spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled", > "true") > config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "20m") > config("spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor", > "0.5"){noformat} > so, we expect file sizes to be at least 20m*0.5 = 10m. 
> but in fact, we got some small files like the following: > {noformat} > -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 > .../part-0-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet > -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 > .../part-1-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet > -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 > .../part-2-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet > -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 > .../part-3-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet > -rw-r--r-- 1 jp28948 staff 9.1 M 2022-12-07 13:13 > .../part-4-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet > -rw-r--r-- 1 jp28948 staff 3.0 M 2022-12-07 13:13 > .../part-5-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet{noformat} > 9.1 M and 3.0 M are smaller than 10 M; we have to handle these small files in > another way. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
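The arithmetic behind the reporter's expectation can be checked directly. A minimal plain-Python sketch, using the sizes from the directory listing above:

```python
# With advisoryPartitionSizeInBytes = 20m and
# rebalancePartitionsSmallPartitionFactor = 0.5, the expectation is that no
# output file is smaller than 20 * 0.5 = 10 MB.
advisory_mb = 20.0
small_partition_factor = 0.5
min_expected_mb = advisory_mb * small_partition_factor  # 10.0 MB

# Observed file sizes (MB) from the listing in the report.
observed_mb = [12.1, 12.1, 12.1, 12.1, 9.1, 3.0]
too_small = [s for s in observed_mb if s < min_expected_mb]
print(too_small)  # → [9.1, 3.0], the two files that break the expectation
```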
[jira] [Created] (SPARK-41427) Protobuf serializer for ExecutorStageSummaryWrapper
Gengliang Wang created SPARK-41427: -- Summary: Protobuf serializer for ExecutorStageSummaryWrapper Key: SPARK-41427 URL: https://issues.apache.org/jira/browse/SPARK-41427 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 3.4.0 Reporter: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41428) Protobuf serializer for SpeculationStageSummaryWrapper
Gengliang Wang created SPARK-41428: -- Summary: Protobuf serializer for SpeculationStageSummaryWrapper Key: SPARK-41428 URL: https://issues.apache.org/jira/browse/SPARK-41428 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 3.4.0 Reporter: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41426) Protobuf serializer for ResourceProfileWrapper
Gengliang Wang created SPARK-41426: -- Summary: Protobuf serializer for ResourceProfileWrapper Key: SPARK-41426 URL: https://issues.apache.org/jira/browse/SPARK-41426 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 3.4.0 Reporter: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41422) Protobuf serializer for ExecutorSummaryWrapper
Gengliang Wang created SPARK-41422: -- Summary: Protobuf serializer for ExecutorSummaryWrapper Key: SPARK-41422 URL: https://issues.apache.org/jira/browse/SPARK-41422 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 3.4.0 Reporter: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41425) Protobuf serializer for RDDStorageInfoWrapper
Gengliang Wang created SPARK-41425: -- Summary: Protobuf serializer for RDDStorageInfoWrapper Key: SPARK-41425 URL: https://issues.apache.org/jira/browse/SPARK-41425 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 3.4.0 Reporter: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41424) Protobuf serializer for TaskDataWrapper
Gengliang Wang created SPARK-41424: -- Summary: Protobuf serializer for TaskDataWrapper Key: SPARK-41424 URL: https://issues.apache.org/jira/browse/SPARK-41424 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 3.4.0 Reporter: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41423) Protobuf serializer for StageDataWrapper
Gengliang Wang created SPARK-41423: -- Summary: Protobuf serializer for StageDataWrapper Key: SPARK-41423 URL: https://issues.apache.org/jira/browse/SPARK-41423 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 3.4.0 Reporter: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41420) Protobuf serializer for ApplicationInfoWrapper
Gengliang Wang created SPARK-41420: -- Summary: Protobuf serializer for ApplicationInfoWrapper Key: SPARK-41420 URL: https://issues.apache.org/jira/browse/SPARK-41420 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 3.4.0 Reporter: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41421) Protobuf serializer for ApplicationEnvironmentInfoWrapper
Gengliang Wang created SPARK-41421: -- Summary: Protobuf serializer for ApplicationEnvironmentInfoWrapper Key: SPARK-41421 URL: https://issues.apache.org/jira/browse/SPARK-41421 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 3.4.0 Reporter: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41386) There are some small files when using rebalance(column)
[ https://issues.apache.org/jira/browse/SPARK-41386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Dong updated SPARK-41386: - Description: *Problem ( REBALANCE(column)* {*}){*}: SparkSession config: {noformat} config("spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled", "true") config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "20m") config("spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor", "0.5"){noformat} so, we except that files size should be bigger than 20m*0.5=10m at least. but in fact , we got some small files like the following: {noformat} -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-0-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-1-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-2-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-3-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet -rw-r--r-- 1 jp28948 staff 9.1 M 2022-12-07 13:13 .../part-4-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet -rw-r--r-- 1 jp28948 staff 3.0 M 2022-12-07 13:13 .../part-5-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet{noformat} 9.1 M and 3.0 M is smaller than 10M. we have to handle these small files in another way. was: *Problem ( REBALANCE(column)* {*}){*}: SparkSession config: {noformat} config("spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled", "true") config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "20m") config("spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor", "0.5"){noformat} so, we excepted files size are bigger than 20m*0.5=10m at least. 
but in fact , we got some small files like the following: {noformat} -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-0-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-1-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-2-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-3-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet -rw-r--r-- 1 jp28948 staff 9.1 M 2022-12-07 13:13 .../part-4-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet -rw-r--r-- 1 jp28948 staff 3.0 M 2022-12-07 13:13 .../part-5-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet{noformat} 9.1 M and 3.0 M is smaller than 10M. we have to handle these small files in another way. > There are some small files when using rebalance(column) > --- > > Key: SPARK-41386 > URL: https://issues.apache.org/jira/browse/SPARK-41386 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Zhe Dong >Priority: Minor > > *Problem ( REBALANCE(column)* {*}){*}: > SparkSession config: > {noformat} > config("spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled", > "true") > config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "20m") > config("spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor", > "0.5"){noformat} > so, we except that files size should be bigger than 20m*0.5=10m at least. 
> but in fact , we got some small files like the following: > {noformat} > -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 > .../part-0-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet > -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 > .../part-1-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet > -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 > .../part-2-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet > -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 > .../part-3-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet > -rw-r--r-- 1 jp28948 staff 9.1 M 2022-12-07 13:13 > .../part-4-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet > -rw-r--r-- 1 jp28948 staff 3.0 M 2022-12-07 13:13 > .../part-5-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet{noformat} > 9.1 M and 3.0 M is smaller than 10M. we have to handle these small files in > another way. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41369) Refactor connect directory structure
[ https://issues.apache.org/jira/browse/SPARK-41369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644135#comment-17644135 ] Apache Spark commented on SPARK-41369: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38957 > Refactor connect directory structure > > > Key: SPARK-41369 > URL: https://issues.apache.org/jira/browse/SPARK-41369 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.3.2, 3.4.0 >Reporter: Venkata Sai Akhil Gudesa >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > Currently, `spark/connector/connect/` is a single module that contains both > the "server"/service as well as the protobuf definitions. > However, this module can be split into multiple modules - "server" and > "common". This brings the advantage of separating out the protobuf generation > from the core "server" module for efficient reuse. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41386) There are some small files when using rebalance(column)
[ https://issues.apache.org/jira/browse/SPARK-41386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644131#comment-17644131 ] Zhe Dong commented on SPARK-41386: -- we may change this part to avoid producing files that are smaller than the threshold implied by "spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor": [https://github.com/apache/spark/blob/d9c7908f348fa7771182dca49fa032f6d1b689be/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewInRebalancePartitions.scala#L75] > There are some small files when using rebalance(column) > --- > > Key: SPARK-41386 > URL: https://issues.apache.org/jira/browse/SPARK-41386 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Zhe Dong >Priority: Minor > > *Problem (REBALANCE(column))*: > SparkSession config: > {noformat} > config("spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled", > "true") > config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "20m") > config("spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor", > "0.5"){noformat} > so, we expected file sizes to be at least 20m*0.5 = 10m. 
> but in fact, we got some small files like the following: > {noformat} > -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 > .../part-0-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet > -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 > .../part-1-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet > -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 > .../part-2-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet > -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 > .../part-3-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet > -rw-r--r-- 1 jp28948 staff 9.1 M 2022-12-07 13:13 > .../part-4-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet > -rw-r--r-- 1 jp28948 staff 3.0 M 2022-12-07 13:13 > .../part-5-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet{noformat} > 9.1 M and 3.0 M are smaller than 10 M; we have to handle these small files in > another way. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
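The rule the comment points at in OptimizeSkewInRebalancePartitions can be modeled in a few lines. This is a simplified sketch of the proposed behavior, not Spark's actual code: partitions above the target size are split, mid-sized ones are kept, and anything below target * smallPartitionFactor is flagged for merging rather than written out as its own tiny file.

```python
def classify_partition(size_bytes, target_bytes, small_partition_factor):
    """Simplified model of the proposed per-partition decision (not Spark code)."""
    if size_bytes > target_bytes:
        return "split"
    if size_bytes < target_bytes * small_partition_factor:
        return "merge-with-neighbor"  # avoid emitting a stand-alone small file
    return "keep"

MB = 1024 * 1024
target = 20 * MB  # advisoryPartitionSizeInBytes = 20m
sizes = [25 * MB, 12 * MB, 3 * MB]
print([classify_partition(s, target, 0.5) for s in sizes])
# → ['split', 'keep', 'merge-with-neighbor']
```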
[jira] [Updated] (SPARK-41386) There are some small files when using rebalance(column)
[ https://issues.apache.org/jira/browse/SPARK-41386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Dong updated SPARK-41386: - Description: *Problem ( REBALANCE(column)* {*}){*}: SparkSession config: {noformat} config("spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled", "true") config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "20m") config("spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor", "0.5"){noformat} so, we excepted files size are bigger than 20m*0.5=10m at least. but in fact , we got some small files like the following: {noformat} -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-0-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-1-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-2-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-3-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet -rw-r--r-- 1 jp28948 staff 9.1 M 2022-12-07 13:13 .../part-4-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet -rw-r--r-- 1 jp28948 staff 3.0 M 2022-12-07 13:13 .../part-5-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet{noformat} 9.1 M and 3.0 M is smaller than 10M. we have to handle these small files in another way. was: *Problem ( REBALANCE(column)* {*}){*}: SparkSession config: {noformat} config("spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled", "true") config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "20m") config("spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor", "0.5"){noformat} so, we excepted files size are bigger than 20m*0.5=10m at least. 
but in fact , we got some small files like the following: {noformat} -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-0-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-1-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-2-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 .../part-3-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet -rw-r--r-- 1 jp28948 staff 9.1 M 2022-12-07 13:13 .../part-4-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet -rw-r--r-- 1 jp28948 staff 3.0 M 2022-12-07 13:13 .../part-5-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet{noformat} 9.1 M and 3.0 M is smaller than 10M. we have to handle these small files in another way. > There are some small files when using rebalance(column) > --- > > Key: SPARK-41386 > URL: https://issues.apache.org/jira/browse/SPARK-41386 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Zhe Dong >Priority: Minor > > *Problem ( REBALANCE(column)* {*}){*}: > SparkSession config: > {noformat} > config("spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled", > "true") config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "20m") > config("spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor", > "0.5"){noformat} > so, we excepted files size are bigger than 20m*0.5=10m at least. 
> But in fact, we got some small files like the following: > {noformat} > -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 > .../part-0-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet > -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 > .../part-1-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet > -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 > .../part-2-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet > -rw-r--r-- 1 jp28948 staff 12.1 M 2022-12-07 13:13 > .../part-3-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet > -rw-r--r-- 1 jp28948 staff 9.1 M 2022-12-07 13:13 > .../part-4-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet > -rw-r--r-- 1 jp28948 staff 3.0 M 2022-12-07 13:13 > .../part-5-1ece1aae-f4f6-47ac-abe2-170ccb61f60e.c000.snappy.parquet{noformat} > 9.1 M and 3.0 M are smaller than 10 M, so we have to handle these small files in > another way. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
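The arithmetic behind the report can be checked outside Spark: with an advisory partition size of 20m and a small-partition factor of 0.5, any output file under 10 MB is unexpected. A minimal Python sketch (the config names are real Spark settings; the size parser and the hard-coded file sizes are illustrative, taken from the listing above):

```python
# Illustrative check of the AQE rebalance threshold described in SPARK-41386:
# advisoryPartitionSizeInBytes * rebalancePartitionsSmallPartitionFactor is
# the minimum size a merged partition is expected to reach.

def parse_mb(size: str) -> float:
    """Parse a size string like '20m' into megabytes (illustrative helper)."""
    assert size.endswith("m"), "only 'm' suffix handled in this sketch"
    return float(size[:-1])

advisory_mb = parse_mb("20m")  # spark.sql.adaptive.advisoryPartitionSizeInBytes
small_factor = 0.5             # spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor
threshold_mb = advisory_mb * small_factor

# File sizes reported in the ticket, in MB:
observed = [12.1, 12.1, 12.1, 12.1, 9.1, 3.0]
too_small = [s for s in observed if s < threshold_mb]

print(threshold_mb)  # 10.0
print(too_small)     # [9.1, 3.0] -- the two files the reporter flags
```

The two trailing files fall below the 10 MB threshold, which is exactly the behavior the ticket questions.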
[jira] [Updated] (SPARK-41386) There are some small files when using rebalance(column)
[ https://issues.apache.org/jira/browse/SPARK-41386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Dong updated SPARK-41386: - Description: *Problem (/*+ REBALANCE(bot_mid) */)*: SparkSession config: {noformat} config("spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled", "true") config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "20m") config("spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor", "0.5"){noformat} was:TODO: > There are some small files when using rebalance(column) > --- > > Key: SPARK-41386 > URL: https://issues.apache.org/jira/browse/SPARK-41386 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Zhe Dong >Priority: Minor > > *Problem (/*+ REBALANCE(bot_mid) */)*: > SparkSession config: > {noformat} > config("spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled", > "true") > config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "20m") > config("spark.sql.adaptive.rebalancePartitionsSmallPartitionFactor", > "0.5"){noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41397) Implement part of string/binary functions
[ https://issues.apache.org/jira/browse/SPARK-41397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-41397: - Assignee: Xinrong Meng > Implement part of string/binary functions > - > > Key: SPARK-41397 > URL: https://issues.apache.org/jira/browse/SPARK-41397 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41397) Implement part of string/binary functions
[ https://issues.apache.org/jira/browse/SPARK-41397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-41397. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38921 [https://github.com/apache/spark/pull/38921] > Implement part of string/binary functions > - > > Key: SPARK-41397 > URL: https://issues.apache.org/jira/browse/SPARK-41397 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41391) The output column name of `groupBy.agg(count_distinct)` is incorrect
[ https://issues.apache.org/jira/browse/SPARK-41391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-41391: -- Description: scala> val df = spark.range(1, 10).withColumn("value", lit(1)) df: org.apache.spark.sql.DataFrame = [id: bigint, value: int] scala> df.createOrReplaceTempView("table") scala> df.groupBy("id").agg(count_distinct($"value")) res1: org.apache.spark.sql.DataFrame = [id: bigint, count(value): bigint] scala> spark.sql(" SELECT id, COUNT(DISTINCT value) FROM table GROUP BY id ") res2: org.apache.spark.sql.DataFrame = [id: bigint, count(DISTINCT value): bigint] scala> df.groupBy("id").agg(count_distinct($"*")) res3: org.apache.spark.sql.DataFrame = [id: bigint, count(unresolvedstar()): bigint] scala> spark.sql(" SELECT id, COUNT(DISTINCT *) FROM table GROUP BY id ") res4: org.apache.spark.sql.DataFrame = [id: bigint, count(DISTINCT id, value): bigint] > The output column name of `groupBy.agg(count_distinct)` is incorrect > > > Key: SPARK-41391 > URL: https://issues.apache.org/jira/browse/SPARK-41391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0, 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > > scala> val df = spark.range(1, 10).withColumn("value", lit(1)) > df: org.apache.spark.sql.DataFrame = [id: bigint, value: int] > scala> df.createOrReplaceTempView("table") > scala> df.groupBy("id").agg(count_distinct($"value")) > res1: org.apache.spark.sql.DataFrame = [id: bigint, count(value): bigint] > scala> spark.sql(" SELECT id, COUNT(DISTINCT value) FROM table GROUP BY id ") > res2: org.apache.spark.sql.DataFrame = [id: bigint, count(DISTINCT value): > bigint] > scala> df.groupBy("id").agg(count_distinct($"*")) > res3: org.apache.spark.sql.DataFrame = [id: bigint, count(unresolvedstar()): > bigint] > scala> spark.sql(" SELECT id, COUNT(DISTINCT *) FROM table GROUP BY id ") > res4: org.apache.spark.sql.DataFrame = [id: bigint, count(DISTINCT id, > value): bigint] -- This 
message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41382) implement `product` function
[ https://issues.apache.org/jira/browse/SPARK-41382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-41382. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38915 [https://github.com/apache/spark/pull/38915] > implement `product` function > > > Key: SPARK-41382 > URL: https://issues.apache.org/jira/browse/SPARK-41382 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41382) implement `product` function
[ https://issues.apache.org/jira/browse/SPARK-41382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-41382: - Assignee: Ruifeng Zheng > implement `product` function > > > Key: SPARK-41382 > URL: https://issues.apache.org/jira/browse/SPARK-41382 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41419) [K8S] Decrement PVC_COUNTER when the pod deletion happens
[ https://issues.apache.org/jira/browse/SPARK-41419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41419: Assignee: Apache Spark > [K8S] Decrement PVC_COUNTER when the pod deletion happens > -- > > Key: SPARK-41419 > URL: https://issues.apache.org/jira/browse/SPARK-41419 > Project: Spark > Issue Type: Task > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Ted Yu >Assignee: Apache Spark >Priority: Major > > Commit cc55de3 introduced PVC_COUNTER to track the outstanding number of PVCs. > PVC_COUNTER should only be decremented when pod deletion happens (in > response to an error). > If the PVC isn't created successfully (so PVC_COUNTER isn't incremented, > possibly because execution never reaches the resource(pvc).create() call), we > shouldn't decrement the counter. > The variable `success` tracks the progress of PVC creation: > value 0 means the PVC has not been created. > value 1 means the PVC has been created. > value 2 means the PVC has been created but, due to a subsequent error, the pod is > deleted. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41419) [K8S] Decrement PVC_COUNTER when the pod deletion happens
[ https://issues.apache.org/jira/browse/SPARK-41419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41419: Assignee: (was: Apache Spark) > [K8S] Decrement PVC_COUNTER when the pod deletion happens > -- > > Key: SPARK-41419 > URL: https://issues.apache.org/jira/browse/SPARK-41419 > Project: Spark > Issue Type: Task > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Ted Yu >Priority: Major > > Commit cc55de3 introduced PVC_COUNTER to track the outstanding number of PVCs. > PVC_COUNTER should only be decremented when pod deletion happens (in > response to an error). > If the PVC isn't created successfully (so PVC_COUNTER isn't incremented, > possibly because execution never reaches the resource(pvc).create() call), we > shouldn't decrement the counter. > The variable `success` tracks the progress of PVC creation: > value 0 means the PVC has not been created. > value 1 means the PVC has been created. > value 2 means the PVC has been created but, due to a subsequent error, the pod is > deleted. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41419) [K8S] Decrement PVC_COUNTER when the pod deletion happens
[ https://issues.apache.org/jira/browse/SPARK-41419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644104#comment-17644104 ] Apache Spark commented on SPARK-41419: -- User 'tedyu' has created a pull request for this issue: https://github.com/apache/spark/pull/38948 > [K8S] Decrement PVC_COUNTER when the pod deletion happens > -- > > Key: SPARK-41419 > URL: https://issues.apache.org/jira/browse/SPARK-41419 > Project: Spark > Issue Type: Task > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Ted Yu >Priority: Major > > Commit cc55de3 introduced PVC_COUNTER to track the outstanding number of PVCs. > PVC_COUNTER should only be decremented when pod deletion happens (in > response to an error). > If the PVC isn't created successfully (so PVC_COUNTER isn't incremented, > possibly because execution never reaches the resource(pvc).create() call), we > shouldn't decrement the counter. > The variable `success` tracks the progress of PVC creation: > value 0 means the PVC has not been created. > value 1 means the PVC has been created. > value 2 means the PVC has been created but, due to a subsequent error, the pod is > deleted. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41419) [K8S] Decrement PVC_COUNTER when the pod deletion happens
Ted Yu created SPARK-41419: -- Summary: [K8S] Decrement PVC_COUNTER when the pod deletion happens Key: SPARK-41419 URL: https://issues.apache.org/jira/browse/SPARK-41419 Project: Spark Issue Type: Task Components: Kubernetes Affects Versions: 3.4.0 Reporter: Ted Yu Commit cc55de3 introduced PVC_COUNTER to track the outstanding number of PVCs. PVC_COUNTER should only be decremented when pod deletion happens (in response to an error). If the PVC isn't created successfully (so PVC_COUNTER isn't incremented, possibly because execution never reaches the resource(pvc).create() call), we shouldn't decrement the counter. The variable `success` tracks the progress of PVC creation: value 0 means the PVC has not been created. value 1 means the PVC has been created. value 2 means the PVC has been created but, due to a subsequent error, the pod is deleted. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
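The counter discipline described in the ticket (decrement only when a pod whose PVC was actually created gets deleted) can be modeled outside Spark. A hypothetical Python sketch; the class, constants, and method names are illustrative, not the actual Spark Kubernetes code:

```python
# Hypothetical model of the PVC bookkeeping described in SPARK-41419.
# PVC_COUNTER counts outstanding PVCs; `success` records how far PVC
# creation progressed: 0 = not created, 1 = created, 2 = created but the
# pod was deleted after a subsequent error.

PVC_NOT_CREATED, PVC_CREATED, POD_DELETED = 0, 1, 2

class PvcTracker:
    def __init__(self) -> None:
        self.pvc_counter = 0

    def create_pvc(self) -> int:
        # Increment only when creation actually happens, i.e. when
        # execution reaches the (hypothetical) resource(pvc).create() call.
        self.pvc_counter += 1
        return PVC_CREATED

    def on_pod_deleted(self, success: int) -> None:
        # Decrement only if a PVC was actually created; if creation never
        # happened there is nothing to undo.
        if success >= PVC_CREATED:
            self.pvc_counter -= 1

tracker = PvcTracker()

# Pod whose PVC creation never happened: the counter must stay untouched.
tracker.on_pod_deleted(PVC_NOT_CREATED)

# Pod whose PVC was created, then deleted in response to an error.
tracker.create_pvc()
tracker.on_pod_deleted(POD_DELETED)

print(tracker.pvc_counter)  # 0 -- no outstanding PVCs leak either way
```

The first call shows the bug the ticket guards against: decrementing for a pod that never got a PVC would drive the counter negative.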
[jira] [Assigned] (SPARK-41418) Upgrade scala-maven-plugin from 4.7.2 to 4.8.0
[ https://issues.apache.org/jira/browse/SPARK-41418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41418: Assignee: Apache Spark > Upgrade scala-maven-plugin from 4.7.2 to 4.8.0 > -- > > Key: SPARK-41418 > URL: https://issues.apache.org/jira/browse/SPARK-41418 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41418) Upgrade scala-maven-plugin from 4.7.2 to 4.8.0
[ https://issues.apache.org/jira/browse/SPARK-41418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41418: Assignee: (was: Apache Spark) > Upgrade scala-maven-plugin from 4.7.2 to 4.8.0 > -- > > Key: SPARK-41418 > URL: https://issues.apache.org/jira/browse/SPARK-41418 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41418) Upgrade scala-maven-plugin from 4.7.2 to 4.8.0
[ https://issues.apache.org/jira/browse/SPARK-41418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644102#comment-17644102 ] Apache Spark commented on SPARK-41418: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/38955 > Upgrade scala-maven-plugin from 4.7.2 to 4.8.0 > -- > > Key: SPARK-41418 > URL: https://issues.apache.org/jira/browse/SPARK-41418 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41418) Upgrade scala-maven-plugin from 4.7.2 to 4.8.0
BingKun Pan created SPARK-41418: --- Summary: Upgrade scala-maven-plugin from 4.7.2 to 4.8.0 Key: SPARK-41418 URL: https://issues.apache.org/jira/browse/SPARK-41418 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.4.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41417) Assign a name to the error class _LEGACY_ERROR_TEMP_0019
[ https://issues.apache.org/jira/browse/SPARK-41417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644101#comment-17644101 ] Apache Spark commented on SPARK-41417: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38954 > Assign a name to the error class _LEGACY_ERROR_TEMP_0019 > > > Key: SPARK-41417 > URL: https://issues.apache.org/jira/browse/SPARK-41417 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41417) Assign a name to the error class _LEGACY_ERROR_TEMP_0019
[ https://issues.apache.org/jira/browse/SPARK-41417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41417: Assignee: Apache Spark > Assign a name to the error class _LEGACY_ERROR_TEMP_0019 > > > Key: SPARK-41417 > URL: https://issues.apache.org/jira/browse/SPARK-41417 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41417) Assign a name to the error class _LEGACY_ERROR_TEMP_0019
[ https://issues.apache.org/jira/browse/SPARK-41417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41417: Assignee: (was: Apache Spark) > Assign a name to the error class _LEGACY_ERROR_TEMP_0019 > > > Key: SPARK-41417 > URL: https://issues.apache.org/jira/browse/SPARK-41417 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41369) Refactor connect directory structure
[ https://issues.apache.org/jira/browse/SPARK-41369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41369. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38953 [https://github.com/apache/spark/pull/38953] > Refactor connect directory structure > > > Key: SPARK-41369 > URL: https://issues.apache.org/jira/browse/SPARK-41369 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.3.2, 3.4.0 >Reporter: Venkata Sai Akhil Gudesa >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > Currently, `spark/connector/connect/` is a single module that contains both > the "server"/service as well as the protobuf definitions. > However, this module can be split into multiple modules - "server" and > "common". This brings the advantage of separating out the protobuf generation > from the core "server" module for efficient reuse. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41369) Refactor connect directory structure
[ https://issues.apache.org/jira/browse/SPARK-41369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41369: Assignee: Hyukjin Kwon > Refactor connect directory structure > > > Key: SPARK-41369 > URL: https://issues.apache.org/jira/browse/SPARK-41369 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.3.2, 3.4.0 >Reporter: Venkata Sai Akhil Gudesa >Assignee: Hyukjin Kwon >Priority: Major > > Currently, `spark/connector/connect/` is a single module that contains both > the "server"/service as well as the protobuf definitions. > However, this module can be split into multiple modules - "server" and > "common". This brings the advantage of separating out the protobuf generation > from the core "server" module for efficient reuse. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41417) Assign a name to the error class _LEGACY_ERROR_TEMP_0019
Yang Jie created SPARK-41417: Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_0019 Key: SPARK-41417 URL: https://issues.apache.org/jira/browse/SPARK-41417 Project: Spark Issue Type: Sub-task Components: Spark Core, SQL Affects Versions: 3.4.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41369) Refactor connect directory structure
[ https://issues.apache.org/jira/browse/SPARK-41369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644094#comment-17644094 ] Apache Spark commented on SPARK-41369: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/38953 > Refactor connect directory structure > > > Key: SPARK-41369 > URL: https://issues.apache.org/jira/browse/SPARK-41369 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.3.2, 3.4.0 >Reporter: Venkata Sai Akhil Gudesa >Priority: Major > > Currently, `spark/connector/connect/` is a single module that contains both > the "server"/service as well as the protobuf definitions. > However, this module can be split into multiple modules - "server" and > "common". This brings the advantage of separating out the protobuf generation > from the core "server" module for efficient reuse. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39865) Show proper error messages on the overflow errors of table insert
[ https://issues.apache.org/jira/browse/SPARK-39865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644092#comment-17644092 ] Apache Spark commented on SPARK-39865: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/38952 > Show proper error messages on the overflow errors of table insert > - > > Key: SPARK-39865 > URL: https://issues.apache.org/jira/browse/SPARK-39865 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0, 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.3.1 > > > In Spark 3.3, the error message of ANSI CAST is improved. However, the table > insertion is using the same CAST expression: > {code:java} > > create table tiny(i tinyint); > > insert into tiny values (1000); > org.apache.spark.SparkArithmeticException[CAST_OVERFLOW]: The value 1000 of > the type "INT" cannot be cast to "TINYINT" due to an overflow. Use `try_cast` > to tolerate overflow and return NULL instead. If necessary set > "spark.sql.ansi.enabled" to "false" to bypass this error. > {code} > > Showing the hint of `If necessary set "spark.sql.ansi.enabled" to "false" to > bypass this error` doesn't help at all. This PR is to fix the error message. > After changes, the error message of this example will become: > {code:java} > org.apache.spark.SparkArithmeticException: [CAST_OVERFLOW_IN_TABLE_INSERT] > Fail to insert a value of "INT" type into the "TINYINT" type column `i` due > to an overflow. Use `try_cast` on the input value to tolerate overflow and > return NULL instead.{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
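The improved behavior amounts to a range check at insert time that names the column and the target type instead of suggesting an irrelevant config. A rough Python sketch of that check, assuming only the TINYINT range [-128, 127]; the function and its error text are illustrative, not Spark's implementation:

```python
# Illustrative sketch of the overflow check behind the improved
# CAST_OVERFLOW_IN_TABLE_INSERT message: inserting an INT value into a
# TINYINT column must fail when the value is outside [-128, 127].

TINYINT_MIN, TINYINT_MAX = -128, 127

def insert_into_tinyint(column: str, value: int) -> int:
    """Hypothetical helper: validate an INT value bound for a TINYINT column."""
    if not TINYINT_MIN <= value <= TINYINT_MAX:
        raise OverflowError(
            f'[CAST_OVERFLOW_IN_TABLE_INSERT] Fail to insert a value of '
            f'"INT" type into the "TINYINT" type column `{column}` due to '
            f'an overflow. Use `try_cast` on the input value to tolerate '
            f'overflow and return NULL instead.'
        )
    return value

print(insert_into_tinyint("i", 100))  # 100 -- fits, insert proceeds
try:
    insert_into_tinyint("i", 1000)    # the example from the ticket
except OverflowError as e:
    print(e)
```

The point of the change is that the error names the column (`i`) and suggests `try_cast`, rather than pointing users at `spark.sql.ansi.enabled`.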
[jira] [Assigned] (SPARK-41416) Rewrite self join in in predicate to aggregate
[ https://issues.apache.org/jira/browse/SPARK-41416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41416: Assignee: (was: Apache Spark) > Rewrite self join in in predicate to aggregate > -- > > Key: SPARK-41416 > URL: https://issues.apache.org/jira/browse/SPARK-41416 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wan Kun >Priority: Major > > Transform a self join that produces duplicate rows and feeds an IN predicate > into an aggregation. > For an IN predicate, duplicate rows add no value; they are pure overhead. > Example: in TPCDS Q95, the following CTE is used only in IN predicates > comparing a single column ({{ws_order_number}}). > This results in an exponential increase in joined rows, with many duplicates. > {code:java} > WITH ws_wh AS > ( > SELECT ws1.ws_order_number, > ws1.ws_warehouse_sk wh1, > ws2.ws_warehouse_sk wh2 > FROM web_sales ws1, > web_sales ws2 > WHERE ws1.ws_order_number = ws2.ws_order_number > AND ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk) > {code} > Could be optimized as below: > {code:java} > WITH ws_wh AS > (SELECT ws_order_number > FROM web_sales > GROUP BY ws_order_number > HAVING COUNT(DISTINCT ws_warehouse_sk) > 1) > {code} > The optimized CTE scans the table only once and produces unique rows. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41416) Rewrite self join in in predicate to aggregate
[ https://issues.apache.org/jira/browse/SPARK-41416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41416: Assignee: Apache Spark > Rewrite self join in in predicate to aggregate > -- > > Key: SPARK-41416 > URL: https://issues.apache.org/jira/browse/SPARK-41416 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wan Kun >Assignee: Apache Spark >Priority: Major > > Transform a self join that produces duplicate rows and feeds an IN predicate > into an aggregation. > For an IN predicate, duplicate rows add no value; they are pure overhead. > Example: in TPCDS Q95, the following CTE is used only in IN predicates > comparing a single column ({{ws_order_number}}). > This results in an exponential increase in joined rows, with many duplicates. > {code:java} > WITH ws_wh AS > ( > SELECT ws1.ws_order_number, > ws1.ws_warehouse_sk wh1, > ws2.ws_warehouse_sk wh2 > FROM web_sales ws1, > web_sales ws2 > WHERE ws1.ws_order_number = ws2.ws_order_number > AND ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk) > {code} > Could be optimized as below: > {code:java} > WITH ws_wh AS > (SELECT ws_order_number > FROM web_sales > GROUP BY ws_order_number > HAVING COUNT(DISTINCT ws_warehouse_sk) > 1) > {code} > The optimized CTE scans the table only once and produces unique rows. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41416) Rewrite self join in in predicate to aggregate
[ https://issues.apache.org/jira/browse/SPARK-41416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644091#comment-17644091 ] Apache Spark commented on SPARK-41416: -- User 'wankunde' has created a pull request for this issue: https://github.com/apache/spark/pull/38951 > Rewrite self join in in predicate to aggregate > -- > > Key: SPARK-41416 > URL: https://issues.apache.org/jira/browse/SPARK-41416 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wan Kun >Priority: Major > > Transform a self join that produces duplicate rows and feeds an IN predicate > into an aggregation. > For an IN predicate, duplicate rows add no value; they are pure overhead. > Example: in TPCDS Q95, the following CTE is used only in IN predicates > comparing a single column ({{ws_order_number}}). > This results in an exponential increase in joined rows, with many duplicates. > {code:java} > WITH ws_wh AS > ( > SELECT ws1.ws_order_number, > ws1.ws_warehouse_sk wh1, > ws2.ws_warehouse_sk wh2 > FROM web_sales ws1, > web_sales ws2 > WHERE ws1.ws_order_number = ws2.ws_order_number > AND ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk) > {code} > Could be optimized as below: > {code:java} > WITH ws_wh AS > (SELECT ws_order_number > FROM web_sales > GROUP BY ws_order_number > HAVING COUNT(DISTINCT ws_warehouse_sk) > 1) > {code} > The optimized CTE scans the table only once and produces unique rows. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41416) Rewrite self join in in predicate to aggregate
Wan Kun created SPARK-41416: --- Summary: Rewrite self join in in predicate to aggregate Key: SPARK-41416 URL: https://issues.apache.org/jira/browse/SPARK-41416 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Wan Kun Transform a self join that produces duplicate rows and feeds an IN predicate into an aggregation. For an IN predicate, duplicate rows add no value; they are pure overhead. Example: in TPCDS Q95, the following CTE is used only in IN predicates comparing a single column ({{ws_order_number}}). This results in an exponential increase in joined rows, with many duplicates. {code:java} WITH ws_wh AS ( SELECT ws1.ws_order_number, ws1.ws_warehouse_sk wh1, ws2.ws_warehouse_sk wh2 FROM web_sales ws1, web_sales ws2 WHERE ws1.ws_order_number = ws2.ws_order_number AND ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk) {code} Could be optimized as below: {code:java} WITH ws_wh AS (SELECT ws_order_number FROM web_sales GROUP BY ws_order_number HAVING COUNT(DISTINCT ws_warehouse_sk) > 1) {code} The optimized CTE scans the table only once and produces unique rows. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
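The two SQL forms above select the same set of order numbers: orders that appear with more than one distinct warehouse. A toy plain-Python demonstration of that equivalence, using a hypothetical five-row web_sales table:

```python
# Toy check that the self-join form and the GROUP BY / HAVING
# COUNT(DISTINCT ...) > 1 form from SPARK-41416 identify the same
# ws_order_number values. The data is made up for illustration.

from collections import defaultdict
from itertools import product

# (ws_order_number, ws_warehouse_sk) rows of a tiny web_sales table.
web_sales = [(1, 10), (1, 11), (2, 10), (2, 10), (3, 12)]

# Self-join form: orders paired with themselves under a different warehouse.
self_join = {
    o1
    for (o1, w1), (o2, w2) in product(web_sales, repeat=2)
    if o1 == o2 and w1 != w2
}

# Aggregate form: orders with more than one distinct warehouse.
warehouses = defaultdict(set)
for order, wh in web_sales:
    warehouses[order].add(wh)
aggregate = {o for o, ws in warehouses.items() if len(ws) > 1}

print(sorted(self_join))        # [1] -- only order 1 spans two warehouses
print(self_join == aggregate)   # True
```

Order 2 appears twice but only in warehouse 10, so neither form selects it; the aggregate form reaches the answer with a single scan and no duplicate rows.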
[jira] [Resolved] (SPARK-41366) DF.groupby.agg() API should be compatible
[ https://issues.apache.org/jira/browse/SPARK-41366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-41366. --- Fix Version/s: 3.4.0 Resolution: Fixed > DF.groupby.agg() API should be compatible > - > > Key: SPARK-41366 > URL: https://issues.apache.org/jira/browse/SPARK-41366 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41413) Storage-Partitioned Join should avoid shuffle when partition keys mismatch, but join expressions are compatible
[ https://issues.apache.org/jira/browse/SPARK-41413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41413: Assignee: (was: Apache Spark) > Storage-Partitioned Join should avoid shuffle when partition keys mismatch, > but join expressions are compatible > --- > > Key: SPARK-41413 > URL: https://issues.apache.org/jira/browse/SPARK-41413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.1 >Reporter: Chao Sun >Priority: Major > > Currently, when checking whether two sides of a Storage-Partitioned Join are > compatible, we require that both the partition expressions and the > partition keys are compatible. However, this condition could be relaxed so > that we only require the former. When the partition keys are not > compatible, we can calculate a common superset of keys, push that > information down to both sides of the join, and use empty partitions for the > missing keys. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41413) Storage-Partitioned Join should avoid shuffle when partition keys mismatch, but join expressions are compatible
[ https://issues.apache.org/jira/browse/SPARK-41413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644088#comment-17644088 ] Apache Spark commented on SPARK-41413: -- User 'sunchao' has created a pull request for this issue: https://github.com/apache/spark/pull/38950 > Storage-Partitioned Join should avoid shuffle when partition keys mismatch, > but join expressions are compatible > --- > > Key: SPARK-41413 > URL: https://issues.apache.org/jira/browse/SPARK-41413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.1 >Reporter: Chao Sun >Priority: Major > > Currently, when checking whether two sides of a Storage-Partitioned Join are > compatible, we require that both the partition expressions and the > partition keys be compatible. However, this condition could be relaxed so > that we only require the former. In the case that the latter is not > compatible, we can calculate a common superset of keys and push down the > information to both sides of the join, and use empty partitions for the > missing keys. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41413) Storage-Partitioned Join should avoid shuffle when partition keys mismatch, but join expressions are compatible
[ https://issues.apache.org/jira/browse/SPARK-41413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41413: Assignee: Apache Spark > Storage-Partitioned Join should avoid shuffle when partition keys mismatch, > but join expressions are compatible > --- > > Key: SPARK-41413 > URL: https://issues.apache.org/jira/browse/SPARK-41413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.1 >Reporter: Chao Sun >Assignee: Apache Spark >Priority: Major > > Currently, when checking whether two sides of a Storage-Partitioned Join are > compatible, we require that both the partition expressions and the > partition keys be compatible. However, this condition could be relaxed so > that we only require the former. In the case that the latter is not > compatible, we can calculate a common superset of keys and push down the > information to both sides of the join, and use empty partitions for the > missing keys. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
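The relaxation described in the ticket above (compute a common superset of the partition keys, push it to both sides of the join, and fill in empty partitions for keys a side lacks) can be sketched outside Spark. This is a minimal, hypothetical model in Python, not the actual Catalyst implementation; the helper name and dict-based partition representation are illustrative only.

```python
def pad_partitions(left, right):
    """Pad both sides of a partitioned join so they share the same key set.

    `left` and `right` map a partition key to that partition's rows.
    Keys missing from one side get an empty partition, so the two sides
    become key-compatible without reshuffling existing partitions.
    """
    common_keys = set(left) | set(right)  # common superset of keys
    padded_left = {k: left.get(k, []) for k in common_keys}
    padded_right = {k: right.get(k, []) for k in common_keys}
    return padded_left, padded_right

# Partition keys mismatch: left has "a", right has "c".
left = {"a": [1, 2], "b": [3]}
right = {"b": [30], "c": [40]}
padded_left, padded_right = pad_partitions(left, right)
# Both sides now cover keys a, b, c; the added partitions are empty,
# so join results are unchanged but the sides are key-aligned.
```

The point of the sketch is only the compatibility condition: after padding, both sides partition over the identical key set, which is what lets the planner skip the shuffle.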
[jira] [Commented] (SPARK-41344) Reading V2 datasource masks underlying error
[ https://issues.apache.org/jira/browse/SPARK-41344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644087#comment-17644087 ] Zhen Wang commented on SPARK-41344: --- [~planga82] Thanks for your reply, I have submitted a PR [https://github.com/apache/spark/pull/38871], can you help me review it? > Maybe the best solution is to have another function that does not catch those > exceptions to use in this case and does not return an option. Does this mean we need to add a new method in CatalogV2Util? > Reading V2 datasource masks underlying error > > > Key: SPARK-41344 > URL: https://issues.apache.org/jira/browse/SPARK-41344 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.3.1, 3.4.0 >Reporter: Kevin Cheung >Priority: Critical > Attachments: image-2022-12-03-09-24-43-285.png > > > In Spark 3.3, > # DataSourceV2Utils, the loadV2Source calls: > (CatalogV2Util.loadTable(catalog, ident, timeTravel).get, > Some(catalog), Some(ident)). > # CatalogV2Util.scala, when it tries to *loadTable(x,x,x)* and it fails with > any of these exceptions NoSuchTableException, NoSuchDatabaseException, > NoSuchNamespaceException, it would return None > # Coming back to DataSourceV2Utils, None was previously returned, and calling > None.get results in a cryptic error that is technically "correct", but the *original > exceptions NoSuchTableException, NoSuchDatabaseException, > NoSuchNamespaceException are thrown away.* > > *Ask:* > Retain the original error and propagate it to the user. Prior to Spark 3.3, > the *original error* was shown; this seems like a design flaw.
> > *Sample user facing error:* > None.get > java.util.NoSuchElementException: None.get > at scala.None$.get(Option.scala:529) > at scala.None$.get(Option.scala:527) > at > org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.loadV2Source(DataSourceV2Utils.scala:129) > at > org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:209) > at scala.Option.flatMap(Option.scala:271) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:207) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171) > > *DataSourceV2Utils.scala - CatalogV2Util.loadTable(x,x,x).get* > [https://github.com/apache/spark/blob/7fd654c0142ab9e4002882da4e65d3b25bebd26c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala#L137] > *CatalogV2Util.scala - Option(catalog.asTableCatalog.loadTable(ident))* > [https://github.com/apache/spark/blob/7fd654c0142ab9e4002882da4e65d3b25bebd26c/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala#L341] > *CatalogV2Util.scala - catching the exceptions and returning None* > [https://github.com/apache/spark/blob/7fd654c0142ab9e4002882da4e65d3b25bebd26c/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala#L344] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
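The failure mode reported above (a lookup helper that catches the descriptive exceptions and returns an Option/None, plus a caller that unconditionally unwraps it) is easy to reproduce in miniature. The following Python sketch is illustrative only: the function names mimic the Scala ones from the ticket, but nothing here is Spark's actual code, and the "good" variant is just one way to realize the fix the reporter asks for (let the original exception propagate).

```python
class NoSuchTableError(Exception):
    """Stand-in for Spark's NoSuchTableException (hypothetical name)."""


def load_table(name, tables):
    """Mimics CatalogV2Util.loadTable: swallow the lookup error, return None."""
    try:
        return tables[name]
    except KeyError:
        # The informative error is discarded at this point...
        return None


def load_v2_source_masked(name, tables):
    # ...so unwrapping unconditionally fails later with an unrelated,
    # cryptic error -- analogous to Option.get raising
    # java.util.NoSuchElementException: None.get
    table = load_table(name, tables)
    return table.upper()  # on a miss: AttributeError on NoneType


def load_v2_source_preserving(name, tables):
    # The fix the ticket asks for: skip the None-returning path so the
    # original, descriptive exception reaches the user.
    if name not in tables:
        raise NoSuchTableError(f"Table or view not found: {name}")
    return tables[name].upper()
```

With `load_v2_source_masked`, a missing table surfaces as a generic `AttributeError` (the Python analogue of `None.get`); with `load_v2_source_preserving`, the user sees which table was not found.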
[jira] [Commented] (SPARK-41410) Support PVC-oriented executor pod allocation
[ https://issues.apache.org/jira/browse/SPARK-41410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644083#comment-17644083 ] Apache Spark commented on SPARK-41410: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/38949 > Support PVC-oriented executor pod allocation > > > Key: SPARK-41410 > URL: https://issues.apache.org/jira/browse/SPARK-41410 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41410) Support PVC-oriented executor pod allocation
[ https://issues.apache.org/jira/browse/SPARK-41410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644082#comment-17644082 ] Apache Spark commented on SPARK-41410: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/38949 > Support PVC-oriented executor pod allocation > > > Key: SPARK-41410 > URL: https://issues.apache.org/jira/browse/SPARK-41410 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41415) SASL Request Retries
[ https://issues.apache.org/jira/browse/SPARK-41415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aravind Patnam updated SPARK-41415: --- Summary: SASL Request Retries (was: SASL Request Retry) > SASL Request Retries > > > Key: SPARK-41415 > URL: https://issues.apache.org/jira/browse/SPARK-41415 > Project: Spark > Issue Type: Task > Components: Shuffle >Affects Versions: 3.2.4 >Reporter: Aravind Patnam >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41415) SASL Request Retry
Aravind Patnam created SPARK-41415: -- Summary: SASL Request Retry Key: SPARK-41415 URL: https://issues.apache.org/jira/browse/SPARK-41415 Project: Spark Issue Type: Task Components: Shuffle Affects Versions: 3.2.4 Reporter: Aravind Patnam -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41410) Support PVC-oriented executor pod allocation
[ https://issues.apache.org/jira/browse/SPARK-41410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644079#comment-17644079 ] Apache Spark commented on SPARK-41410: -- User 'tedyu' has created a pull request for this issue: https://github.com/apache/spark/pull/38948 > Support PVC-oriented executor pod allocation > > > Key: SPARK-41410 > URL: https://issues.apache.org/jira/browse/SPARK-41410 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41410) Support PVC-oriented executor pod allocation
[ https://issues.apache.org/jira/browse/SPARK-41410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644080#comment-17644080 ] Apache Spark commented on SPARK-41410: -- User 'tedyu' has created a pull request for this issue: https://github.com/apache/spark/pull/38948 > Support PVC-oriented executor pod allocation > > > Key: SPARK-41410 > URL: https://issues.apache.org/jira/browse/SPARK-41410 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41233) High-order function: array_prepend
[ https://issues.apache.org/jira/browse/SPARK-41233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644070#comment-17644070 ] Navin Viswanath commented on SPARK-41233: - PR : [https://github.com/apache/spark/pull/38947] > High-order function: array_prepend > -- > > Key: SPARK-41233 > URL: https://issues.apache.org/jira/browse/SPARK-41233 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > > refer to > https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/api/snowflake.snowpark.functions.array_prepend.html -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
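For readers unfamiliar with the Snowflake function the ticket references: `array_prepend(array, elem)` returns a new array with the element inserted at position 0. A reference model of those semantics in Python follows; it is a sketch of the expected behavior, not the Catalyst implementation in the linked PR, and the NULL-propagation choice is an assumption based on how Spark's other array functions treat NULL arrays.

```python
def array_prepend(arr, elem):
    """Reference semantics for array_prepend: new array with elem first.

    Mirrors the Snowflake function the ticket links to. A NULL (None)
    array propagates as NULL (assumed, matching other array functions);
    the input array is not mutated.
    """
    if arr is None:
        return None
    return [elem] + list(arr)

array_prepend([2, 3, 4], 1)   # -> [1, 2, 3, 4]
array_prepend([], "x")        # -> ["x"]
array_prepend(None, 1)        # -> None (NULL input propagates)
```

Note the contrast with `array_append`, which already exists in Spark and inserts at the end; `array_prepend` is its mirror image.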
[jira] [Assigned] (SPARK-41231) Built-in SQL Function Improvement
[ https://issues.apache.org/jira/browse/SPARK-41231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41231: Assignee: Apache Spark > Built-in SQL Function Improvement > - > > Key: SPARK-41231 > URL: https://issues.apache.org/jira/browse/SPARK-41231 > Project: Spark > Issue Type: New Feature > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41231) Built-in SQL Function Improvement
[ https://issues.apache.org/jira/browse/SPARK-41231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644066#comment-17644066 ] Apache Spark commented on SPARK-41231: -- User 'navinvishy' has created a pull request for this issue: https://github.com/apache/spark/pull/38947 > Built-in SQL Function Improvement > - > > Key: SPARK-41231 > URL: https://issues.apache.org/jira/browse/SPARK-41231 > Project: Spark > Issue Type: New Feature > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41231) Built-in SQL Function Improvement
[ https://issues.apache.org/jira/browse/SPARK-41231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41231: Assignee: (was: Apache Spark) > Built-in SQL Function Improvement > - > > Key: SPARK-41231 > URL: https://issues.apache.org/jira/browse/SPARK-41231 > Project: Spark > Issue Type: New Feature > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41411) Multi-Stateful Operator watermark support bug fix
[ https://issues.apache.org/jira/browse/SPARK-41411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-41411. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38945 [https://github.com/apache/spark/pull/38945] > Multi-Stateful Operator watermark support bug fix > - > > Key: SPARK-41411 > URL: https://issues.apache.org/jira/browse/SPARK-41411 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Fix For: 3.4.0 > > > A typo in passing the event time watermark to `StreamingSymmetricHashJoinExec` > causes logic errors. With the bug, the query would run with no error > reported but produce incorrect results. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41411) Multi-Stateful Operator watermark support bug fix
[ https://issues.apache.org/jira/browse/SPARK-41411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-41411: Assignee: Wei Liu > Multi-Stateful Operator watermark support bug fix > - > > Key: SPARK-41411 > URL: https://issues.apache.org/jira/browse/SPARK-41411 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > > A typo in passing the event time watermark to `StreamingSymmetricHashJoinExec` > causes logic errors. With the bug, the query would run with no error > reported but produce incorrect results. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41344) Reading V2 datasource masks underlying error
[ https://issues.apache.org/jira/browse/SPARK-41344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644061#comment-17644061 ] Pablo Langa Blanco commented on SPARK-41344: In this case the provider has been detected as DataSourceV2 and also implements SupportsCatalogOptions, so if it fails at that point, it does not make sense to try it as DataSource V1. The CatalogV2Util.loadTable function catches NoSuchTableException, NoSuchDatabaseException and NoSuchNamespaceException to return an option, which makes sense in other places where it is used, but not at this point. Maybe the best solution is to have another function that does not catch those exceptions to use in this case and does not return an option. > Reading V2 datasource masks underlying error > > > Key: SPARK-41344 > URL: https://issues.apache.org/jira/browse/SPARK-41344 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.3.1, 3.4.0 >Reporter: Kevin Cheung >Priority: Critical > Attachments: image-2022-12-03-09-24-43-285.png > > > In Spark 3.3, > # DataSourceV2Utils, the loadV2Source calls: > (CatalogV2Util.loadTable(catalog, ident, timeTravel).get, > Some(catalog), Some(ident)). > # CatalogV2Util.scala, when it tries to *loadTable(x,x,x)* and it fails with > any of these exceptions NoSuchTableException, NoSuchDatabaseException, > NoSuchNamespaceException, it would return None > # Coming back to DataSourceV2Utils, None was previously returned, and calling > None.get results in a cryptic error that is technically "correct", but the *original > exceptions NoSuchTableException, NoSuchDatabaseException, > NoSuchNamespaceException are thrown away.* > > *Ask:* > Retain the original error and propagate it to the user. Prior to Spark 3.3, > the *original error* was shown; this seems like a design flaw.
> > *Sample user facing error:* > None.get > java.util.NoSuchElementException: None.get > at scala.None$.get(Option.scala:529) > at scala.None$.get(Option.scala:527) > at > org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.loadV2Source(DataSourceV2Utils.scala:129) > at > org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:209) > at scala.Option.flatMap(Option.scala:271) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:207) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171) > > *DataSourceV2Utils.scala - CatalogV2Util.loadTable(x,x,x).get* > [https://github.com/apache/spark/blob/7fd654c0142ab9e4002882da4e65d3b25bebd26c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala#L137] > *CatalogV2Util.scala - Option(catalog.asTableCatalog.loadTable(ident))* > [https://github.com/apache/spark/blob/7fd654c0142ab9e4002882da4e65d3b25bebd26c/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala#L341] > *CatalogV2Util.scala - catching the exceptions and returning None* > [https://github.com/apache/spark/blob/7fd654c0142ab9e4002882da4e65d3b25bebd26c/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala#L344] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41414) Implement date/timestamp functions
[ https://issues.apache.org/jira/browse/SPARK-41414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644049#comment-17644049 ] Apache Spark commented on SPARK-41414: -- User 'xinrong-meng' has created a pull request for this issue: https://github.com/apache/spark/pull/38946 > Implement date/timestamp functions > -- > > Key: SPARK-41414 > URL: https://issues.apache.org/jira/browse/SPARK-41414 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Implement date/timestamp functions -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41414) Implement date/timestamp functions
[ https://issues.apache.org/jira/browse/SPARK-41414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41414: Assignee: (was: Apache Spark) > Implement date/timestamp functions > -- > > Key: SPARK-41414 > URL: https://issues.apache.org/jira/browse/SPARK-41414 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Implement date/timestamp functions -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org