[jira] [Comment Edited] (SPARK-41266) Spark does not parse timestamp strings when using the IN operator
[ https://issues.apache.org/jira/browse/SPARK-41266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643706#comment-17643706 ] huldar chen edited comment on SPARK-41266 at 12/6/22 8:11 AM:
--
You can try enabling ANSI compliance:
{code:java}
spark.sql.ansi.enabled=true
{code}
Under the default Hive-style type coercion, Spark promotes both operands all the way to StringType. Under ANSI compliance, it instead promotes the StringType operand to the other data type.

was (Author: huldar): You can try to use ANSI compliance:
{code:java}
spark.sql.ansi.enabled=true
{code}

> Spark does not parse timestamp strings when using the IN operator
> -----------------------------------------------------------------
>
> Key: SPARK-41266
> URL: https://issues.apache.org/jira/browse/SPARK-41266
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.1
> Environment: Windows 10, Spark 3.2.1 with Java 11
> Reporter: Laurens Versluis
> Priority: Major
>
> Likely affects more versions; tested only with 3.2.1.
>
> Summary:
> Spark will convert a timestamp string to a timestamp when using the equal operator (=), yet won't do this when using the IN operator.
>
> Details:
> While debugging why we got no results on a query, we found out that when using the equal symbol `=` in the WHERE clause combined with a TimestampType column, Spark will convert the string to a timestamp and filter. However, when using the IN operator (our query), it will not do so, and instead performs a cast to string. We expected the behavior to be similar, or at least that Spark would realize the IN clause operates on a TimestampType column and thus attempt to convert to timestamp first before falling back to string comparison.
> *Minimal reproducible example:*
> Suppose we have a one-row dataset with the following contents and schema:
> {noformat}
> +--------------------+
> |starttime           |
> +--------------------+
> |2019-08-11 19:33:05 |
> +--------------------+
> root
>  |-- starttime: timestamp (nullable = true){noformat}
> Then if we fire the following queries, we will not get results for the IN-clause one using a timestamp string with timezone information:
> {code:java}
> // Works - Spark casts the argument to a string and the internal representation of the time seems to match it...
> singleCol.filter("starttime IN ('2019-08-11 19:33:05')").show();
> // Works
> singleCol.filter("starttime = '2019-08-11 19:33:05'").show();
> // Works
> singleCol.filter("starttime = '2019-08-11T19:33:05Z'").show();
> // Doesn't work
> singleCol.filter("starttime IN ('2019-08-11T19:33:05Z')").show();
> // Works
> singleCol.filter("starttime IN (to_timestamp('2019-08-11T19:33:05Z'))").show(); {code}
> We can see from the output that a cast to string is taking place:
> {noformat}
> [...] isnotnull(starttime#59),(cast(starttime#59 as string) = 2019-08-11 19:33:05){noformat}
> Since the = operator does work, it would be consistent if operators such as the IN operator had similar behavior.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
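The plan fragment above shows the mechanism: the timestamp column is rendered as a string in Spark's default `yyyy-MM-dd HH:mm:ss` form before the comparison, so an ISO-8601 literal can never match it, even though both strings denote the same instant. A minimal pure-Python sketch of the failing string comparison (illustrative only; this is not Spark code):

```python
from datetime import datetime, timezone

# What Spark compares after cast(starttime as string):
stored = "2019-08-11 19:33:05"          # session-local rendering of the column
literal_plain = "2019-08-11 19:33:05"   # IN-list literal that "works"
literal_iso = "2019-08-11T19:33:05Z"    # IN-list literal that returns no rows

assert stored == literal_plain          # plain literal matches as a string
assert stored != literal_iso            # ISO literal never matches as a string

# Parsed as timestamps, both literals denote the same instant
# (assuming a UTC session timezone), which is why to_timestamp(...) works:
t1 = datetime.strptime(stored, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
t2 = datetime.fromisoformat(literal_iso.replace("Z", "+00:00"))
assert t1 == t2
```

This is why the `to_timestamp(...)` variant in the reproducer succeeds: once both sides are timestamps, the comparison is instant-based rather than textual.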
[jira] [Created] (SPARK-41403) Implement DataFrame.describe
Ruifeng Zheng created SPARK-41403: - Summary: Implement DataFrame.describe Key: SPARK-41403 URL: https://issues.apache.org/jira/browse/SPARK-41403 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Ruifeng Zheng
[jira] [Commented] (SPARK-41403) Implement DataFrame.describe
[ https://issues.apache.org/jira/browse/SPARK-41403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643714#comment-17643714 ] Ruifeng Zheng commented on SPARK-41403: --- [~beliefer] Jiaan, would you like to have a try? You may refer to https://issues.apache.org/jira/browse/SPARK-40852 > Implement DataFrame.describe > > > Key: SPARK-41403 > URL: https://issues.apache.org/jira/browse/SPARK-41403 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major >
[jira] [Created] (SPARK-41404) Support `ColumnarBatchSuite#testRandomRows` to test more primitive dataType
Yang Jie created SPARK-41404: Summary: Support `ColumnarBatchSuite#testRandomRows` to test more primitive dataType Key: SPARK-41404 URL: https://issues.apache.org/jira/browse/SPARK-41404 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 3.4.0 Reporter: Yang Jie
[jira] [Assigned] (SPARK-41404) Support `ColumnarBatchSuite#testRandomRows` to test more primitive dataType
[ https://issues.apache.org/jira/browse/SPARK-41404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41404: Assignee: Apache Spark > Support `ColumnarBatchSuite#testRandomRows` to test more primitive dataType > --- > > Key: SPARK-41404 > URL: https://issues.apache.org/jira/browse/SPARK-41404 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor >
[jira] [Commented] (SPARK-41404) Support `ColumnarBatchSuite#testRandomRows` to test more primitive dataType
[ https://issues.apache.org/jira/browse/SPARK-41404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643718#comment-17643718 ] Apache Spark commented on SPARK-41404: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38933 > Support `ColumnarBatchSuite#testRandomRows` to test more primitive dataType > --- > > Key: SPARK-41404 > URL: https://issues.apache.org/jira/browse/SPARK-41404 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor >
[jira] [Assigned] (SPARK-41404) Support `ColumnarBatchSuite#testRandomRows` to test more primitive dataType
[ https://issues.apache.org/jira/browse/SPARK-41404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41404: Assignee: (was: Apache Spark) > Support `ColumnarBatchSuite#testRandomRows` to test more primitive dataType > --- > > Key: SPARK-41404 > URL: https://issues.apache.org/jira/browse/SPARK-41404 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor >
[jira] [Commented] (SPARK-41403) Implement DataFrame.describe
[ https://issues.apache.org/jira/browse/SPARK-41403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643733#comment-17643733 ] jiaan.geng commented on SPARK-41403: [~podongfeng] Thank you for your ping. I will try to do this! > Implement DataFrame.describe > > > Key: SPARK-41403 > URL: https://issues.apache.org/jira/browse/SPARK-41403 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major >
[jira] [Created] (SPARK-41405) centralize the column resolution logic
Wenchen Fan created SPARK-41405: --- Summary: centralize the column resolution logic Key: SPARK-41405 URL: https://issues.apache.org/jira/browse/SPARK-41405 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Wenchen Fan
[jira] [Commented] (SPARK-41405) centralize the column resolution logic
[ https://issues.apache.org/jira/browse/SPARK-41405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643742#comment-17643742 ] Apache Spark commented on SPARK-41405: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/3 > centralize the column resolution logic > -- > > Key: SPARK-41405 > URL: https://issues.apache.org/jira/browse/SPARK-41405 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Priority: Major >
[jira] [Assigned] (SPARK-41405) centralize the column resolution logic
[ https://issues.apache.org/jira/browse/SPARK-41405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41405: Assignee: Apache Spark > centralize the column resolution logic > -- > > Key: SPARK-41405 > URL: https://issues.apache.org/jira/browse/SPARK-41405 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major >
[jira] [Assigned] (SPARK-41405) centralize the column resolution logic
[ https://issues.apache.org/jira/browse/SPARK-41405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41405: Assignee: (was: Apache Spark) > centralize the column resolution logic > -- > > Key: SPARK-41405 > URL: https://issues.apache.org/jira/browse/SPARK-41405 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Priority: Major >
[jira] [Commented] (SPARK-41405) centralize the column resolution logic
[ https://issues.apache.org/jira/browse/SPARK-41405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643743#comment-17643743 ] Apache Spark commented on SPARK-41405: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/3 > centralize the column resolution logic > -- > > Key: SPARK-41405 > URL: https://issues.apache.org/jira/browse/SPARK-41405 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wenchen Fan >Priority: Major >
[jira] [Commented] (SPARK-41317) PySpark write API for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643746#comment-17643746 ] Apache Spark commented on SPARK-41317: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/38934 > PySpark write API for Spark Connect > --- > > Key: SPARK-41317 > URL: https://issues.apache.org/jira/browse/SPARK-41317 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > >
[jira] [Commented] (SPARK-41317) PySpark write API for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-41317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643747#comment-17643747 ] Apache Spark commented on SPARK-41317: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/38934 > PySpark write API for Spark Connect > --- > > Key: SPARK-41317 > URL: https://issues.apache.org/jira/browse/SPARK-41317 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > >
[jira] [Resolved] (SPARK-41121) Upgrade sbt-assembly from 1.2.0 to 2.0.0
[ https://issues.apache.org/jira/browse/SPARK-41121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-41121. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38637 [https://github.com/apache/spark/pull/38637] > Upgrade sbt-assembly from 1.2.0 to 2.0.0 > > > Key: SPARK-41121 > URL: https://issues.apache.org/jira/browse/SPARK-41121 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 3.4.0 > >
[jira] [Assigned] (SPARK-41121) Upgrade sbt-assembly from 1.2.0 to 2.0.0
[ https://issues.apache.org/jira/browse/SPARK-41121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-41121: - Assignee: BingKun Pan > Upgrade sbt-assembly from 1.2.0 to 2.0.0 > > > Key: SPARK-41121 > URL: https://issues.apache.org/jira/browse/SPARK-41121 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor >
[jira] [Created] (SPARK-41406) Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic
BingKun Pan created SPARK-41406: --- Summary: Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic Key: SPARK-41406 URL: https://issues.apache.org/jira/browse/SPARK-41406 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: BingKun Pan
[jira] [Commented] (SPARK-41392) spark builds against hadoop trunk/3.4.0-SNAPSHOT fail in scala-maven plugin
[ https://issues.apache.org/jira/browse/SPARK-41392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643774#comment-17643774 ] Steve Loughran commented on SPARK-41392: May relate to the Bouncy Castle 1.68 update of HADOOP-1756, but that change is also in the 3.3.5/3.3 branches and Spark is happy there, so there must be more to it. > spark builds against hadoop trunk/3.4.0-SNAPSHOT fail in scala-maven plugin > --- > > Key: SPARK-41392 > URL: https://issues.apache.org/jira/browse/SPARK-41392 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Priority: Minor > > on hadoop trunk (but not the 3.3.x line), spark builds fail with a CNFE > {code} > net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile: > org/bouncycastle/jce/provider/BouncyCastleProvider > {code} > full stack > {code} > [ERROR] Failed to execute goal > net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile > (scala-test-compile-first) on project spark-sql_2.12: Execution > scala-test-compile-first of goal > net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile failed: A required > class was missing while executing > net.alchim31.maven:scala-maven-plugin:4.7.2:testCompile: > org/bouncycastle/jce/provider/BouncyCastleProvider > [ERROR] - > [ERROR] realm =plugin>net.alchim31.maven:scala-maven-plugin:4.7.2 > [ERROR] strategy = org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy > [ERROR] urls[0] = > file:/Users/stevel/.m2/repository/net/alchim31/maven/scala-maven-plugin/4.7.2/scala-maven-plugin-4.7.2.jar > [ERROR] urls[1] = > file:/Users/stevel/.m2/repository/org/apache/maven/shared/maven-dependency-tree/3.2.0/maven-dependency-tree-3.2.0.jar > [ERROR] urls[2] = > file:/Users/stevel/.m2/repository/org/eclipse/aether/aether-util/1.0.0.v20140518/aether-util-1.0.0.v20140518.jar > [ERROR] urls[3] = > file:/Users/stevel/.m2/repository/org/apache/maven/reporting/maven-reporting-api/3.1.1/maven-reporting-api-3.1.1.jar > 
[ERROR] urls[4] = > file:/Users/stevel/.m2/repository/org/apache/maven/doxia/doxia-sink-api/1.11.1/doxia-sink-api-1.11.1.jar > [ERROR] urls[5] = > file:/Users/stevel/.m2/repository/org/apache/maven/doxia/doxia-logging-api/1.11.1/doxia-logging-api-1.11.1.jar > [ERROR] urls[6] = > file:/Users/stevel/.m2/repository/org/apache/maven/maven-archiver/3.6.0/maven-archiver-3.6.0.jar > [ERROR] urls[7] = > file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-io/3.4.0/plexus-io-3.4.0.jar > [ERROR] urls[8] = > file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-interpolation/1.26/plexus-interpolation-1.26.jar > [ERROR] urls[9] = > file:/Users/stevel/.m2/repository/org/apache/commons/commons-exec/1.3/commons-exec-1.3.jar > [ERROR] urls[10] = > file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-utils/3.4.2/plexus-utils-3.4.2.jar > [ERROR] urls[11] = > file:/Users/stevel/.m2/repository/org/codehaus/plexus/plexus-archiver/4.5.0/plexus-archiver-4.5.0.jar > [ERROR] urls[12] = > file:/Users/stevel/.m2/repository/commons-io/commons-io/2.11.0/commons-io-2.11.0.jar > [ERROR] urls[13] = > file:/Users/stevel/.m2/repository/org/apache/commons/commons-compress/1.21/commons-compress-1.21.jar > [ERROR] urls[14] = > file:/Users/stevel/.m2/repository/org/iq80/snappy/snappy/0.4/snappy-0.4.jar > [ERROR] urls[15] = > file:/Users/stevel/.m2/repository/org/tukaani/xz/1.9/xz-1.9.jar > [ERROR] urls[16] = > file:/Users/stevel/.m2/repository/com/github/luben/zstd-jni/1.5.2-4/zstd-jni-1.5.2-4.jar > [ERROR] urls[17] = > file:/Users/stevel/.m2/repository/org/scala-sbt/zinc_2.13/1.7.1/zinc_2.13-1.7.1.jar > [ERROR] urls[18] = > file:/Users/stevel/.m2/repository/org/scala-lang/scala-library/2.13.8/scala-library-2.13.8.jar > [ERROR] urls[19] = > file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-core_2.13/1.7.1/zinc-core_2.13-1.7.1.jar > [ERROR] urls[20] = > file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-apiinfo_2.13/1.7.1/zinc-apiinfo_2.13-1.7.1.jar > [ERROR] urls[21] = > 
file:/Users/stevel/.m2/repository/org/scala-sbt/compiler-bridge_2.13/1.7.1/compiler-bridge_2.13-1.7.1.jar > [ERROR] urls[22] = > file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-classpath_2.13/1.7.1/zinc-classpath_2.13-1.7.1.jar > [ERROR] urls[23] = > file:/Users/stevel/.m2/repository/org/scala-lang/scala-compiler/2.13.8/scala-compiler-2.13.8.jar > [ERROR] urls[24] = > file:/Users/stevel/.m2/repository/org/scala-sbt/compiler-interface/1.7.1/compiler-interface-1.7.1.jar > [ERROR] urls[25] = > file:/Users/stevel/.m2/repository/org/scala-sbt/util-interface/1.7.0/util-interface-1.7.0.jar > [ERROR] urls[26] = > file:/Users/stevel/.m2/repository/org/scala-sbt/zinc-persist-core-assembly/1.7
[jira] [Commented] (SPARK-41319) when-otherwise support
[ https://issues.apache.org/jira/browse/SPARK-41319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643776#comment-17643776 ] Apache Spark commented on SPARK-41319: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/38935 > when-otherwise support > -- > > Key: SPARK-41319 > URL: https://issues.apache.org/jira/browse/SPARK-41319 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > > 1, add protobuf message for expression 'CaseWhen'; > 2, support the 'Column.\{when, otherwise\}' methods in Spark Connect.
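For context on what the ticket covers: a CaseWhen expression evaluates its (condition, value) branches in order and falls back to the `otherwise` value when none match. A minimal pure-Python sketch of those semantics (illustrative only; this is not the Spark Connect implementation, and the names here are made up):

```python
def case_when(branches, otherwise=None):
    """Evaluate (predicate, value) branches in order, like SQL CASE WHEN.

    The first predicate that returns True wins; `otherwise` is the ELSE value.
    """
    def evaluate(row):
        for predicate, value in branches:
            if predicate(row):
                return value
        return otherwise
    return evaluate

# Roughly mirrors: when(col("score") >= 90, "A").when(col("score") >= 80, "B").otherwise("C")
grade = case_when(
    [(lambda r: r["score"] >= 90, "A"),
     (lambda r: r["score"] >= 80, "B")],
    otherwise="C",
)
assert grade({"score": 95}) == "A"
assert grade({"score": 85}) == "B"
assert grade({"score": 70}) == "C"
```

The ordered-branch evaluation is the key property the protobuf message has to preserve: a row with score 95 matches both predicates, but only the first branch's value is returned.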
[jira] [Commented] (SPARK-41319) when-otherwise support
[ https://issues.apache.org/jira/browse/SPARK-41319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643775#comment-17643775 ] Apache Spark commented on SPARK-41319: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/38935 > when-otherwise support > -- > > Key: SPARK-41319 > URL: https://issues.apache.org/jira/browse/SPARK-41319 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > > 1, add protobuf message for expression 'CaseWhen'; > 2, support the 'Column.\{when, otherwise\}' methods in Spark Connect.
[jira] [Assigned] (SPARK-41319) when-otherwise support
[ https://issues.apache.org/jira/browse/SPARK-41319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41319: Assignee: Apache Spark > when-otherwise support > -- > > Key: SPARK-41319 > URL: https://issues.apache.org/jira/browse/SPARK-41319 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > > 1, add protobuf message for expression 'CaseWhen'; > 2, support the 'Column.\{when, otherwise\}' methods in Spark Connect.
[jira] [Assigned] (SPARK-41319) when-otherwise support
[ https://issues.apache.org/jira/browse/SPARK-41319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41319: Assignee: (was: Apache Spark) > when-otherwise support > -- > > Key: SPARK-41319 > URL: https://issues.apache.org/jira/browse/SPARK-41319 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > > 1, add protobuf message for expression 'CaseWhen'; > 2, support the 'Column.\{when, otherwise\}' methods in Spark Connect.
[jira] [Updated] (SPARK-28869) Roll over event log files
[ https://issues.apache.org/jira/browse/SPARK-28869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ranga Reddy updated SPARK-28869: Attachment: application_1670216197043_0012.log > Roll over event log files > - > > Key: SPARK-28869 > URL: https://issues.apache.org/jira/browse/SPARK-28869 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.0.0 > > Attachments: application_1670216197043_0012.log > > > This issue tracks the effort on rolling over event log files in driver and > let SHS replay the multiple event log files correctly. > This issue doesn't deal with overall size of event log, as well as no > guarantee when deleting old event log files.
[jira] [Commented] (SPARK-28869) Roll over event log files
[ https://issues.apache.org/jira/browse/SPARK-28869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643804#comment-17643804 ] Ranga Reddy commented on SPARK-28869: - Hi [~kabhwan] I have enabled event log rolling for the Spark Streaming network word count example, but the event log files are not compacted. *Configuration Parameters:* {code:java} spark.eventLog.rolling.enabled=true spark.eventLog.rolling.maxFileSize=10m spark.history.fs.eventLog.rolling.maxFilesToRetain=2 spark.history.fs.cleaner.interval=1800{code} *Event log file list:* [^application_1670216197043_0012.log] Could you please check the issue? > Roll over event log files > - > > Key: SPARK-28869 > URL: https://issues.apache.org/jira/browse/SPARK-28869 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.0.0 > > Attachments: application_1670216197043_0012.log > > > This issue tracks the effort on rolling over event log files in driver and > let SHS replay the multiple event log files correctly. > This issue doesn't deal with overall size of event log, as well as no > guarantee when deleting old event log files.
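One detail worth checking when reproducing this (an assumption about where the settings take effect, not a confirmed diagnosis): the `spark.eventLog.rolling.*` options are driver-side and only control rolling, while compaction is expected to be driven by the History Server via `spark.history.fs.eventLog.rolling.maxFilesToRetain` in its own configuration. A sketch of how the driver-side half is typically passed (the jar name, class, and arguments are hypothetical):

```shell
# Hypothetical spark-submit invocation enabling rolling event logs.
# Compaction is expected to happen on the History Server side, controlled by
# spark.history.fs.eventLog.rolling.maxFilesToRetain set in the SHS config.
spark-submit \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.rolling.enabled=true \
  --conf spark.eventLog.rolling.maxFileSize=10m \
  --class org.apache.spark.examples.streaming.NetworkWordCount \
  spark-examples.jar localhost 9999
```

If `maxFilesToRetain` was set only on the submitting side, the SHS would never see it, which could explain the missing compaction.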
[jira] [Commented] (SPARK-41406) Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic
[ https://issues.apache.org/jira/browse/SPARK-41406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643845#comment-17643845 ] Apache Spark commented on SPARK-41406: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/38937 > Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic > - > > Key: SPARK-41406 > URL: https://issues.apache.org/jira/browse/SPARK-41406 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor >
[jira] [Assigned] (SPARK-41406) Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic
[ https://issues.apache.org/jira/browse/SPARK-41406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41406: Assignee: Apache Spark > Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic > - > > Key: SPARK-41406 > URL: https://issues.apache.org/jira/browse/SPARK-41406 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor >
[jira] [Assigned] (SPARK-41406) Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic
[ https://issues.apache.org/jira/browse/SPARK-41406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41406: Assignee: (was: Apache Spark) > Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic > - > > Key: SPARK-41406 > URL: https://issues.apache.org/jira/browse/SPARK-41406 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor >
[jira] [Assigned] (SPARK-41403) Implement DataFrame.describe
[ https://issues.apache.org/jira/browse/SPARK-41403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41403: Assignee: Apache Spark > Implement DataFrame.describe > > > Key: SPARK-41403 > URL: https://issues.apache.org/jira/browse/SPARK-41403 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major >
[jira] [Assigned] (SPARK-41403) Implement DataFrame.describe
[ https://issues.apache.org/jira/browse/SPARK-41403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41403: Assignee: (was: Apache Spark) > Implement DataFrame.describe > > > Key: SPARK-41403 > URL: https://issues.apache.org/jira/browse/SPARK-41403 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major >
[jira] [Commented] (SPARK-41403) Implement DataFrame.describe
[ https://issues.apache.org/jira/browse/SPARK-41403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643850#comment-17643850 ] Apache Spark commented on SPARK-41403: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/38938 > Implement DataFrame.describe > > > Key: SPARK-41403 > URL: https://issues.apache.org/jira/browse/SPARK-41403 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major >
[jira] [Created] (SPARK-41407) Pull out v1 write to WriteFiles
XiDuo You created SPARK-41407: - Summary: Pull out v1 write to WriteFiles Key: SPARK-41407 URL: https://issues.apache.org/jira/browse/SPARK-41407 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: XiDuo You Add a new plan, WriteFiles, to perform the file writes for v1 writes. This will let v1 writes support whole-stage codegen in the future.
[jira] [Commented] (SPARK-41407) Pull out v1 write to WriteFiles
[ https://issues.apache.org/jira/browse/SPARK-41407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643859#comment-17643859 ] Apache Spark commented on SPARK-41407: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/38939 > Pull out v1 write to WriteFiles > --- > > Key: SPARK-41407 > URL: https://issues.apache.org/jira/browse/SPARK-41407 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Priority: Major > > Add new plan WriteFiles to do write files for v1writes. > We can make v1 write support whole stage codegen in future.
[jira] [Assigned] (SPARK-41407) Pull out v1 write to WriteFiles
[ https://issues.apache.org/jira/browse/SPARK-41407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41407: Assignee: Apache Spark > Pull out v1 write to WriteFiles > --- > > Key: SPARK-41407 > URL: https://issues.apache.org/jira/browse/SPARK-41407 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Assignee: Apache Spark >Priority: Major > > Add new plan WriteFiles to do write files for v1writes. > We can make v1 write support whole stage codegen in future. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41407) Pull out v1 write to WriteFiles
[ https://issues.apache.org/jira/browse/SPARK-41407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41407: Assignee: (was: Apache Spark) > Pull out v1 write to WriteFiles > --- > > Key: SPARK-41407 > URL: https://issues.apache.org/jira/browse/SPARK-41407 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Priority: Major > > Add new plan WriteFiles to do write files for v1writes. > We can make v1 write support whole stage codegen in future. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41407) Pull out v1 write to WriteFiles
[ https://issues.apache.org/jira/browse/SPARK-41407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643861#comment-17643861 ] Apache Spark commented on SPARK-41407: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/38939 > Pull out v1 write to WriteFiles > --- > > Key: SPARK-41407 > URL: https://issues.apache.org/jira/browse/SPARK-41407 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Priority: Major > > Add new plan WriteFiles to do write files for v1writes. > We can make v1 write support whole stage codegen in future. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41408) Upgrade scala-maven-plugin to 4.8.0
Yang Jie created SPARK-41408: Summary: Upgrade scala-maven-plugin to 4.8.0 Key: SPARK-41408 URL: https://issues.apache.org/jira/browse/SPARK-41408 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.4.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41408) Upgrade scala-maven-plugin to 4.8.0
[ https://issues.apache.org/jira/browse/SPARK-41408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643892#comment-17643892 ] Apache Spark commented on SPARK-41408: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38936 > Upgrade scala-maven-plugin to 4.8.0 > --- > > Key: SPARK-41408 > URL: https://issues.apache.org/jira/browse/SPARK-41408 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41408) Upgrade scala-maven-plugin to 4.8.0
[ https://issues.apache.org/jira/browse/SPARK-41408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41408: Assignee: Apache Spark > Upgrade scala-maven-plugin to 4.8.0 > --- > > Key: SPARK-41408 > URL: https://issues.apache.org/jira/browse/SPARK-41408 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41408) Upgrade scala-maven-plugin to 4.8.0
[ https://issues.apache.org/jira/browse/SPARK-41408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41408: Assignee: (was: Apache Spark) > Upgrade scala-maven-plugin to 4.8.0 > --- > > Key: SPARK-41408 > URL: https://issues.apache.org/jira/browse/SPARK-41408 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41409) Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043`
Yang Jie created SPARK-41409: Summary: Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043` Key: SPARK-41409 URL: https://issues.apache.org/jira/browse/SPARK-41409 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41409) Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043`
[ https://issues.apache.org/jira/browse/SPARK-41409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41409: Assignee: (was: Apache Spark) > Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043` > --- > > Key: SPARK-41409 > URL: https://issues.apache.org/jira/browse/SPARK-41409 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41409) Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043`
[ https://issues.apache.org/jira/browse/SPARK-41409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41409: Assignee: Apache Spark > Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043` > --- > > Key: SPARK-41409 > URL: https://issues.apache.org/jira/browse/SPARK-41409 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41409) Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043`
[ https://issues.apache.org/jira/browse/SPARK-41409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643929#comment-17643929 ] Apache Spark commented on SPARK-41409: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38940 > Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043` > --- > > Key: SPARK-41409 > URL: https://issues.apache.org/jira/browse/SPARK-41409 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41409) Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043`
[ https://issues.apache.org/jira/browse/SPARK-41409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643930#comment-17643930 ] Apache Spark commented on SPARK-41409: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/38940 > Reuse `WRONG_NUM_ARGS` instead of `_LEGACY_ERROR_TEMP_1043` > --- > > Key: SPARK-41409 > URL: https://issues.apache.org/jira/browse/SPARK-41409 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41398) Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match
[ https://issues.apache.org/jira/browse/SPARK-41398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-41398: - Assignee: Chao Sun > Relax constraints on Storage-Partitioned Join when partition keys after > runtime filtering do not match > -- > > Key: SPARK-41398 > URL: https://issues.apache.org/jira/browse/SPARK-41398 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.1 >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41398) Relax constraints on Storage-Partitioned Join when partition keys after runtime filtering do not match
[ https://issues.apache.org/jira/browse/SPARK-41398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-41398. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38924 [https://github.com/apache/spark/pull/38924] > Relax constraints on Storage-Partitioned Join when partition keys after > runtime filtering do not match > -- > > Key: SPARK-41398 > URL: https://issues.apache.org/jira/browse/SPARK-41398 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.1 >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41393) Upgrade slf4j to 2.0.5
[ https://issues.apache.org/jira/browse/SPARK-41393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-41393. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38918 [https://github.com/apache/spark/pull/38918] > Upgrade slf4j to 2.0.5 > -- > > Key: SPARK-41393 > URL: https://issues.apache.org/jira/browse/SPARK-41393 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.4.0 > > > https://www.slf4j.org/news.html#2.0.5 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41393) Upgrade slf4j to 2.0.5
[ https://issues.apache.org/jira/browse/SPARK-41393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-41393: - Assignee: Yang Jie > Upgrade slf4j to 2.0.5 > -- > > Key: SPARK-41393 > URL: https://issues.apache.org/jira/browse/SPARK-41393 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > > https://www.slf4j.org/news.html#2.0.5 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41410) Support PVC-oriented executor pod allocation
Dongjoon Hyun created SPARK-41410: - Summary: Support PVC-oriented executor pod allocation Key: SPARK-41410 URL: https://issues.apache.org/jira/browse/SPARK-41410 Project: Spark Issue Type: New Feature Components: Kubernetes Affects Versions: 3.4.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41410) Support PVC-oriented executor pod allocation
[ https://issues.apache.org/jira/browse/SPARK-41410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41410: Assignee: Apache Spark > Support PVC-oriented executor pod allocation > > > Key: SPARK-41410 > URL: https://issues.apache.org/jira/browse/SPARK-41410 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41410) Support PVC-oriented executor pod allocation
[ https://issues.apache.org/jira/browse/SPARK-41410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643973#comment-17643973 ] Apache Spark commented on SPARK-41410: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/38943 > Support PVC-oriented executor pod allocation > > > Key: SPARK-41410 > URL: https://issues.apache.org/jira/browse/SPARK-41410 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41410) Support PVC-oriented executor pod allocation
[ https://issues.apache.org/jira/browse/SPARK-41410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41410: Assignee: (was: Apache Spark) > Support PVC-oriented executor pod allocation > > > Key: SPARK-41410 > URL: https://issues.apache.org/jira/browse/SPARK-41410 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41369) Refactor connect directory structure
[ https://issues.apache.org/jira/browse/SPARK-41369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643976#comment-17643976 ] Apache Spark commented on SPARK-41369: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/38944 > Refactor connect directory structure > > > Key: SPARK-41369 > URL: https://issues.apache.org/jira/browse/SPARK-41369 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.3.2, 3.4.0 >Reporter: Venkata Sai Akhil Gudesa >Priority: Major > > Currently, `spark/connector/connect/` is a single module that contains both > the "server"/service as well as the protobuf definitions. > However, this module can be split into multiple modules - "server" and > "common". This brings the advantage of separating out the protobuf generation > from the core "server" module for efficient reuse. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41411) Multi-Stateful Operator watermark support bug fix
Wei Liu created SPARK-41411: --- Summary: Multi-Stateful Operator watermark support bug fix Key: SPARK-41411 URL: https://issues.apache.org/jira/browse/SPARK-41411 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 3.3.2, 3.4.0 Reporter: Wei Liu A typo in passing the event time watermark to `StreamingSymmetricHashJoinExec` causes logic errors. With the bug, the query runs with no error reported but produces incorrect results. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41411) Multi-Stateful Operator watermark support bug fix
[ https://issues.apache.org/jira/browse/SPARK-41411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644006#comment-17644006 ] Wei Liu commented on SPARK-41411: - PR: https://github.com/apache/spark/pull/38945 > Multi-Stateful Operator watermark support bug fix > - > > Key: SPARK-41411 > URL: https://issues.apache.org/jira/browse/SPARK-41411 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.3.2, 3.4.0 >Reporter: Wei Liu >Priority: Major > > A typo in passing event time watermark to`StreamingSymmetricHashJoinExec` > causes logic errrors. With the bug, the query would work with no error > reported but producing incorrect results. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41411) Multi-Stateful Operator watermark support bug fix
[ https://issues.apache.org/jira/browse/SPARK-41411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41411: Assignee: (was: Apache Spark) > Multi-Stateful Operator watermark support bug fix > - > > Key: SPARK-41411 > URL: https://issues.apache.org/jira/browse/SPARK-41411 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.3.2, 3.4.0 >Reporter: Wei Liu >Priority: Major > > A typo in passing event time watermark to`StreamingSymmetricHashJoinExec` > causes logic errrors. With the bug, the query would work with no error > reported but producing incorrect results. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41411) Multi-Stateful Operator watermark support bug fix
[ https://issues.apache.org/jira/browse/SPARK-41411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41411: Assignee: Apache Spark > Multi-Stateful Operator watermark support bug fix > - > > Key: SPARK-41411 > URL: https://issues.apache.org/jira/browse/SPARK-41411 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.3.2, 3.4.0 >Reporter: Wei Liu >Assignee: Apache Spark >Priority: Major > > A typo in passing event time watermark to`StreamingSymmetricHashJoinExec` > causes logic errrors. With the bug, the query would work with no error > reported but producing incorrect results. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41411) Multi-Stateful Operator watermark support bug fix
[ https://issues.apache.org/jira/browse/SPARK-41411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644007#comment-17644007 ] Apache Spark commented on SPARK-41411: -- User 'WweiL' has created a pull request for this issue: https://github.com/apache/spark/pull/38945 > Multi-Stateful Operator watermark support bug fix > - > > Key: SPARK-41411 > URL: https://issues.apache.org/jira/browse/SPARK-41411 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.3.2, 3.4.0 >Reporter: Wei Liu >Priority: Major > > A typo in passing event time watermark to`StreamingSymmetricHashJoinExec` > causes logic errrors. With the bug, the query would work with no error > reported but producing incorrect results. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41411) Multi-Stateful Operator watermark support bug fix
[ https://issues.apache.org/jira/browse/SPARK-41411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644008#comment-17644008 ] Apache Spark commented on SPARK-41411: -- User 'WweiL' has created a pull request for this issue: https://github.com/apache/spark/pull/38945 > Multi-Stateful Operator watermark support bug fix > - > > Key: SPARK-41411 > URL: https://issues.apache.org/jira/browse/SPARK-41411 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.3.2, 3.4.0 >Reporter: Wei Liu >Priority: Major > > A typo in passing event time watermark to`StreamingSymmetricHashJoinExec` > causes logic errrors. With the bug, the query would work with no error > reported but producing incorrect results. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-41411) Multi-Stateful Operator watermark support bug fix
[ https://issues.apache.org/jira/browse/SPARK-41411 ] Wei Liu deleted comment on SPARK-41411: - was (Author: JIRAUSER295948): PR: https://github.com/apache/spark/pull/38945 > Multi-Stateful Operator watermark support bug fix > - > > Key: SPARK-41411 > URL: https://issues.apache.org/jira/browse/SPARK-41411 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.3.2, 3.4.0 >Reporter: Wei Liu >Priority: Major > > A typo in passing event time watermark to`StreamingSymmetricHashJoinExec` > causes logic errrors. With the bug, the query would work with no error > reported but producing incorrect results. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41411) Multi-Stateful Operator watermark support bug fix
[ https://issues.apache.org/jira/browse/SPARK-41411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-41411: Affects Version/s: (was: 3.3.2) > Multi-Stateful Operator watermark support bug fix > - > > Key: SPARK-41411 > URL: https://issues.apache.org/jira/browse/SPARK-41411 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Wei Liu >Priority: Major > > A typo in passing event time watermark to`StreamingSymmetricHashJoinExec` > causes logic errrors. With the bug, the query would work with no error > reported but producing incorrect results. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41412) Implement `Cast`
Rui Wang created SPARK-41412: Summary: Implement `Cast` Key: SPARK-41412 URL: https://issues.apache.org/jira/browse/SPARK-41412 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Rui Wang Assignee: Rui Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41413) Storage-Partitioned Join should avoid shuffle when partition keys mismatch, but join expressions are compatible
Chao Sun created SPARK-41413: Summary: Storage-Partitioned Join should avoid shuffle when partition keys mismatch, but join expressions are compatible Key: SPARK-41413 URL: https://issues.apache.org/jira/browse/SPARK-41413 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.1 Reporter: Chao Sun Currently, when checking whether the two sides of a Storage-Partitioned Join are compatible, we require both the partition expressions and the partition keys to be compatible. However, this condition could be relaxed so that we only require the former. When the latter is not compatible, we can calculate a common superset of keys, push that information down to both sides of the join, and use empty partitions for the missing keys. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
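The key-alignment idea described in SPARK-41413 can be sketched in a few lines. This is a hypothetical illustration, not Spark's planner code: `align_partitions`, and the dict-of-lists representation of partitions, are invented here purely to show how a common superset of keys plus empty partitions for the missing keys avoids a shuffle-style re-bucketing of either side.

```python
# Hypothetical sketch of the proposed relaxation: both sides share compatible
# partition expressions, but their partition key sets differ. We take the
# union (common superset) of the keys and pad each side with empty partitions
# for keys it lacks, so the two sides line up partition-for-partition.
# Names and data structures are illustrative, not Spark internals.

def align_partitions(left, right):
    """left/right map a partition key to its rows; returns aligned copies."""
    common_keys = sorted(set(left) | set(right))  # common superset of keys
    aligned_left = {k: left.get(k, []) for k in common_keys}   # [] = empty partition
    aligned_right = {k: right.get(k, []) for k in common_keys}
    return aligned_left, aligned_right

left = {1: ["a"], 2: ["b"]}
right = {2: ["x"], 3: ["y"]}
l, r = align_partitions(left, right)
assert list(l) == list(r) == [1, 2, 3]  # same key order on both sides
assert l[3] == [] and r[1] == []        # missing keys become empty partitions
```

After alignment, a partition-wise join can zip the two sides directly, which is the property the Jira proposes to exploit.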
[jira] [Commented] (SPARK-38918) Nested column pruning should filter out attributes that do not belong to the current relation
[ https://issues.apache.org/jira/browse/SPARK-38918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644041#comment-17644041 ] Wing Yew Poon commented on SPARK-38918: --- It seems that this is fixed in 3.2.2 ([7c0b9e6e|https://github.com/apache/spark/commit/7c0b9e6e6f680db45c1e2602b85753d9b521bb58]), but for some reason, 3.2.2 is not in the Fixed Version/s. Can we please correct this? Probably because of this, this issue does not appear in https://spark.apache.org/releases/spark-release-3-2-2.html. > Nested column pruning should filter out attributes that do not belong to the > current relation > - > > Key: SPARK-38918 > URL: https://issues.apache.org/jira/browse/SPARK-38918 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 3.1.3, 3.0.4, 3.3.0, 3.4.0 > > > `SchemaPruning` currently does not check if the root field of a nested column > belongs to the current relation. This can happen when the filter contains > correlated subqueries, where the children field can contain attributes from > both the inner and the outer query. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41414) Implement data/timestamp functions
Xinrong Meng created SPARK-41414: Summary: Implement data/timestamp functions Key: SPARK-41414 URL: https://issues.apache.org/jira/browse/SPARK-41414 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Xinrong Meng Implement data/timestamp functions -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41410) Support PVC-oriented executor pod allocation
[ https://issues.apache.org/jira/browse/SPARK-41410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-41410. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38943 [https://github.com/apache/spark/pull/38943] > Support PVC-oriented executor pod allocation > > > Key: SPARK-41410 > URL: https://issues.apache.org/jira/browse/SPARK-41410 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41410) Support PVC-oriented executor pod allocation
[ https://issues.apache.org/jira/browse/SPARK-41410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-41410: - Assignee: Dongjoon Hyun > Support PVC-oriented executor pod allocation > > > Key: SPARK-41410 > URL: https://issues.apache.org/jira/browse/SPARK-41410 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41414) Implement date/timestamp functions
[ https://issues.apache.org/jira/browse/SPARK-41414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-41414: - Summary: Implement date/timestamp functions (was: Implement data/timestamp functions) > Implement date/timestamp functions > -- > > Key: SPARK-41414 > URL: https://issues.apache.org/jira/browse/SPARK-41414 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Implement data/timestamp functions -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41414) Implement date/timestamp functions
[ https://issues.apache.org/jira/browse/SPARK-41414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41414: Assignee: Apache Spark > Implement date/timestamp functions > -- > > Key: SPARK-41414 > URL: https://issues.apache.org/jira/browse/SPARK-41414 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > Implement data/timestamp functions -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41414) Implement date/timestamp functions
[ https://issues.apache.org/jira/browse/SPARK-41414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41414: Assignee: (was: Apache Spark) > Implement date/timestamp functions > -- > > Key: SPARK-41414 > URL: https://issues.apache.org/jira/browse/SPARK-41414 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Implement data/timestamp functions -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41414) Implement date/timestamp functions
[ https://issues.apache.org/jira/browse/SPARK-41414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644049#comment-17644049 ] Apache Spark commented on SPARK-41414: -- User 'xinrong-meng' has created a pull request for this issue: https://github.com/apache/spark/pull/38946 > Implement date/timestamp functions > -- > > Key: SPARK-41414 > URL: https://issues.apache.org/jira/browse/SPARK-41414 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Implement data/timestamp functions -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41344) Reading V2 datasource masks underlying error
[ https://issues.apache.org/jira/browse/SPARK-41344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644061#comment-17644061 ] Pablo Langa Blanco commented on SPARK-41344: In this case the provider has been detected as DataSourceV2 and also implements SupportsCatalogOptions, so if it fails at that point, it does not make sense to try it as DataSource V1. The CatalogV2Util.loadTable function catches NoSuchTableException, NoSuchDatabaseException and NoSuchNamespaceException to return an option, which makes sense in other places where it is used, but not at this point. Maybe the best solution is to have another function that does not catch those exceptions to use in this case and does not return an option. > Reading V2 datasource masks underlying error > > > Key: SPARK-41344 > URL: https://issues.apache.org/jira/browse/SPARK-41344 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.3.1, 3.4.0 >Reporter: Kevin Cheung >Priority: Critical > Attachments: image-2022-12-03-09-24-43-285.png > > > In Spark 3.3, > # DataSourceV2Utils.loadV2Source calls: > {*}(CatalogV2Util.loadTable(catalog, ident, timeTravel).get{*}, > Some(catalog), Some(ident)). > # In CatalogV2Util.scala, when *loadTable(x,x,x)* fails with any of the exceptions NoSuchTableException, NoSuchDatabaseException, or > NoSuchNamespaceException, it returns None. > # Back in DataSourceV2Utils, calling None.get on that result produces a cryptic error that is technically "correct", but the *original > exceptions NoSuchTableException, NoSuchDatabaseException, > NoSuchNamespaceException are thrown away.* > > *Ask:* > Retain the original error and propagate it to the user. Prior to Spark 3.3, > the *original error* was shown; the current behavior seems like a design flaw.
> > *Sample user facing error:* > None.get > java.util.NoSuchElementException: None.get > at scala.None$.get(Option.scala:529) > at scala.None$.get(Option.scala:527) > at > org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.loadV2Source(DataSourceV2Utils.scala:129) > at > org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:209) > at scala.Option.flatMap(Option.scala:271) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:207) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171) > > *DataSourceV2Utils.scala - CatalogV2Util.loadTable(x,x,x).get* > [https://github.com/apache/spark/blob/7fd654c0142ab9e4002882da4e65d3b25bebd26c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala#L137] > *CatalogV2Util.scala - Option(catalog.asTableCatalog.loadTable(ident))* > [https://github.com/apache/spark/blob/7fd654c0142ab9e4002882da4e65d3b25bebd26c/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala#L341] > *CatalogV2Util.scala - catching the exceptions and returning None* > [https://github.com/apache/spark/blob/7fd654c0142ab9e4002882da4e65d3b25bebd26c/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala#L344] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
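[Editorial aside] The error-masking flow described in SPARK-41344 can be sketched in a few lines of plain Python. This is an illustration only, not Spark's actual code: the names `load_table_optional` and `load_v2_source` merely mirror the roles of `CatalogV2Util.loadTable` and `DataSourceV2Utils.loadV2Source`.

```python
# Sketch of the bug: a helper swallows the descriptive "not found" exception
# and returns None; the caller then fails on the None with a cryptic error.

class NoSuchTableException(Exception):
    pass

def load_table_optional(lookup, name):
    # Mirrors CatalogV2Util.loadTable: catch "not found" and return None.
    try:
        return lookup(name)
    except NoSuchTableException:
        return None

def load_v2_source(lookup, name):
    # Mirrors DataSourceV2Utils calling .get on the Option: by this point the
    # original exception is gone, so the user only sees a bare "None.get".
    table = load_table_optional(lookup, name)
    if table is None:
        # Analogous to java.util.NoSuchElementException: None.get
        raise RuntimeError("None.get")
    return table

def lookup(name):
    raise NoSuchTableException(f"Table {name} not found in catalog")

try:
    load_v2_source(lookup, "db.missing")
except RuntimeError as e:
    print(e)  # the informative NoSuchTableException message is lost
```

The fix proposed in the comments corresponds to making the lookup at this call site propagate the original exception instead of converting it to None.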
[jira] [Assigned] (SPARK-41411) Multi-Stateful Operator watermark support bug fix
[ https://issues.apache.org/jira/browse/SPARK-41411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-41411: Assignee: Wei Liu > Multi-Stateful Operator watermark support bug fix > - > > Key: SPARK-41411 > URL: https://issues.apache.org/jira/browse/SPARK-41411 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > > A typo in passing the event time watermark to `StreamingSymmetricHashJoinExec` > causes logic errors. With the bug, the query would run with no error > reported but produce incorrect results. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41411) Multi-Stateful Operator watermark support bug fix
[ https://issues.apache.org/jira/browse/SPARK-41411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-41411. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38945 [https://github.com/apache/spark/pull/38945] > Multi-Stateful Operator watermark support bug fix > - > > Key: SPARK-41411 > URL: https://issues.apache.org/jira/browse/SPARK-41411 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Fix For: 3.4.0 > > > A typo in passing the event time watermark to `StreamingSymmetricHashJoinExec` > causes logic errors. With the bug, the query would run with no error > reported but produce incorrect results. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41231) Built-in SQL Function Improvement
[ https://issues.apache.org/jira/browse/SPARK-41231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41231: Assignee: (was: Apache Spark) > Built-in SQL Function Improvement > - > > Key: SPARK-41231 > URL: https://issues.apache.org/jira/browse/SPARK-41231 > Project: Spark > Issue Type: New Feature > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41231) Built-in SQL Function Improvement
[ https://issues.apache.org/jira/browse/SPARK-41231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644066#comment-17644066 ] Apache Spark commented on SPARK-41231: -- User 'navinvishy' has created a pull request for this issue: https://github.com/apache/spark/pull/38947 > Built-in SQL Function Improvement > - > > Key: SPARK-41231 > URL: https://issues.apache.org/jira/browse/SPARK-41231 > Project: Spark > Issue Type: New Feature > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41231) Built-in SQL Function Improvement
[ https://issues.apache.org/jira/browse/SPARK-41231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41231: Assignee: Apache Spark > Built-in SQL Function Improvement > - > > Key: SPARK-41231 > URL: https://issues.apache.org/jira/browse/SPARK-41231 > Project: Spark > Issue Type: New Feature > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41233) High-order function: array_prepend
[ https://issues.apache.org/jira/browse/SPARK-41233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644070#comment-17644070 ] Navin Viswanath commented on SPARK-41233: - PR : [https://github.com/apache/spark/pull/38947] > High-order function: array_prepend > -- > > Key: SPARK-41233 > URL: https://issues.apache.org/jira/browse/SPARK-41233 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > > refer to > https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/api/snowflake.snowpark.functions.array_prepend.html -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41410) Support PVC-oriented executor pod allocation
[ https://issues.apache.org/jira/browse/SPARK-41410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644080#comment-17644080 ] Apache Spark commented on SPARK-41410: -- User 'tedyu' has created a pull request for this issue: https://github.com/apache/spark/pull/38948 > Support PVC-oriented executor pod allocation > > > Key: SPARK-41410 > URL: https://issues.apache.org/jira/browse/SPARK-41410 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41410) Support PVC-oriented executor pod allocation
[ https://issues.apache.org/jira/browse/SPARK-41410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644079#comment-17644079 ] Apache Spark commented on SPARK-41410: -- User 'tedyu' has created a pull request for this issue: https://github.com/apache/spark/pull/38948 > Support PVC-oriented executor pod allocation > > > Key: SPARK-41410 > URL: https://issues.apache.org/jira/browse/SPARK-41410 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41415) SASL Request Retry
Aravind Patnam created SPARK-41415: -- Summary: SASL Request Retry Key: SPARK-41415 URL: https://issues.apache.org/jira/browse/SPARK-41415 Project: Spark Issue Type: Task Components: Shuffle Affects Versions: 3.2.4 Reporter: Aravind Patnam -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41415) SASL Request Retries
[ https://issues.apache.org/jira/browse/SPARK-41415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aravind Patnam updated SPARK-41415: --- Summary: SASL Request Retries (was: SASL Request Retry) > SASL Request Retries > > > Key: SPARK-41415 > URL: https://issues.apache.org/jira/browse/SPARK-41415 > Project: Spark > Issue Type: Task > Components: Shuffle >Affects Versions: 3.2.4 >Reporter: Aravind Patnam >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41410) Support PVC-oriented executor pod allocation
[ https://issues.apache.org/jira/browse/SPARK-41410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644082#comment-17644082 ] Apache Spark commented on SPARK-41410: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/38949 > Support PVC-oriented executor pod allocation > > > Key: SPARK-41410 > URL: https://issues.apache.org/jira/browse/SPARK-41410 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41410) Support PVC-oriented executor pod allocation
[ https://issues.apache.org/jira/browse/SPARK-41410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644083#comment-17644083 ] Apache Spark commented on SPARK-41410: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/38949 > Support PVC-oriented executor pod allocation > > > Key: SPARK-41410 > URL: https://issues.apache.org/jira/browse/SPARK-41410 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41344) Reading V2 datasource masks underlying error
[ https://issues.apache.org/jira/browse/SPARK-41344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644087#comment-17644087 ] Zhen Wang commented on SPARK-41344: --- [~planga82] Thanks for your reply, I have submitted a PR [https://github.com/apache/spark/pull/38871], can you help me review it? > Maybe the best solution is to have another function that does not catch those > exceptions to use in this case and does not return an option. Does this mean we need to add a new method in CatalogV2Util? > Reading V2 datasource masks underlying error > > > Key: SPARK-41344 > URL: https://issues.apache.org/jira/browse/SPARK-41344 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.3.1, 3.4.0 >Reporter: Kevin Cheung >Priority: Critical > Attachments: image-2022-12-03-09-24-43-285.png > > > In Spark 3.3, > # DataSourceV2Utils.loadV2Source calls: > {*}(CatalogV2Util.loadTable(catalog, ident, timeTravel).get{*}, > Some(catalog), Some(ident)). > # In CatalogV2Util.scala, when *loadTable(x,x,x)* fails with any of the exceptions NoSuchTableException, NoSuchDatabaseException, or > NoSuchNamespaceException, it returns None. > # Back in DataSourceV2Utils, calling None.get on that result produces a cryptic error that is technically "correct", but the *original > exceptions NoSuchTableException, NoSuchDatabaseException, > NoSuchNamespaceException are thrown away.* > > *Ask:* > Retain the original error and propagate it to the user. Prior to Spark 3.3, > the *original error* was shown; the current behavior seems like a design flaw.
> > *Sample user facing error:* > None.get > java.util.NoSuchElementException: None.get > at scala.None$.get(Option.scala:529) > at scala.None$.get(Option.scala:527) > at > org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.loadV2Source(DataSourceV2Utils.scala:129) > at > org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:209) > at scala.Option.flatMap(Option.scala:271) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:207) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171) > > *DataSourceV2Utils.scala - CatalogV2Util.loadTable(x,x,x).get* > [https://github.com/apache/spark/blob/7fd654c0142ab9e4002882da4e65d3b25bebd26c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala#L137] > *CatalogV2Util.scala - Option(catalog.asTableCatalog.loadTable(ident))* > [https://github.com/apache/spark/blob/7fd654c0142ab9e4002882da4e65d3b25bebd26c/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala#L341] > *CatalogV2Util.scala - catching the exceptions and returning None* > [https://github.com/apache/spark/blob/7fd654c0142ab9e4002882da4e65d3b25bebd26c/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala#L344] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41413) Storage-Partitioned Join should avoid shuffle when partition keys mismatch, but join expressions are compatible
[ https://issues.apache.org/jira/browse/SPARK-41413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644088#comment-17644088 ] Apache Spark commented on SPARK-41413: -- User 'sunchao' has created a pull request for this issue: https://github.com/apache/spark/pull/38950 > Storage-Partitioned Join should avoid shuffle when partition keys mismatch, > but join expressions are compatible > --- > > Key: SPARK-41413 > URL: https://issues.apache.org/jira/browse/SPARK-41413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.1 >Reporter: Chao Sun >Priority: Major > > Currently, when checking whether two sides of a Storage-Partitioned Join are > compatible, we require that both the partition expressions and the > partition keys are compatible. However, this condition could be relaxed so > that we only require the former. In the case that the latter is not > compatible, we can calculate a common superset of keys and push down the > information to both sides of the join, and use empty partitions for the > missing keys. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
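[Editorial aside] The relaxation proposed in SPARK-41413 can be sketched in plain Python. This is illustrative only, not Spark's API: `align_partitions` is a hypothetical helper showing the "common superset of keys, empty partitions for the missing ones" idea.

```python
# Given two sides of a storage-partitioned join whose partition expressions
# match but whose partition key sets differ, align them without a shuffle by
# taking the union of the key sets and padding each side with empty
# partitions for the keys it lacks.

def align_partitions(left, right):
    """left/right: dict mapping partition key -> list of rows."""
    all_keys = set(left) | set(right)                 # common superset of keys
    pad = lambda side: {k: side.get(k, []) for k in all_keys}  # missing key -> empty partition
    return pad(left), pad(right)

left = {1: ["a"], 2: ["b"]}
right = {2: ["x"], 3: ["y"]}
l, r = align_partitions(left, right)
# Both sides now share the key set {1, 2, 3}; key 3 on the left and key 1 on
# the right are empty partitions, so the join can proceed partition-wise.
```

Joining an empty partition against anything yields no rows, which is why padding preserves the join result while avoiding the shuffle.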
[jira] [Assigned] (SPARK-41413) Storage-Partitioned Join should avoid shuffle when partition keys mismatch, but join expressions are compatible
[ https://issues.apache.org/jira/browse/SPARK-41413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41413: Assignee: Apache Spark > Storage-Partitioned Join should avoid shuffle when partition keys mismatch, > but join expressions are compatible > --- > > Key: SPARK-41413 > URL: https://issues.apache.org/jira/browse/SPARK-41413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.1 >Reporter: Chao Sun >Assignee: Apache Spark >Priority: Major > > Currently, when checking whether two sides of a Storage-Partitioned Join are > compatible, we require that both the partition expressions and the > partition keys are compatible. However, this condition could be relaxed so > that we only require the former. In the case that the latter is not > compatible, we can calculate a common superset of keys and push down the > information to both sides of the join, and use empty partitions for the > missing keys. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41413) Storage-Partitioned Join should avoid shuffle when partition keys mismatch, but join expressions are compatible
[ https://issues.apache.org/jira/browse/SPARK-41413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41413: Assignee: (was: Apache Spark) > Storage-Partitioned Join should avoid shuffle when partition keys mismatch, > but join expressions are compatible > --- > > Key: SPARK-41413 > URL: https://issues.apache.org/jira/browse/SPARK-41413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.1 >Reporter: Chao Sun >Priority: Major > > Currently, when checking whether two sides of a Storage-Partitioned Join are > compatible, we require that both the partition expressions and the > partition keys are compatible. However, this condition could be relaxed so > that we only require the former. In the case that the latter is not > compatible, we can calculate a common superset of keys and push down the > information to both sides of the join, and use empty partitions for the > missing keys. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41366) DF.groupby.agg() API should be compatible
[ https://issues.apache.org/jira/browse/SPARK-41366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-41366. --- Fix Version/s: 3.4.0 Resolution: Fixed > DF.groupby.agg() API should be compatible > - > > Key: SPARK-41366 > URL: https://issues.apache.org/jira/browse/SPARK-41366 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41416) Rewrite self join in in predicate to aggregate
Wan Kun created SPARK-41416: --- Summary: Rewrite self join in in predicate to aggregate Key: SPARK-41416 URL: https://issues.apache.org/jira/browse/SPARK-41416 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Wan Kun Transforms a self join that produces duplicate rows and is used only in an IN predicate into an aggregation. For an IN predicate, duplicate rows do not add any value; they are pure overhead. Ex: TPCDS Q95: the following CTE is used only in IN predicates comparing a single column ({@code ws_order_number}). This results in an exponential increase in joined rows, with many duplicates. {code:java} WITH ws_wh AS ( SELECT ws1.ws_order_number, ws1.ws_warehouse_sk wh1, ws2.ws_warehouse_sk wh2 FROM web_sales ws1, web_sales ws2 WHERE ws1.ws_order_number = ws2.ws_order_number AND ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk) {code} Could be optimized as below: {code:java} WITH ws_wh AS (SELECT ws_order_number FROM web_sales GROUP BY ws_order_number HAVING COUNT(DISTINCT ws_warehouse_sk) > 1) {code} The optimized CTE scans the table only once and produces unique rows. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
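[Editorial aside] The equivalence that SPARK-41416 relies on can be checked on toy data in plain Python (illustrative only, not Spark code): an order number satisfies the self-join condition exactly when it has more than one distinct warehouse.

```python
# Toy check that the rewrite preserves the set of qualifying order numbers.
from collections import defaultdict

rows = [  # (ws_order_number, ws_warehouse_sk)
    (1, 10), (1, 11), (1, 10),
    (2, 20), (2, 20),
    (3, 30), (3, 31), (3, 32),
]

# Original self-join: order numbers that pair with themselves on a
# different warehouse (duplicates in the join output are irrelevant to IN).
self_join = {o1 for (o1, w1) in rows for (o2, w2) in rows
             if o1 == o2 and w1 != w2}

# Rewrite: GROUP BY order number HAVING COUNT(DISTINCT warehouse) > 1.
warehouses = defaultdict(set)
for o, w in rows:
    warehouses[o].add(w)
aggregated = {o for o, ws in warehouses.items() if len(ws) > 1}

assert self_join == aggregated  # same membership for the IN predicate
```

Order 2 has a single warehouse, so neither formulation admits it; orders 1 and 3 have several, so both do. The aggregate form scans the table once instead of joining it with itself.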
[jira] [Commented] (SPARK-41416) Rewrite self join in in predicate to aggregate
[ https://issues.apache.org/jira/browse/SPARK-41416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644091#comment-17644091 ] Apache Spark commented on SPARK-41416: -- User 'wankunde' has created a pull request for this issue: https://github.com/apache/spark/pull/38951 > Rewrite self join in in predicate to aggregate > -- > > Key: SPARK-41416 > URL: https://issues.apache.org/jira/browse/SPARK-41416 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wan Kun >Priority: Major > > Transforms a self join that produces duplicate rows and is used only in an IN predicate into an aggregation. > For an IN predicate, duplicate rows do not add any value; they are pure overhead. > Ex: TPCDS Q95: the following CTE is used only in IN predicates comparing a single > column ({@code ws_order_number}). > This results in an exponential increase in joined rows, with many duplicates. > {code:java} > WITH ws_wh AS > ( >SELECT ws1.ws_order_number, > ws1.ws_warehouse_sk wh1, > ws2.ws_warehouse_sk wh2 >FROM web_sales ws1, > web_sales ws2 >WHERE ws1.ws_order_number = ws2.ws_order_number >AND ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk) > {code} > Could be optimized as below: > {code:java} > WITH ws_wh AS > (SELECT ws_order_number > FROM web_sales > GROUP BY ws_order_number > HAVING COUNT(DISTINCT ws_warehouse_sk) > 1) > {code} > The optimized CTE scans the table only once and produces unique rows. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41416) Rewrite self join in in predicate to aggregate
[ https://issues.apache.org/jira/browse/SPARK-41416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41416: Assignee: Apache Spark > Rewrite self join in in predicate to aggregate > -- > > Key: SPARK-41416 > URL: https://issues.apache.org/jira/browse/SPARK-41416 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wan Kun >Assignee: Apache Spark >Priority: Major > > Transforms a self join that produces duplicate rows and is used only in an IN predicate into an aggregation. > For an IN predicate, duplicate rows do not add any value; they are pure overhead. > Ex: TPCDS Q95: the following CTE is used only in IN predicates comparing a single > column ({@code ws_order_number}). > This results in an exponential increase in joined rows, with many duplicates. > {code:java} > WITH ws_wh AS > ( >SELECT ws1.ws_order_number, > ws1.ws_warehouse_sk wh1, > ws2.ws_warehouse_sk wh2 >FROM web_sales ws1, > web_sales ws2 >WHERE ws1.ws_order_number = ws2.ws_order_number >AND ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk) > {code} > Could be optimized as below: > {code:java} > WITH ws_wh AS > (SELECT ws_order_number > FROM web_sales > GROUP BY ws_order_number > HAVING COUNT(DISTINCT ws_warehouse_sk) > 1) > {code} > The optimized CTE scans the table only once and produces unique rows. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41416) Rewrite self join in in predicate to aggregate
[ https://issues.apache.org/jira/browse/SPARK-41416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41416: Assignee: (was: Apache Spark) > Rewrite self join in in predicate to aggregate > -- > > Key: SPARK-41416 > URL: https://issues.apache.org/jira/browse/SPARK-41416 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Wan Kun >Priority: Major > > Transforms a self join that produces duplicate rows and is used only in an IN predicate into an aggregation. > For an IN predicate, duplicate rows do not add any value; they are pure overhead. > Ex: TPCDS Q95: the following CTE is used only in IN predicates comparing a single > column ({@code ws_order_number}). > This results in an exponential increase in joined rows, with many duplicates. > {code:java} > WITH ws_wh AS > ( >SELECT ws1.ws_order_number, > ws1.ws_warehouse_sk wh1, > ws2.ws_warehouse_sk wh2 >FROM web_sales ws1, > web_sales ws2 >WHERE ws1.ws_order_number = ws2.ws_order_number >AND ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk) > {code} > Could be optimized as below: > {code:java} > WITH ws_wh AS > (SELECT ws_order_number > FROM web_sales > GROUP BY ws_order_number > HAVING COUNT(DISTINCT ws_warehouse_sk) > 1) > {code} > The optimized CTE scans the table only once and produces unique rows. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39865) Show proper error messages on the overflow errors of table insert
[ https://issues.apache.org/jira/browse/SPARK-39865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644092#comment-17644092 ] Apache Spark commented on SPARK-39865: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/38952 > Show proper error messages on the overflow errors of table insert > - > > Key: SPARK-39865 > URL: https://issues.apache.org/jira/browse/SPARK-39865 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0, 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.3.1 > > > In Spark 3.3, the error message of ANSI CAST is improved. However, the table > insertion is using the same CAST expression: > {code:java} > > create table tiny(i tinyint); > > insert into tiny values (1000); > org.apache.spark.SparkArithmeticException[CAST_OVERFLOW]: The value 1000 of > the type "INT" cannot be cast to "TINYINT" due to an overflow. Use `try_cast` > to tolerate overflow and return NULL instead. If necessary set > "spark.sql.ansi.enabled" to "false" to bypass this error. > {code} > > Showing the hint of `If necessary set "spark.sql.ansi.enabled" to "false" to > bypass this error` doesn't help at all. This PR is to fix the error message. > After changes, the error message of this example will become: > {code:java} > org.apache.spark.SparkArithmeticException: [CAST_OVERFLOW_IN_TABLE_INSERT] > Fail to insert a value of "INT" type into the "TINYINT" type column `i` due > to an overflow. Use `try_cast` on the input value to tolerate overflow and > return NULL instead.{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
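[Editorial aside] The improvement in SPARK-39865 can be sketched in plain Python. This is not Spark's implementation: `insert_tinyint` is a hypothetical function, the TINYINT bounds are the standard signed 8-bit range, and the message text follows the example quoted in the issue.

```python
# Sketch of the improved error: when an inserted value overflows the target
# column type, report the source type, target type, and column name instead
# of a generic CAST_OVERFLOW hint about disabling spark.sql.ansi.enabled.

TINYINT_MIN, TINYINT_MAX = -128, 127  # signed 8-bit range

def insert_tinyint(column, value):
    if not (TINYINT_MIN <= value <= TINYINT_MAX):
        raise ArithmeticError(
            f'[CAST_OVERFLOW_IN_TABLE_INSERT] Fail to insert a value of "INT" '
            f'type into the "TINYINT" type column `{column}` due to an overflow. '
            f"Use `try_cast` on the input value to tolerate overflow and "
            f"return NULL instead."
        )
    return value

insert_tinyint("i", 100)    # fits in TINYINT
# insert_tinyint("i", 1000) raises with the column-aware message above
```

The point of the change is visible in the message: it names the column `i` and the involved types, which the generic ANSI CAST error did not.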
[jira] [Commented] (SPARK-41369) Refactor connect directory structure
[ https://issues.apache.org/jira/browse/SPARK-41369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644094#comment-17644094 ] Apache Spark commented on SPARK-41369: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/38953 > Refactor connect directory structure > > > Key: SPARK-41369 > URL: https://issues.apache.org/jira/browse/SPARK-41369 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.3.2, 3.4.0 >Reporter: Venkata Sai Akhil Gudesa >Priority: Major > > Currently, `spark/connector/connect/` is a single module that contains both > the "server"/service as well as the protobuf definitions. > However, this module can be split into multiple modules - "server" and > "common". This brings the advantage of separating out the protobuf generation > from the core "server" module for efficient reuse. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41417) Assign a name to the error class _LEGACY_ERROR_TEMP_0019
Yang Jie created SPARK-41417: Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_0019 Key: SPARK-41417 URL: https://issues.apache.org/jira/browse/SPARK-41417 Project: Spark Issue Type: Sub-task Components: Spark Core, SQL Affects Versions: 3.4.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41369) Refactor connect directory structure
[ https://issues.apache.org/jira/browse/SPARK-41369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41369. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38953 [https://github.com/apache/spark/pull/38953] > Refactor connect directory structure > > > Key: SPARK-41369 > URL: https://issues.apache.org/jira/browse/SPARK-41369 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.3.2, 3.4.0 >Reporter: Venkata Sai Akhil Gudesa >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > Currently, `spark/connector/connect/` is a single module that contains both > the "server"/service as well as the protobuf definitions. > However, this module can be split into multiple modules - "server" and > "common". This brings the advantage of separating out the protobuf generation > from the core "server" module for efficient reuse. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41369) Refactor connect directory structure
[ https://issues.apache.org/jira/browse/SPARK-41369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41369: Assignee: Hyukjin Kwon > Refactor connect directory structure > > > Key: SPARK-41369 > URL: https://issues.apache.org/jira/browse/SPARK-41369 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.3.2, 3.4.0 >Reporter: Venkata Sai Akhil Gudesa >Assignee: Hyukjin Kwon >Priority: Major > > Currently, `spark/connector/connect/` is a single module that contains both > the "server"/service as well as the protobuf definitions. > However, this module can be split into multiple modules - "server" and > "common". This brings the advantage of separating out the protobuf generation > from the core "server" module for efficient reuse. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41417) Assign a name to the error class _LEGACY_ERROR_TEMP_0019
[ https://issues.apache.org/jira/browse/SPARK-41417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41417: Assignee: (was: Apache Spark) > Assign a name to the error class _LEGACY_ERROR_TEMP_0019 > > > Key: SPARK-41417 > URL: https://issues.apache.org/jira/browse/SPARK-41417 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org