[jira] [Commented] (SPARK-39280) Speed up Timestamp type inference with user-provided format in JSON/CSV data source
[ https://issues.apache.org/jira/browse/SPARK-39280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722078#comment-17722078 ] ASF GitHub Bot commented on SPARK-39280: User 'Hisoka-X' has created a pull request for this issue: https://github.com/apache/spark/pull/41078 > Speed up Timestamp type inference with user-provided format in JSON/CSV data > source > --- > > Key: SPARK-39280 > URL: https://issues.apache.org/jira/browse/SPARK-39280 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Priority: Minor > > The optimization of {{DefaultTimestampFormatter}} has been implemented in > [#36562|https://github.com/apache/spark/pull/36562] , this ticket adds the > optimization of user-provided format. The basic logic is to prevent the > formatter from throwing exceptions, and then use catch to determine whether > the parsing is successful. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
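The "determine success without exception-driven control flow" idea described in the ticket can be sketched with `java.time`, where `DateTimeFormatter.parseUnresolved` reports failure through a `ParsePosition` instead of throwing. This is only a general illustration of the technique under assumed names (`LenientParseSketch`, `matchesFormat` are invented for the example), not the code in the linked PR:

```java
import java.text.ParsePosition;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;

public class LenientParseSketch {
    // Returns true when the whole input matches the user-provided pattern,
    // without ever raising (and catching) a DateTimeParseException.
    static boolean matchesFormat(String input, DateTimeFormatter fmt) {
        ParsePosition pos = new ParsePosition(0);
        // parseUnresolved returns null and records an error index on failure,
        // rather than throwing like parse(...) does.
        TemporalAccessor parsed = fmt.parseUnresolved(input, pos);
        return parsed != null
                && pos.getErrorIndex() < 0
                && pos.getIndex() == input.length();
    }

    public static void main(String[] args) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        System.out.println(matchesFormat("2023-05-12 18:58:00", fmt)); // true
        System.out.println(matchesFormat("not-a-timestamp", fmt));     // false
    }
}
```

Avoiding thrown exceptions matters for schema inference because the formatter may be applied to millions of non-matching rows, and constructing exception stack traces on every miss is far more expensive than checking a position.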
[jira] [Commented] (SPARK-40887) Allow Spark on K8s to integrate w/ Log Service
[ https://issues.apache.org/jira/browse/SPARK-40887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722105#comment-17722105 ] Ignite TC Bot commented on SPARK-40887: --- User 'turboFei' has created a pull request for this issue: https://github.com/apache/spark/pull/41139 > Allow Spark on K8s to integrate w/ Log Service > -- > > Key: SPARK-40887 > URL: https://issues.apache.org/jira/browse/SPARK-40887 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Cheng Pan >Assignee: Apache Spark >Priority: Major > > https://docs.google.com/document/d/1MfB39LD4B4Rp7MDRxZbMKMbdNSe6V6mBmMQ-gkCnM-0/edit?usp=sharing -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43487) Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError`
[ https://issues.apache.org/jira/browse/SPARK-43487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johan Lasperas updated SPARK-43487: --- Description: The batch of errors migrated to error classes as part of spark-40540 contains an error that got mixed up with the wrong error message: [ambiguousRelationAliasNameInNestedCTEError|https://github.com/apache/spark/commit/43a6b932759865c45ccf36f3e9cf6898c1b762da#diff-744ac13f6fe074fddeab09b407404bffa2386f54abc83c501e6e1fe618f6db56R1983] uses the same error message as the following commandUnsupportedInV2TableError: {code:java} WITH t AS (SELECT 1), t2 AS ( WITH t AS (SELECT 2) SELECT * FROM t) SELECT * FROM t2; AnalysisException: t is not supported for v2 tables {code} The error should be: {code:java} AnalysisException: Name t is ambiguous in nested CTE. Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name defined in inner CTE takes precedence. If set it to LEGACY, outer CTE definitions will take precedence. See more details in SPARK-28228.{code} was: The batch of errors migrated to error classes as part of spark-40540 contains an error that got mixed up with the wrong error message: [ambiguousRelationAliasNameInNestedCTEError|https://github.com/apache/spark/commit/43a6b932759865c45ccf36f3e9cf6898c1b762da#diff-744ac13f6fe074fddeab09b407404bffa2386f54abc83c501e6e1fe618f6db56R1983] uses the same error message as the following commandUnsupportedInV2TableError: ``` WITH t AS (SELECT 1), t2 AS ( WITH t AS (SELECT 2) SELECT * FROM t) SELECT * FROM t2; AnalysisException: t is not supported for v2 tables ``` The error should be: ``` AnalysisException: Name t is ambiguous in nested CTE. Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name defined in inner CTE takes precedence. If set it to LEGACY, outer CTE definitions will take precedence. See more details in SPARK-28228. 
``` > Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError` > - > > Key: SPARK-43487 > URL: https://issues.apache.org/jira/browse/SPARK-43487 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Johan Lasperas >Priority: Minor > > The batch of errors migrated to error classes as part of spark-40540 contains > an error that got mixed up with the wrong error message: > [ambiguousRelationAliasNameInNestedCTEError|https://github.com/apache/spark/commit/43a6b932759865c45ccf36f3e9cf6898c1b762da#diff-744ac13f6fe074fddeab09b407404bffa2386f54abc83c501e6e1fe618f6db56R1983] > uses the same error message as the following > commandUnsupportedInV2TableError: > > {code:java} > WITH t AS (SELECT 1), t2 AS ( WITH t AS (SELECT 2) SELECT * FROM t) SELECT * > FROM t2; > AnalysisException: t is not supported for v2 tables > {code} > The error should be: > {code:java} > AnalysisException: Name t is ambiguous in nested CTE. > Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name > defined in inner CTE takes precedence. If set it to LEGACY, outer CTE > definitions will take precedence. See more details in SPARK-28228.{code}
[jira] [Created] (SPARK-43487) Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError`
Johan Lasperas created SPARK-43487: -- Summary: Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError` Key: SPARK-43487 URL: https://issues.apache.org/jira/browse/SPARK-43487 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.4.0 Reporter: Johan Lasperas The batch of errors migrated to error classes as part of spark-40540 contains an error that got mixed up with the wrong error message: [ambiguousRelationAliasNameInNestedCTEError|https://github.com/apache/spark/commit/43a6b932759865c45ccf36f3e9cf6898c1b762da#diff-744ac13f6fe074fddeab09b407404bffa2386f54abc83c501e6e1fe618f6db56R1983] uses the same error message as the following commandUnsupportedInV2TableError: ``` WITH t AS (SELECT 1), t2 AS ( WITH t AS (SELECT 2) SELECT * FROM t) SELECT * FROM t2; AnalysisException: t is not supported for v2 tables ``` The error should be: ``` AnalysisException: Name t is ambiguous in nested CTE. Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name defined in inner CTE takes precedence. If set it to LEGACY, outer CTE definitions will take precedence. See more details in SPARK-28228. ```
[jira] [Commented] (SPARK-40129) Decimal multiply can produce the wrong answer because it rounds twice
[ https://issues.apache.org/jira/browse/SPARK-40129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1774#comment-1774 ] Jia Fan commented on SPARK-40129: - https://github.com/apache/spark/pull/41156 > Decimal multiply can produce the wrong answer because it rounds twice > - > > Key: SPARK-40129 > URL: https://issues.apache.org/jira/browse/SPARK-40129 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0, 3.4.0 >Reporter: Robert Joseph Evans >Priority: Major > > This looks like it has been around for a long time, but I have reproduced it > in 3.2.0+ > The example here is multiplying Decimal(38, 10) by another Decimal(38, 10), > but I think it can be reproduced with other number combinations, and possibly > with divide too. > {code:java} > Seq("9173594185998001607642838421.5479932913").toDF.selectExpr("CAST(value as > DECIMAL(38,10)) as a").selectExpr("a * CAST(-12 as > DECIMAL(38,10))").show(truncate=false) > {code} > This produces an answer in Spark of > {{-110083130231976019291714061058.575920}} But if I do the calculation in > regular java BigDecimal I get {{-110083130231976019291714061058.575919}} > {code:java} > BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913"); > BigDecimal r = new BigDecimal("-12.00"); > BigDecimal prod = l.multiply(r); > BigDecimal rounded_prod = prod.setScale(6, RoundingMode.HALF_UP); > {code} > Spark does essentially all of the same operations, but it used Decimal to do > it instead of java's BigDecimal directly. Spark, by way of Decimal, will set > a MathContext for the multiply operation that has a max precision of 38 and > will do half up rounding. That means that the result of the multiply > operation in Spark is {{{}-110083130231976019291714061058.57591950{}}}, but > for the java BigDecimal code the result is > {{{}-110083130231976019291714061058.575919495600{}}}. 
Then in > CheckOverflow for 3.2.0 and 3.3.0 or in just the regular Multiply expression > in 3.4.0 the setScale is called (as a part of Decimal.setPrecision). At that > point the already rounded number is rounded yet again resulting in what is > arguably a wrong answer by Spark. > I have not fully tested this, but it looks like we could just remove the > MathContext entirely in Decimal, or set it to UNLIMITED. All of the decimal > operations appear to have their own overflow and rounding anyways. If we want > to potentially reduce the total memory usage, we could also set the max > precision to 39 and truncate (round down) the result in the math context > instead. That would then let us round the result correctly in setPrecision > afterwards. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
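The double rounding described above can be reproduced in plain Java by mimicking what Spark's Decimal effectively does: round the product once to 38 significant digits via a `MathContext`, then round the already-rounded value again with `setScale`. This is a standalone illustration of the report (the class name `DoubleRoundingDemo` is invented for the example), not Spark's actual code path:

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class DoubleRoundingDemo {
    public static void main(String[] args) {
        BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
        BigDecimal r = new BigDecimal("-12.00");

        // Rounding once: the mathematically correct result at scale 6.
        BigDecimal once = l.multiply(r).setScale(6, RoundingMode.HALF_UP);

        // Rounding twice: the multiply is first truncated to 38 significant
        // digits (...58.57591950), and setScale then rounds that rounded
        // value, tipping ...919 up to ...920.
        BigDecimal twice = l.multiply(r, new MathContext(38, RoundingMode.HALF_UP))
                .setScale(6, RoundingMode.HALF_UP);

        System.out.println(once);  // -110083130231976019291714061058.575919
        System.out.println(twice); // -110083130231976019291714061058.575920
    }
}
```

The difference arises because HALF_UP applied to the exact fraction .5759194956 at scale 6 sees a 4 and rounds down, while applied to the pre-rounded .57591950 it sees an exact half and rounds up.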
[jira] [Assigned] (SPARK-43484) Kafka/Kinesis Assembly should not package hadoop-client-runtime
[ https://issues.apache.org/jira/browse/SPARK-43484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-43484: Assignee: Cheng Pan > Kafka/Kinesis Assembly should not package hadoop-client-runtime > --- > > Key: SPARK-43484 > URL: https://issues.apache.org/jira/browse/SPARK-43484 > Project: Spark > Issue Type: Bug > Components: Build, Structured Streaming >Affects Versions: 3.2.4, 3.3.2, 3.4.0, 3.5.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43484) Kafka/Kinesis Assembly should not package hadoop-client-runtime
[ https://issues.apache.org/jira/browse/SPARK-43484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-43484. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41152 [https://github.com/apache/spark/pull/41152] > Kafka/Kinesis Assembly should not package hadoop-client-runtime > --- > > Key: SPARK-43484 > URL: https://issues.apache.org/jira/browse/SPARK-43484 > Project: Spark > Issue Type: Bug > Components: Build, Structured Streaming >Affects Versions: 3.2.4, 3.3.2, 3.4.0, 3.5.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40129) Decimal multiply can produce the wrong answer because it rounds twice
[ https://issues.apache.org/jira/browse/SPARK-40129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1778#comment-1778 ] Hudson commented on SPARK-40129: User 'Hisoka-X' has created a pull request for this issue: https://github.com/apache/spark/pull/41156 > Decimal multiply can produce the wrong answer because it rounds twice > - > > Key: SPARK-40129 > URL: https://issues.apache.org/jira/browse/SPARK-40129 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0, 3.4.0 >Reporter: Robert Joseph Evans >Priority: Major > > This looks like it has been around for a long time, but I have reproduced it > in 3.2.0+ > The example here is multiplying Decimal(38, 10) by another Decimal(38, 10), > but I think it can be reproduced with other number combinations, and possibly > with divide too. > {code:java} > Seq("9173594185998001607642838421.5479932913").toDF.selectExpr("CAST(value as > DECIMAL(38,10)) as a").selectExpr("a * CAST(-12 as > DECIMAL(38,10))").show(truncate=false) > {code} > This produces an answer in Spark of > {{-110083130231976019291714061058.575920}} But if I do the calculation in > regular java BigDecimal I get {{-110083130231976019291714061058.575919}} > {code:java} > BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913"); > BigDecimal r = new BigDecimal("-12.00"); > BigDecimal prod = l.multiply(r); > BigDecimal rounded_prod = prod.setScale(6, RoundingMode.HALF_UP); > {code} > Spark does essentially all of the same operations, but it used Decimal to do > it instead of java's BigDecimal directly. Spark, by way of Decimal, will set > a MathContext for the multiply operation that has a max precision of 38 and > will do half up rounding. That means that the result of the multiply > operation in Spark is {{{}-110083130231976019291714061058.57591950{}}}, but > for the java BigDecimal code the result is > {{{}-110083130231976019291714061058.575919495600{}}}. 
Then in > CheckOverflow for 3.2.0 and 3.3.0 or in just the regular Multiply expression > in 3.4.0 the setScale is called (as a part of Decimal.setPrecision). At that > point the already rounded number is rounded yet again resulting in what is > arguably a wrong answer by Spark. > I have not fully tested this, but it looks like we could just remove the > MathContext entirely in Decimal, or set it to UNLIMITED. All of the decimal > operations appear to have their own overflow and rounding anyways. If we want > to potentially reduce the total memory usage, we could also set the max > precision to 39 and truncate (round down) the result in the math context > instead. That would then let us round the result correctly in setPrecision > afterwards. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43484) Kafka/Kinesis Assembly should not package hadoop-client-runtime
[ https://issues.apache.org/jira/browse/SPARK-43484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1779#comment-1779 ] Hudson commented on SPARK-43484: User 'pan3793' has created a pull request for this issue: https://github.com/apache/spark/pull/41152 > Kafka/Kinesis Assembly should not package hadoop-client-runtime > --- > > Key: SPARK-43484 > URL: https://issues.apache.org/jira/browse/SPARK-43484 > Project: Spark > Issue Type: Bug > Components: Build, Structured Streaming >Affects Versions: 3.2.4, 3.3.2, 3.4.0, 3.5.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43488) bitmap function
yiku123 created SPARK-43488: --- Summary: bitmap function Key: SPARK-43488 URL: https://issues.apache.org/jira/browse/SPARK-43488 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.4.0 Reporter: yiku123 Fix For: 3.4.1 Maybe Spark needs some bitmap functions, for example bitmapBuild, bitmapAnd, and bitmapAndCardinality as found in ClickHouse and other OLAP engines. These are often used in user-profiling applications, but I don't find them in Spark.
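For readers unfamiliar with the ClickHouse functions named above, the requested semantics can be sketched with `java.util.BitSet`. This only illustrates what bitmapAnd/bitmapAndCardinality compute over two sets of ids (the class name `BitmapSketch` and the sample ids are invented); it is not a proposed Spark implementation:

```java
import java.util.BitSet;

public class BitmapSketch {
    public static void main(String[] args) {
        // bitmapBuild: build bitmaps from two sets of user ids.
        BitSet a = new BitSet();
        for (int id : new int[] {1, 3, 5, 7}) a.set(id);
        BitSet b = new BitSet();
        for (int id : new int[] {3, 5, 9}) b.set(id);

        // bitmapAnd: the intersection of the two bitmaps.
        BitSet and = (BitSet) a.clone();
        and.and(b);

        // bitmapAndCardinality: the size of the intersection.
        System.out.println(and.cardinality()); // 2 (ids 3 and 5)
    }
}
```

In user-profiling workloads, each bitmap typically encodes "which users have trait X", so an AND-cardinality answers "how many users have both traits" without materializing the id lists.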
[jira] [Updated] (SPARK-43488) bitmap function
[ https://issues.apache.org/jira/browse/SPARK-43488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yiku123 updated SPARK-43488: Labels: patch (was: ) > bitmap function > --- > > Key: SPARK-43488 > URL: https://issues.apache.org/jira/browse/SPARK-43488 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: yiku123 >Priority: Major > Labels: patch > Fix For: 3.4.1 > > > Maybe Spark needs some bitmap functions, for example bitmapBuild, bitmapAnd, and bitmapAndCardinality > as found in ClickHouse and other OLAP engines. > These are often used in user-profiling applications, but I don't find them in Spark.
[jira] [Created] (SPARK-43489) Remove protobuf 2.5.0
Cheng Pan created SPARK-43489: - Summary: Remove protobuf 2.5.0 Key: SPARK-43489 URL: https://issues.apache.org/jira/browse/SPARK-43489 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.5.0 Reporter: Cheng Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40500) Use `pd.items` instead of `pd.iteritems`
[ https://issues.apache.org/jira/browse/SPARK-40500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722262#comment-17722262 ] Jim Huang commented on SPARK-40500: --- Thank you all involved for fixing this issue! This issue exists in Spark 3.3.x and Spark 3.2.x. What is the thought process that this same fix can or will be back-ported to the older Spark versions? > Use `pd.items` instead of `pd.iteritems` > > > Key: SPARK-40500 > URL: https://issues.apache.org/jira/browse/SPARK-40500 > Project: Spark > Issue Type: Improvement > Components: Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-40500) Use `pd.items` instead of `pd.iteritems`
[ https://issues.apache.org/jira/browse/SPARK-40500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722262#comment-17722262 ] Jim Huang edited comment on SPARK-40500 at 5/12/23 6:58 PM: Thank you all involved for fixing this issue! This issue exists in Spark 3.3.x and Spark 3.2.x when paired up with a recent version of Pandas. What is the thought process that this same fix can or will be back-ported to the older Spark versions? was (Author: jimhuang): Thank you all involved for fixing this issue! This issue exists in Spark 3.3.x and Spark 3.2.x. What is the thought process that this same fix can or will be back-ported to the older Spark versions? > Use `pd.items` instead of `pd.iteritems` > > > Key: SPARK-40500 > URL: https://issues.apache.org/jira/browse/SPARK-40500 > Project: Spark > Issue Type: Improvement > Components: Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-40500) Use `pd.items` instead of `pd.iteritems`
[ https://issues.apache.org/jira/browse/SPARK-40500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722262#comment-17722262 ] Jim Huang edited comment on SPARK-40500 at 5/12/23 6:59 PM: Thank you all involved for fixing this issue! This issue exists in Spark 3.3.x and Spark 3.2.x when paired up with a recent version of Pandas where `pd.iteritems` has been deprecated. What is the thought process on whether this same fix can or will be back-ported to the older Spark versions? was (Author: jimhuang): Thank you all involved for fixing this issue! This issue exists in Spark 3.3.x and Spark 3.2.x when paired up with a recent version of Pandas. What is the thought process that this same fix can or will be back-ported to the older Spark versions? > Use `pd.items` instead of `pd.iteritems` > > > Key: SPARK-40500 > URL: https://issues.apache.org/jira/browse/SPARK-40500 > Project: Spark > Issue Type: Improvement > Components: Pandas API on Spark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Fix For: 3.4.0 > >
[jira] [Resolved] (SPARK-43272) Replace reflection w/ direct calling for `SparkHadoopUtil#createFile`
[ https://issues.apache.org/jira/browse/SPARK-43272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-43272. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40945 [https://github.com/apache/spark/pull/40945] > Replace reflection w/ direct calling for `SparkHadoopUtil#createFile` > -- > > Key: SPARK-43272 > URL: https://issues.apache.org/jira/browse/SPARK-43272 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43272) Replace reflection w/ direct calling for `SparkHadoopUtil#createFile`
[ https://issues.apache.org/jira/browse/SPARK-43272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-43272: Assignee: Yang Jie > Replace reflection w/ direct calling for `SparkHadoopUtil#createFile` > -- > > Key: SPARK-43272 > URL: https://issues.apache.org/jira/browse/SPARK-43272 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43486) number of files read is incorrect if it is bucket table
[ https://issues.apache.org/jira/browse/SPARK-43486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722308#comment-17722308 ] BingKun Pan commented on SPARK-43486: - Can I do it? [~yumwang] > number of files read is incorrect if it is bucket table > --- > > Key: SPARK-43486 > URL: https://issues.apache.org/jira/browse/SPARK-43486 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Priority: Major > Attachments: screenshot-1.png > > > !screenshot-1.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43486) number of files read is incorrect if it is bucket table
[ https://issues.apache.org/jira/browse/SPARK-43486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722313#comment-17722313 ] Yuming Wang commented on SPARK-43486: - [~panbingkun] Yes, please. > number of files read is incorrect if it is bucket table > --- > > Key: SPARK-43486 > URL: https://issues.apache.org/jira/browse/SPARK-43486 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Priority: Major > Attachments: screenshot-1.png > > > !screenshot-1.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43490) Upgrade sbt to 1.8.3
BingKun Pan created SPARK-43490: --- Summary: Upgrade sbt to 1.8.3 Key: SPARK-43490 URL: https://issues.apache.org/jira/browse/SPARK-43490 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.5.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43488) bitmap function
[ https://issues.apache.org/jira/browse/SPARK-43488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-43488: Fix Version/s: (was: 3.4.1) > bitmap function > --- > > Key: SPARK-43488 > URL: https://issues.apache.org/jira/browse/SPARK-43488 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: yiku123 >Priority: Major > Labels: patch > > Maybe Spark needs some bitmap functions, for example bitmapBuild, bitmapAnd, and bitmapAndCardinality > as found in ClickHouse and other OLAP engines. > These are often used in user-profiling applications, but I don't find them in Spark.
[jira] [Commented] (SPARK-43487) Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError`
[ https://issues.apache.org/jira/browse/SPARK-43487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722316#comment-17722316 ] BingKun Pan commented on SPARK-43487: - I work on it. > Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError` > - > > Key: SPARK-43487 > URL: https://issues.apache.org/jira/browse/SPARK-43487 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Johan Lasperas >Priority: Minor > > The batch of errors migrated to error classes as part of spark-40540 contains > an error that got mixed up with the wrong error message: > [ambiguousRelationAliasNameInNestedCTEError|https://github.com/apache/spark/commit/43a6b932759865c45ccf36f3e9cf6898c1b762da#diff-744ac13f6fe074fddeab09b407404bffa2386f54abc83c501e6e1fe618f6db56R1983] > uses the same error message as the following > commandUnsupportedInV2TableError: > > {code:java} > WITH t AS (SELECT 1), t2 AS ( WITH t AS (SELECT 2) SELECT * FROM t) SELECT * > FROM t2; > AnalysisException: t is not supported for v2 tables > {code} > The error should be: > {code:java} > AnalysisException: Name tis ambiguous in nested CTE. > Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name > defined in inner CTE takes precedence. If set it to LEGACY, outer CTE > definitions will take precedence. See more details in SPARK-28228.{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43487) Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError`
[ https://issues.apache.org/jira/browse/SPARK-43487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722320#comment-17722320 ] Snoot.io commented on SPARK-43487: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/41161 > Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError` > - > > Key: SPARK-43487 > URL: https://issues.apache.org/jira/browse/SPARK-43487 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Johan Lasperas >Priority: Minor > > The batch of errors migrated to error classes as part of spark-40540 contains > an error that got mixed up with the wrong error message: > [ambiguousRelationAliasNameInNestedCTEError|https://github.com/apache/spark/commit/43a6b932759865c45ccf36f3e9cf6898c1b762da#diff-744ac13f6fe074fddeab09b407404bffa2386f54abc83c501e6e1fe618f6db56R1983] > uses the same error message as the following > commandUnsupportedInV2TableError: > > {code:java} > WITH t AS (SELECT 1), t2 AS ( WITH t AS (SELECT 2) SELECT * FROM t) SELECT * > FROM t2; > AnalysisException: t is not supported for v2 tables > {code} > The error should be: > {code:java} > AnalysisException: Name tis ambiguous in nested CTE. > Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name > defined in inner CTE takes precedence. If set it to LEGACY, outer CTE > definitions will take precedence. See more details in SPARK-28228.{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43491) In expression not compatible with EqualTo Expression
KuijianLiu created SPARK-43491: -- Summary: In expression not compatible with EqualTo Expression Key: SPARK-43491 URL: https://issues.apache.org/jira/browse/SPARK-43491 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.1 Reporter: KuijianLiu

The query results of Spark SQL 3.1.1 and Hive SQL 3.1.0 are inconsistent for the same SQL. Spark SQL evaluates {{0 in ('00')}} as false, which behaves differently from the {{=}} operator, while Hive evaluates it as true. Hive handles the {{in}} keyword consistently in 3.1.0, but Spark SQL does not. When the data types of the elements in an {{In}} expression are the same, it should behave the same as a BinaryComparison such as {{EqualTo}}.

Test SQL:
{code:java}
scala> spark.sql("select 1 as test where 0 = '00'").show
+----+
|test|
+----+
|   1|
+----+

scala> spark.sql("select 1 as test where 0 in ('00')").show
+----+
|test|
+----+
+----+
{code}
!image-2023-05-13-13-06-24-551.png!
!image-2023-05-13-13-07-49-194.png!
[jira] [Updated] (SPARK-43491) In expression not compatible with EqualTo Expression
[ https://issues.apache.org/jira/browse/SPARK-43491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KuijianLiu updated SPARK-43491: --- Description:

The query results of Spark SQL 3.1.1 and Hive SQL 3.1.0 are inconsistent for the same SQL. Spark SQL evaluates {{0 in ('00')}} as false, which behaves differently from the {{=}} operator, while Hive evaluates it as true. Hive handles the {{in}} keyword consistently in 3.1.0, but Spark SQL does not. When the data types of the elements in an {{In}} expression are the same, it should behave the same as a BinaryComparison such as {{EqualTo}}.

Test SQL:
{code:java}
scala> spark.sql("select 1 as test where 0 = '00'").show
+----+
|test|
+----+
|   1|
+----+

scala> spark.sql("select 1 as test where 0 in ('00')").show
+----+
|test|
+----+
+----+

scala> spark.sql("select 1 as test where 0 = '00'").explain(true)
== Parsed Logical Plan ==
'Project [1 AS test#23]
+- 'Filter (0 = 00)
   +- OneRowRelation

== Analyzed Logical Plan ==
test: int
Project [1 AS test#23]
+- Filter (0 = cast(00 as int))
   +- OneRowRelation

== Optimized Logical Plan ==
Project [1 AS test#23]
+- OneRowRelation

== Physical Plan ==
*(1) Project [1 AS test#23]
+- *(1) Scan OneRowRelation[]

scala> spark.sql("select 1 as test where 0 in ('00')").explain(true)
== Parsed Logical Plan ==
'Project [1 AS test#25]
+- 'Filter 0 IN (00)
   +- OneRowRelation

== Analyzed Logical Plan ==
test: int
Project [1 AS test#25]
+- Filter cast(0 as string) IN (cast(00 as string))
   +- OneRowRelation

== Optimized Logical Plan ==
LocalRelation <empty>, [test#25]

== Physical Plan ==
LocalTableScan <empty>, [test#25]
{code}

was: the same description without the {{explain(true)}} output, followed by !image-2023-05-13-13-06-24-551.png! and !image-2023-05-13-13-07-49-194.png!

> In expression not compatible with EqualTo Expression
> ----------------------------------------------------
>
> Key: SPARK-43491
> URL: https://issues.apache.org/jira/browse/SPARK-43491
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.1
> Reporter: KuijianLiu
> Priority: Minor
>
> The query results of Spark SQL 3.1.1 and Hive SQL 3.1.0 are inconsistent for the same SQL. Spark SQL evaluates {{0 in ('00')}} as false, which behaves differently from the {{=}} operator, while Hive evaluates it as true. Hive handles the {{in}} keyword consistently in 3.1.0, but Spark SQL does not. When the data types of the elements in an {{In}} expression are the same, it should behave the same as a BinaryComparison such as {{EqualTo}}.
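The analyzed plans above expose the root of the asymmetry: for {{=}}, the string literal is cast to int ({{0 = cast('00' as int)}}), while for {{IN}}, both sides are cast to string ({{cast(0 as string) IN (cast('00' as string))}}). A minimal Python sketch of these two coercion rules, as an illustrative model only (the helper names are hypothetical, and this is not Spark's actual TypeCoercion code):

```python
# Illustrative model of the two coercion rules visible in the analyzed plans
# (hypothetical helpers, not Spark's implementation).

def equal_to(left_int, right_str):
    # `0 = '00'`: the string side is cast to int, so 0 = cast('00' as int).
    return left_int == int(right_str)

def in_list(left_int, str_items):
    # `0 in ('00')`: both sides are cast to string,
    # so cast(0 as string) IN (cast('00' as string)).
    return str(left_int) in [str(item) for item in str_items]

print(equal_to(0, "00"))   # True  -> the row is kept by the filter
print(in_list(0, ["00"]))  # False -> the row is filtered out
```

Under the numeric coercion that `=` uses, both comparisons would agree; the string coercion that `IN` applies is what makes `0 in ('00')` false.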
[jira] [Updated] (SPARK-43491) In expression not compatible with EqualTo Expression
[ https://issues.apache.org/jira/browse/SPARK-43491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KuijianLiu updated SPARK-43491: --- Attachment: image-2023-05-13-13-14-55-853.png
[jira] [Updated] (SPARK-43491) In expression not compatible with EqualTo Expression
[ https://issues.apache.org/jira/browse/SPARK-43491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KuijianLiu updated SPARK-43491: --- Attachment: image-2023-05-13-13-15-50-685.png