[jira] [Commented] (SPARK-39280) Speed up Timestamp type inference with user-provided format in JSON/CSV data source

2023-05-12 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722078#comment-17722078
 ] 

ASF GitHub Bot commented on SPARK-39280:


User 'Hisoka-X' has created a pull request for this issue:
https://github.com/apache/spark/pull/41078

> Speed up Timestamp type inference with user-provided format in JSON/CSV data 
> source
> ---
>
> Key: SPARK-39280
> URL: https://issues.apache.org/jira/browse/SPARK-39280
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Priority: Minor
>
> The optimization of {{DefaultTimestampFormatter}} was implemented in 
> [#36562|https://github.com/apache/spark/pull/36562]; this ticket adds the same 
> optimization for user-provided formats. The basic idea is to keep the formatter 
> from throwing exceptions on input that does not match, instead of relying on 
> try/catch to determine whether parsing succeeded.
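As a sketch of the exception-free approach (illustrative only, not Spark's actual code): java.time's DateTimeFormatter.parseUnresolved reports failure through a ParsePosition instead of throwing DateTimeParseException, which avoids the cost of exception construction when probing many rows during schema inference. The tryParse helper below is a hypothetical name.

```java
import java.text.ParsePosition;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;

public class ParseWithoutExceptions {
    // Hypothetical helper: returns null on failure instead of letting the
    // formatter throw DateTimeParseException.
    static TemporalAccessor tryParse(DateTimeFormatter fmt, String s) {
        ParsePosition pos = new ParsePosition(0);
        // parseUnresolved reports errors via the ParsePosition, never throws here
        TemporalAccessor parsed = fmt.parseUnresolved(s, pos);
        if (parsed == null || pos.getErrorIndex() >= 0 || pos.getIndex() != s.length()) {
            return null; // not a timestamp in this format
        }
        return parsed;
    }

    public static void main(String[] args) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        System.out.println(tryParse(fmt, "2023-05-12 10:00:00") != null); // true
        System.out.println(tryParse(fmt, "not a timestamp") != null);     // false
    }
}
```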



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org




[jira] [Commented] (SPARK-40887) Allow Spark on K8s to integrate w/ Log Service

2023-05-12 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722105#comment-17722105
 ] 

Ignite TC Bot commented on SPARK-40887:
---

User 'turboFei' has created a pull request for this issue:
https://github.com/apache/spark/pull/41139

> Allow Spark on K8s to integrate w/ Log Service
> --
>
> Key: SPARK-40887
> URL: https://issues.apache.org/jira/browse/SPARK-40887
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Cheng Pan
>Assignee: Apache Spark
>Priority: Major
>
> https://docs.google.com/document/d/1MfB39LD4B4Rp7MDRxZbMKMbdNSe6V6mBmMQ-gkCnM-0/edit?usp=sharing






[jira] [Updated] (SPARK-43487) Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError`

2023-05-12 Thread Johan Lasperas (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johan Lasperas updated SPARK-43487:
---
Description: 
The batch of errors migrated to error classes as part of spark-40540 contains 
an error that got mixed up with the wrong error message:

[ambiguousRelationAliasNameInNestedCTEError|https://github.com/apache/spark/commit/43a6b932759865c45ccf36f3e9cf6898c1b762da#diff-744ac13f6fe074fddeab09b407404bffa2386f54abc83c501e6e1fe618f6db56R1983]
 uses the same error message as the following commandUnsupportedInV2TableError:

 
{code:java}
WITH t AS (SELECT 1), t2 AS ( WITH t AS (SELECT 2) SELECT * FROM t) SELECT * 
FROM t2;
AnalysisException: t is not supported for v2 tables
{code}
The error should be:
{code:java}
AnalysisException: Name t is ambiguous in nested CTE.
Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name 
defined in inner CTE takes precedence. If set it to LEGACY, outer CTE 
definitions will take precedence. See more details in SPARK-28228.{code}

  was:
The batch of errors migrated to error classes as part of spark-40540 contains 
an error that got mixed up with the wrong error message:

[ambiguousRelationAliasNameInNestedCTEError|https://github.com/apache/spark/commit/43a6b932759865c45ccf36f3e9cf6898c1b762da#diff-744ac13f6fe074fddeab09b407404bffa2386f54abc83c501e6e1fe618f6db56R1983]
 uses the same error message as the following commandUnsupportedInV2TableError:

```

WITH t AS (SELECT 1), t2 AS ( WITH t AS (SELECT 2) SELECT * FROM t) SELECT * 
FROM t2;

 

AnalysisException: t is not supported for v2 tables

```

The error should be:

```

AnalysisException: Name t is ambiguous in nested CTE.
Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name 
defined in inner CTE takes precedence. If set it to LEGACY, outer CTE 
definitions will take precedence. See more details in SPARK-28228.

```


> Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError`
> -
>
> Key: SPARK-43487
> URL: https://issues.apache.org/jira/browse/SPARK-43487
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Johan Lasperas
>Priority: Minor
>
> The batch of errors migrated to error classes as part of spark-40540 contains 
> an error that got mixed up with the wrong error message:
> [ambiguousRelationAliasNameInNestedCTEError|https://github.com/apache/spark/commit/43a6b932759865c45ccf36f3e9cf6898c1b762da#diff-744ac13f6fe074fddeab09b407404bffa2386f54abc83c501e6e1fe618f6db56R1983]
>  uses the same error message as the following 
> commandUnsupportedInV2TableError:
>  
> {code:java}
> WITH t AS (SELECT 1), t2 AS ( WITH t AS (SELECT 2) SELECT * FROM t) SELECT * 
> FROM t2;
> AnalysisException: t is not supported for v2 tables
> {code}
> The error should be:
> {code:java}
> AnalysisException: Name t is ambiguous in nested CTE.
> Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name 
> defined in inner CTE takes precedence. If set it to LEGACY, outer CTE 
> definitions will take precedence. See more details in SPARK-28228.{code}






[jira] [Created] (SPARK-43487) Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError`

2023-05-12 Thread Johan Lasperas (Jira)
Johan Lasperas created SPARK-43487:
--

 Summary: Wrong error message used for 
`ambiguousRelationAliasNameInNestedCTEError`
 Key: SPARK-43487
 URL: https://issues.apache.org/jira/browse/SPARK-43487
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Johan Lasperas


The batch of errors migrated to error classes as part of spark-40540 contains 
an error that got mixed up with the wrong error message:

[ambiguousRelationAliasNameInNestedCTEError|https://github.com/apache/spark/commit/43a6b932759865c45ccf36f3e9cf6898c1b762da#diff-744ac13f6fe074fddeab09b407404bffa2386f54abc83c501e6e1fe618f6db56R1983]
 uses the same error message as the following commandUnsupportedInV2TableError:

```

WITH t AS (SELECT 1), t2 AS ( WITH t AS (SELECT 2) SELECT * FROM t) SELECT * 
FROM t2;

AnalysisException: t is not supported for v2 tables

```

The error should be:

```

AnalysisException: Name t is ambiguous in nested CTE.
Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name 
defined in inner CTE takes precedence. If set it to LEGACY, outer CTE 
definitions will take precedence. See more details in SPARK-28228.

```






[jira] [Commented] (SPARK-40129) Decimal multiply can produce the wrong answer because it rounds twice

2023-05-12 Thread Jia Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1774#comment-1774
 ] 

Jia Fan commented on SPARK-40129:
-

https://github.com/apache/spark/pull/41156

> Decimal multiply can produce the wrong answer because it rounds twice
> -
>
> Key: SPARK-40129
> URL: https://issues.apache.org/jira/browse/SPARK-40129
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0, 3.4.0
>Reporter: Robert Joseph Evans
>Priority: Major
>
> This looks like it has been around for a long time, but I have reproduced it 
> in 3.2.0+
> The example here is multiplying Decimal(38, 10) by another Decimal(38, 10), 
> but I think it can be reproduced with other number combinations, and possibly 
> with divide too.
> {code:java}
> Seq("9173594185998001607642838421.5479932913").toDF.selectExpr("CAST(value as 
> DECIMAL(38,10)) as a").selectExpr("a * CAST(-12 as 
> DECIMAL(38,10))").show(truncate=false)
> {code}
> This produces an answer in Spark of 
> {{-110083130231976019291714061058.575920}} But if I do the calculation in 
> regular java BigDecimal I get {{-110083130231976019291714061058.575919}}
> {code:java}
> BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
> BigDecimal r = new BigDecimal("-12.00");
> BigDecimal prod = l.multiply(r);
> BigDecimal rounded_prod = prod.setScale(6, RoundingMode.HALF_UP);
> {code}
> Spark does essentially all of the same operations, but it uses Decimal instead 
> of Java's BigDecimal directly. Spark, by way of Decimal, will set 
> a MathContext for the multiply operation that has a max precision of 38 and 
> will do half up rounding. That means that the result of the multiply 
> operation in Spark is {{{}-110083130231976019291714061058.57591950{}}}, but 
> for the java BigDecimal code the result is 
> {{{}-110083130231976019291714061058.575919495600{}}}. Then in 
> CheckOverflow for 3.2.0 and 3.3.0 or in just the regular Multiply expression 
> in 3.4.0 the setScale is called (as a part of Decimal.setPrecision). At that 
> point the already rounded number is rounded yet again resulting in what is 
> arguably a wrong answer by Spark.
> I have not fully tested this, but it looks like we could just remove the 
> MathContext entirely in Decimal, or set it to UNLIMITED. All of the decimal 
> operations appear to have their own overflow and rounding anyways. If we want 
> to potentially reduce the total memory usage, we could also set the max 
> precision to 39 and truncate (round down) the result in the math context 
> instead.  That would then let us round the result correctly in setPrecision 
> afterwards.
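The double rounding described above can be reproduced with plain java.math.BigDecimal (a standalone sketch of the behavior, not Spark's Decimal implementation); the expected values match the ones quoted in the issue:

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class DoubleRounding {
    static final BigDecimal L = new BigDecimal("9173594185998001607642838421.5479932913");
    static final BigDecimal R = new BigDecimal("-12.0000000000"); // -12 cast to DECIMAL(38,10)

    // Exact product, then a single rounding to scale 6: the arguably correct answer.
    static BigDecimal roundedOnce() {
        return L.multiply(R).setScale(6, RoundingMode.HALF_UP);
    }

    // Spark's Decimal path: multiply under MathContext(38, HALF_UP) first
    // (first rounding, to 38 significant digits), then setScale(6) in
    // setPrecision (second rounding).
    static BigDecimal roundedTwice() {
        return L.multiply(R, new MathContext(38, RoundingMode.HALF_UP))
                .setScale(6, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println(roundedOnce());  // -110083130231976019291714061058.575919
        System.out.println(roundedTwice()); // -110083130231976019291714061058.575920
    }
}
```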






[jira] [Assigned] (SPARK-43484) Kafka/Kinesis Assembly should not package hadoop-client-runtime

2023-05-12 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned SPARK-43484:


Assignee: Cheng Pan

> Kafka/Kinesis Assembly should not package hadoop-client-runtime
> ---
>
> Key: SPARK-43484
> URL: https://issues.apache.org/jira/browse/SPARK-43484
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Structured Streaming
>Affects Versions: 3.2.4, 3.3.2, 3.4.0, 3.5.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>







[jira] [Resolved] (SPARK-43484) Kafka/Kinesis Assembly should not package hadoop-client-runtime

2023-05-12 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved SPARK-43484.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41152
[https://github.com/apache/spark/pull/41152]

> Kafka/Kinesis Assembly should not package hadoop-client-runtime
> ---
>
> Key: SPARK-43484
> URL: https://issues.apache.org/jira/browse/SPARK-43484
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Structured Streaming
>Affects Versions: 3.2.4, 3.3.2, 3.4.0, 3.5.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Commented] (SPARK-40129) Decimal multiply can produce the wrong answer because it rounds twice

2023-05-12 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1778#comment-1778
 ] 

Hudson commented on SPARK-40129:


User 'Hisoka-X' has created a pull request for this issue:
https://github.com/apache/spark/pull/41156

> Decimal multiply can produce the wrong answer because it rounds twice
> -
>
> Key: SPARK-40129
> URL: https://issues.apache.org/jira/browse/SPARK-40129
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0, 3.4.0
>Reporter: Robert Joseph Evans
>Priority: Major
>
> This looks like it has been around for a long time, but I have reproduced it 
> in 3.2.0+
> The example here is multiplying Decimal(38, 10) by another Decimal(38, 10), 
> but I think it can be reproduced with other number combinations, and possibly 
> with divide too.
> {code:java}
> Seq("9173594185998001607642838421.5479932913").toDF.selectExpr("CAST(value as 
> DECIMAL(38,10)) as a").selectExpr("a * CAST(-12 as 
> DECIMAL(38,10))").show(truncate=false)
> {code}
> This produces an answer in Spark of 
> {{-110083130231976019291714061058.575920}} But if I do the calculation in 
> regular java BigDecimal I get {{-110083130231976019291714061058.575919}}
> {code:java}
> BigDecimal l = new BigDecimal("9173594185998001607642838421.5479932913");
> BigDecimal r = new BigDecimal("-12.00");
> BigDecimal prod = l.multiply(r);
> BigDecimal rounded_prod = prod.setScale(6, RoundingMode.HALF_UP);
> {code}
> Spark does essentially all of the same operations, but it uses Decimal instead 
> of Java's BigDecimal directly. Spark, by way of Decimal, will set 
> a MathContext for the multiply operation that has a max precision of 38 and 
> will do half up rounding. That means that the result of the multiply 
> operation in Spark is {{{}-110083130231976019291714061058.57591950{}}}, but 
> for the java BigDecimal code the result is 
> {{{}-110083130231976019291714061058.575919495600{}}}. Then in 
> CheckOverflow for 3.2.0 and 3.3.0 or in just the regular Multiply expression 
> in 3.4.0 the setScale is called (as a part of Decimal.setPrecision). At that 
> point the already rounded number is rounded yet again resulting in what is 
> arguably a wrong answer by Spark.
> I have not fully tested this, but it looks like we could just remove the 
> MathContext entirely in Decimal, or set it to UNLIMITED. All of the decimal 
> operations appear to have their own overflow and rounding anyways. If we want 
> to potentially reduce the total memory usage, we could also set the max 
> precision to 39 and truncate (round down) the result in the math context 
> instead.  That would then let us round the result correctly in setPrecision 
> afterwards.






[jira] [Commented] (SPARK-43484) Kafka/Kinesis Assembly should not package hadoop-client-runtime

2023-05-12 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1779#comment-1779
 ] 

Hudson commented on SPARK-43484:


User 'pan3793' has created a pull request for this issue:
https://github.com/apache/spark/pull/41152

> Kafka/Kinesis Assembly should not package hadoop-client-runtime
> ---
>
> Key: SPARK-43484
> URL: https://issues.apache.org/jira/browse/SPARK-43484
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Structured Streaming
>Affects Versions: 3.2.4, 3.3.2, 3.4.0, 3.5.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Created] (SPARK-43488) bitmap function

2023-05-12 Thread yiku123 (Jira)
yiku123 created SPARK-43488:
---

 Summary: bitmap function
 Key: SPARK-43488
 URL: https://issues.apache.org/jira/browse/SPARK-43488
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.4.0
Reporter: yiku123
 Fix For: 3.4.1


Maybe Spark needs some bitmap functions, for example bitmapBuild, bitmapAnd, 
and bitmapAndCardinality as in ClickHouse or other OLAP engines.

These are often used in user-profiling applications, but I could not find them 
in Spark.
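For concreteness, the semantics of the ClickHouse functions named above can be sketched with java.util.BitSet (illustrative only; not a proposed Spark API or implementation):

```java
import java.util.BitSet;

public class BitmapSketch {
    public static void main(String[] args) {
        // bitmapBuild([1, 2, 5]): set the given ids as bits
        BitSet a = new BitSet();
        a.set(1); a.set(2); a.set(5);

        BitSet b = new BitSet();
        b.set(2); b.set(5); b.set(7);

        // bitmapAnd(a, b): intersection of the two id sets
        BitSet and = (BitSet) a.clone();
        and.and(b);

        // bitmapAndCardinality(a, b): size of the intersection
        System.out.println(and);               // {2, 5}
        System.out.println(and.cardinality()); // 2
    }
}
```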






[jira] [Updated] (SPARK-43488) bitmap function

2023-05-12 Thread yiku123 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yiku123 updated SPARK-43488:

Labels: patch  (was: )

> bitmap function
> ---
>
> Key: SPARK-43488
> URL: https://issues.apache.org/jira/browse/SPARK-43488
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: yiku123
>Priority: Major
>  Labels: patch
> Fix For: 3.4.1
>
>
> Maybe Spark needs some bitmap functions, for example bitmapBuild, bitmapAnd, 
> and bitmapAndCardinality as in ClickHouse or other OLAP engines.
> These are often used in user-profiling applications, but I could not find 
> them in Spark.






[jira] [Created] (SPARK-43489) Remove protobuf 2.5.0

2023-05-12 Thread Cheng Pan (Jira)
Cheng Pan created SPARK-43489:
-

 Summary: Remove protobuf 2.5.0
 Key: SPARK-43489
 URL: https://issues.apache.org/jira/browse/SPARK-43489
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.5.0
Reporter: Cheng Pan









[jira] [Commented] (SPARK-40500) Use `pd.items` instead of `pd.iteritems`

2023-05-12 Thread Jim Huang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722262#comment-17722262
 ] 

Jim Huang commented on SPARK-40500:
---

Thank you to everyone involved in fixing this issue!
This issue exists in Spark 3.3.x and Spark 3.2.x.
Is there a plan to back-port this fix to the older Spark versions?

> Use `pd.items` instead of `pd.iteritems`
> 
>
> Key: SPARK-40500
> URL: https://issues.apache.org/jira/browse/SPARK-40500
> Project: Spark
>  Issue Type: Improvement
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Comment Edited] (SPARK-40500) Use `pd.items` instead of `pd.iteritems`

2023-05-12 Thread Jim Huang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722262#comment-17722262
 ] 

Jim Huang edited comment on SPARK-40500 at 5/12/23 6:58 PM:


Thank you to everyone involved in fixing this issue!
This issue exists in Spark 3.3.x and Spark 3.2.x when paired with a recent 
version of Pandas.
Is there a plan to back-port this fix to the older Spark versions?


was (Author: jimhuang):
Thank you all involved for fixing this issue!
This issue exists in Spark 3.3.x and Spark 3.2.x.  
What is the thought process that this same fix can or will be back-ported to 
the older Spark versions?

> Use `pd.items` instead of `pd.iteritems`
> 
>
> Key: SPARK-40500
> URL: https://issues.apache.org/jira/browse/SPARK-40500
> Project: Spark
>  Issue Type: Improvement
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Comment Edited] (SPARK-40500) Use `pd.items` instead of `pd.iteritems`

2023-05-12 Thread Jim Huang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722262#comment-17722262
 ] 

Jim Huang edited comment on SPARK-40500 at 5/12/23 6:59 PM:


Thank you to everyone involved in fixing this issue!
This issue exists in Spark 3.3.x and Spark 3.2.x when paired with a recent 
version of Pandas, where `pd.iteritems` has been deprecated.
Is there a plan to back-port this fix to the older Spark versions?


was (Author: jimhuang):
Thank you all involved for fixing this issue!
This issue exists in Spark 3.3.x and Spark 3.2.x when paired up with a recent 
version of Pandas.  
What is the thought process that this same fix can or will be back-ported to 
the older Spark versions?

> Use `pd.items` instead of `pd.iteritems`
> 
>
> Key: SPARK-40500
> URL: https://issues.apache.org/jira/browse/SPARK-40500
> Project: Spark
>  Issue Type: Improvement
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-43272) Replace reflection w/ direct calling for `SparkHadoopUtil#createFile`

2023-05-12 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved SPARK-43272.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40945
[https://github.com/apache/spark/pull/40945]

> Replace reflection w/ direct calling for  `SparkHadoopUtil#createFile`
> --
>
> Key: SPARK-43272
> URL: https://issues.apache.org/jira/browse/SPARK-43272
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-43272) Replace reflection w/ direct calling for `SparkHadoopUtil#createFile`

2023-05-12 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned SPARK-43272:


Assignee: Yang Jie

> Replace reflection w/ direct calling for  `SparkHadoopUtil#createFile`
> --
>
> Key: SPARK-43272
> URL: https://issues.apache.org/jira/browse/SPARK-43272
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>







[jira] [Commented] (SPARK-43486) number of files read is incorrect if it is bucket table

2023-05-12 Thread BingKun Pan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722308#comment-17722308
 ] 

BingKun Pan commented on SPARK-43486:
-

Can I do it? [~yumwang] 

> number of files read is incorrect if it is bucket table
> ---
>
> Key: SPARK-43486
> URL: https://issues.apache.org/jira/browse/SPARK-43486
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
> Attachments: screenshot-1.png
>
>
>  !screenshot-1.png! 






[jira] [Commented] (SPARK-43486) number of files read is incorrect if it is bucket table

2023-05-12 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722313#comment-17722313
 ] 

Yuming Wang commented on SPARK-43486:
-

[~panbingkun] Yes, please.

> number of files read is incorrect if it is bucket table
> ---
>
> Key: SPARK-43486
> URL: https://issues.apache.org/jira/browse/SPARK-43486
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
> Attachments: screenshot-1.png
>
>
>  !screenshot-1.png! 






[jira] [Created] (SPARK-43490) Upgrade sbt to 1.8.3

2023-05-12 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-43490:
---

 Summary: Upgrade sbt to 1.8.3
 Key: SPARK-43490
 URL: https://issues.apache.org/jira/browse/SPARK-43490
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.5.0
Reporter: BingKun Pan









[jira] [Updated] (SPARK-43488) bitmap function

2023-05-12 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-43488:

Fix Version/s: (was: 3.4.1)

> bitmap function
> ---
>
> Key: SPARK-43488
> URL: https://issues.apache.org/jira/browse/SPARK-43488
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: yiku123
>Priority: Major
>  Labels: patch
>
> Maybe Spark needs some bitmap functions, for example bitmapBuild, bitmapAnd, 
> and bitmapAndCardinality as in ClickHouse or other OLAP engines.
> These are often used in user-profiling applications, but I could not find 
> them in Spark.






[jira] [Commented] (SPARK-43487) Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError`

2023-05-12 Thread BingKun Pan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722316#comment-17722316
 ] 

BingKun Pan commented on SPARK-43487:
-

I work on it.

> Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError`
> -
>
> Key: SPARK-43487
> URL: https://issues.apache.org/jira/browse/SPARK-43487
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Johan Lasperas
>Priority: Minor
>
> The batch of errors migrated to error classes as part of spark-40540 contains 
> an error that got mixed up with the wrong error message:
> [ambiguousRelationAliasNameInNestedCTEError|https://github.com/apache/spark/commit/43a6b932759865c45ccf36f3e9cf6898c1b762da#diff-744ac13f6fe074fddeab09b407404bffa2386f54abc83c501e6e1fe618f6db56R1983]
>  uses the same error message as the following 
> commandUnsupportedInV2TableError:
>  
> {code:java}
> WITH t AS (SELECT 1), t2 AS ( WITH t AS (SELECT 2) SELECT * FROM t) SELECT * 
> FROM t2;
> AnalysisException: t is not supported for v2 tables
> {code}
> The error should be:
> {code:java}
> AnalysisException: Name t is ambiguous in nested CTE.
> Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name 
> defined in inner CTE takes precedence. If set it to LEGACY, outer CTE 
> definitions will take precedence. See more details in SPARK-28228.{code}






[jira] [Commented] (SPARK-43487) Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError`

2023-05-12 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722320#comment-17722320
 ] 

Snoot.io commented on SPARK-43487:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41161

> Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError`
> -
>
> Key: SPARK-43487
> URL: https://issues.apache.org/jira/browse/SPARK-43487
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Johan Lasperas
>Priority: Minor
>
> The batch of errors migrated to error classes as part of spark-40540 contains 
> an error that got mixed up with the wrong error message:
> [ambiguousRelationAliasNameInNestedCTEError|https://github.com/apache/spark/commit/43a6b932759865c45ccf36f3e9cf6898c1b762da#diff-744ac13f6fe074fddeab09b407404bffa2386f54abc83c501e6e1fe618f6db56R1983]
>  uses the same error message as the following 
> commandUnsupportedInV2TableError:
>  
> {code:java}
> WITH t AS (SELECT 1), t2 AS ( WITH t AS (SELECT 2) SELECT * FROM t) SELECT * 
> FROM t2;
> AnalysisException: t is not supported for v2 tables
> {code}
> The error should be:
> {code:java}
> AnalysisException: Name t is ambiguous in nested CTE.
> Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name 
> defined in inner CTE takes precedence. If set it to LEGACY, outer CTE 
> definitions will take precedence. See more details in SPARK-28228.{code}
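The precedence behaviour the corrected message describes can be sketched as a scoped name lookup. This is an illustrative model only, not Spark's actual resolution code; the function name and scope representation are hypothetical:

```python
def resolve_cte(name, scopes, policy="CORRECTED"):
    """Resolve a CTE name against nested scopes (outermost first in `scopes`).

    Under CORRECTED the innermost definition wins; under LEGACY the
    outermost one does, mirroring spark.sql.legacy.ctePrecedencePolicy.
    """
    order = reversed(scopes) if policy == "CORRECTED" else scopes
    for scope in order:
        if name in scope:
            return scope[name]
    raise KeyError(f"CTE {name!r} not defined")

# WITH t AS (SELECT 1), t2 AS (WITH t AS (SELECT 2) SELECT * FROM t) ...
scopes = [{"t": "SELECT 1"}, {"t": "SELECT 2"}]  # outer scope, then inner
print(resolve_cte("t", scopes, "CORRECTED"))  # SELECT 2 (inner wins)
print(resolve_cte("t", scopes, "LEGACY"))     # SELECT 1 (outer wins)
```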









[jira] [Created] (SPARK-43491) In expression not compatible with EqualTo Expression

2023-05-12 Thread KuijianLiu (Jira)
KuijianLiu created SPARK-43491:
--

 Summary: In expression not compatible with EqualTo Expression
 Key: SPARK-43491
 URL: https://issues.apache.org/jira/browse/SPARK-43491
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.1
Reporter: KuijianLiu


Spark SQL 3.1.1 and Hive SQL 3.1.0 return inconsistent results for the same 
SQL: Spark SQL evaluates {{0 in ('00')}} as false, behaving differently from 
the {{=}} keyword, while Hive evaluates it as true. Hive 3.1.0 treats the 
{{in}} keyword consistently with {{=}}, but Spark SQL does not.

When the data types of the elements in an {{In}} expression are the same, it 
should behave the same way as a BinaryComparison such as {{EqualTo}}.

Test SQL:
{code:java}
scala> spark.sql("select 1 as test where 0 = '00'").show
++
|test|
++
|   1|
++
scala> spark.sql("select 1 as test where 0 in ('00')").show
++
|test|
++
++
{code}
 

!image-2023-05-13-13-06-24-551.png!

!image-2023-05-13-13-07-49-194.png!






[jira] [Updated] (SPARK-43491) In expression not compatible with EqualTo Expression

2023-05-12 Thread KuijianLiu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KuijianLiu updated SPARK-43491:
---
Description: 
Spark SQL 3.1.1 and Hive SQL 3.1.0 return inconsistent results for the same 
SQL: Spark SQL evaluates {{0 in ('00')}} as false, behaving differently from 
the {{=}} keyword, while Hive evaluates it as true. Hive 3.1.0 treats the 
{{in}} keyword consistently with {{=}}, but Spark SQL does not.

When the data types of the elements in an {{In}} expression are the same, it 
should behave the same way as a BinaryComparison such as {{EqualTo}}.

Test SQL:
{code:java}
scala> spark.sql("select 1 as test where 0 = '00'").show
++
|test|
++
|   1|
++

scala> spark.sql("select 1 as test where 0 in ('00')").show
++
|test|
++
++

scala> spark.sql("select 1 as test where 0 = '00'").explain(true)
== Parsed Logical Plan ==
'Project [1 AS test#23]
+- 'Filter (0 = 00)
   +- OneRowRelation

== Analyzed Logical Plan ==
test: int
Project [1 AS test#23]
+- Filter (0 = cast(00 as int))
   +- OneRowRelation

== Optimized Logical Plan ==
Project [1 AS test#23]
+- OneRowRelation

== Physical Plan ==
*(1) Project [1 AS test#23]
+- *(1) Scan OneRowRelation[]

scala> spark.sql("select 1 as test where 0 in ('00')").explain(true)
== Parsed Logical Plan ==
'Project [1 AS test#25]
+- 'Filter 0 IN (00)
   +- OneRowRelation

== Analyzed Logical Plan ==
test: int
Project [1 AS test#25]
+- Filter cast(0 as string) IN (cast(00 as string))
   +- OneRowRelation

== Optimized Logical Plan ==
LocalRelation <empty>, [test#25]

== Physical Plan ==
LocalTableScan <empty>, [test#25]
 {code}
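The two analyzed plans above show the root cause: {{=}} coerces the string side to int, while {{In}} coerces both sides to string. A minimal Python sketch of those two coercion choices (illustrative only; the function names are hypothetical, not Spark code):

```python
def equal_to_semantics(left_int: int, right_str: str) -> bool:
    # EqualTo: per `Filter (0 = cast(00 as int))`, the string is cast to int
    return left_int == int(right_str)

def in_semantics(left_int: int, values: list) -> bool:
    # In: per `cast(0 as string) IN (cast(00 as string))`,
    # both sides are cast to string before comparison
    return str(left_int) in [str(v) for v in values]

print(equal_to_semantics(0, "00"))  # True:  0 == 0
print(in_semantics(0, ["00"]))      # False: "0" != "00"
```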
 

 

  was:
Spark SQL 3.1.1 and Hive SQL 3.1.0 return inconsistent results for the same 
SQL: Spark SQL evaluates {{0 in ('00')}} as false, behaving differently from 
the {{=}} keyword, while Hive evaluates it as true. Hive 3.1.0 treats the 
{{in}} keyword consistently with {{=}}, but Spark SQL does not.

When the data types of the elements in an {{In}} expression are the same, it 
should behave the same way as a BinaryComparison such as {{EqualTo}}.

Test SQL:
{code:java}
scala> spark.sql("select 1 as test where 0 = '00'").show
++
|test|
++
|   1|
++
scala> spark.sql("select 1 as test where 0 in ('00')").show
++
|test|
++
++
{code}
 

!image-2023-05-13-13-06-24-551.png!

!image-2023-05-13-13-07-49-194.png!


> In expression not compatible with EqualTo Expression
> 
>
> Key: SPARK-43491
> URL: https://issues.apache.org/jira/browse/SPARK-43491
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: KuijianLiu
>Priority: Minor
>
> Spark SQL 3.1.1 and Hive SQL 3.1.0 return inconsistent results for the same 
> SQL: Spark SQL evaluates {{0 in ('00')}} as false, behaving differently from 
> the {{=}} keyword, while Hive evaluates it as true. Hive 3.1.0 treats the 
> {{in}} keyword consistently with {{=}}, but Spark SQL does not.
> When the data types of the elements in an {{In}} expression are the same, it 
> should behave the same way as a BinaryComparison such as {{EqualTo}}.
> Test SQL:
> {code:java}
> scala> spark.sql("select 1 as test where 0 = '00'").show
> ++
> |test|
> ++
> |   1|
> ++
> scala> spark.sql("select 1 as test where 0 in ('00')").show
> ++
> |test|
> ++
> ++
> scala> spark.sql("select 1 as test where 0 = '00'").explain(true)
> == Parsed Logical Plan ==
> 'Project [1 AS test#23]
> +- 'Filter (0 = 00)
>    +- OneRowRelation
>
> == Analyzed Logical Plan ==
> test: int
> Project [1 AS test#23]
> +- Filter (0 = cast(00 as int))
>    +- OneRowRelation
>
> == Optimized Logical Plan ==
> Project [1 AS test#23]
> +- OneRowRelation
>
> == Physical Plan ==
> *(1) Project [1 AS test#23]
> +- *(1) Scan OneRowRelation[]
>
> scala> spark.sql("select 1 as test where 0 in ('00')").explain(true)
> == Parsed Logical Plan ==
> 'Project [1 AS test#25]
> +- 'Filter 0 IN (00)
>    +- OneRowRelation
>
> == Analyzed Logical Plan ==
> test: int
> Project [1 AS test#25]
> +- Filter cast(0 as string) IN (cast(00 as string))
>    +- OneRowRelation
>
> == Optimized Logical Plan ==
> LocalRelation <empty>, [test#25]
>
> == Physical Plan ==
> LocalTableScan <empty>, [test#25]
>  {code}
>  
>  






[jira] [Updated] (SPARK-43491) In expression not compatible with EqualTo Expression

2023-05-12 Thread KuijianLiu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KuijianLiu updated SPARK-43491:
---
Description: 
Spark SQL 3.1.1 and Hive SQL 3.1.0 return inconsistent results for the same 
SQL: Spark SQL evaluates {{0 in ('00')}} as false, behaving differently from 
the {{=}} keyword, while Hive evaluates it as true. Hive 3.1.0 treats the 
{{in}} keyword consistently with {{=}}, but Spark SQL does not.

When the data types of the elements in an {{In}} expression are the same, it 
should behave the same way as a BinaryComparison such as {{EqualTo}}.

Test SQL:
{code:java}
scala> spark.sql("select 1 as test where 0 = '00'").show
++
|test|
++
|   1|
++

scala> spark.sql("select 1 as test where 0 in ('00')").show
++
|test|
++
++

scala> spark.sql("select 1 as test where 0 = '00'").explain(true)
== Parsed Logical Plan ==
'Project [1 AS test#23]
+- 'Filter (0 = 00)
   +- OneRowRelation

== Analyzed Logical Plan ==
test: int
Project [1 AS test#23]
+- Filter (0 = cast(00 as int))
   +- OneRowRelation

== Optimized Logical Plan ==
Project [1 AS test#23]
+- OneRowRelation

== Physical Plan ==
*(1) Project [1 AS test#23]
+- *(1) Scan OneRowRelation[]

scala> spark.sql("select 1 as test where 0 in ('00')").explain(true)
== Parsed Logical Plan ==
'Project [1 AS test#25]
+- 'Filter 0 IN (00)
   +- OneRowRelation

== Analyzed Logical Plan ==
test: int
Project [1 AS test#25]
+- Filter cast(0 as string) IN (cast(00 as string))
   +- OneRowRelation

== Optimized Logical Plan ==
LocalRelation <empty>, [test#25]

== Physical Plan ==
LocalTableScan <empty>, [test#25]
 {code}
 

!image-2023-05-13-13-14-55-853.png!

  was:
Spark SQL 3.1.1 and Hive SQL 3.1.0 return inconsistent results for the same 
SQL: Spark SQL evaluates {{0 in ('00')}} as false, behaving differently from 
the {{=}} keyword, while Hive evaluates it as true. Hive 3.1.0 treats the 
{{in}} keyword consistently with {{=}}, but Spark SQL does not.

When the data types of the elements in an {{In}} expression are the same, it 
should behave the same way as a BinaryComparison such as {{EqualTo}}.

Test SQL:
{code:java}
scala> spark.sql("select 1 as test where 0 = '00'").show
++
|test|
++
|   1|
++

scala> spark.sql("select 1 as test where 0 in ('00')").show
++
|test|
++
++

scala> spark.sql("select 1 as test where 0 = '00'").explain(true)
== Parsed Logical Plan ==
'Project [1 AS test#23]
+- 'Filter (0 = 00)
   +- OneRowRelation

== Analyzed Logical Plan ==
test: int
Project [1 AS test#23]
+- Filter (0 = cast(00 as int))
   +- OneRowRelation

== Optimized Logical Plan ==
Project [1 AS test#23]
+- OneRowRelation

== Physical Plan ==
*(1) Project [1 AS test#23]
+- *(1) Scan OneRowRelation[]

scala> spark.sql("select 1 as test where 0 in ('00')").explain(true)
== Parsed Logical Plan ==
'Project [1 AS test#25]
+- 'Filter 0 IN (00)
   +- OneRowRelation

== Analyzed Logical Plan ==
test: int
Project [1 AS test#25]
+- Filter cast(0 as string) IN (cast(00 as string))
   +- OneRowRelation

== Optimized Logical Plan ==
LocalRelation <empty>, [test#25]

== Physical Plan ==
LocalTableScan <empty>, [test#25]
 {code}
 

 


> In expression not compatible with EqualTo Expression
> 
>
> Key: SPARK-43491
> URL: https://issues.apache.org/jira/browse/SPARK-43491
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: KuijianLiu
>Priority: Minor
> Attachments: image-2023-05-13-13-14-55-853.png
>
>
> Spark SQL 3.1.1 and Hive SQL 3.1.0 return inconsistent results for the same 
> SQL: Spark SQL evaluates {{0 in ('00')}} as false, behaving differently from 
> the {{=}} keyword, while Hive evaluates it as true. Hive 3.1.0 treats the 
> {{in}} keyword consistently with {{=}}, but Spark SQL does not.
> When the data types of the elements in an {{In}} expression are the same, it 
> should behave the same way as a BinaryComparison such as {{EqualTo}}.
> Test SQL:
> {code:java}
> scala> spark.sql("select 1 as test where 0 = '00'").show
> ++
> |test|
> ++
> |   1|
> ++
> scala> spark.sql("select 1 as test where 0 in ('00')").show
> ++
> |test|
> ++
> ++
> scala> spark.sql("select 1 as test where 0 = '00'").explain(true)
> == Parsed Logical Plan ==
> 'Project [1 AS test#23]
> +- 'Filter (0 = 00)
>    +- OneRowRelation
>
> == Analyzed Logical Plan ==
> test: int
> Project [1 AS test#23]
> +- Filter (0 = cast(00 as int))
>    +- OneRowRelation
>
> == Optimized Logical Plan ==
> Project [1 AS test#23]
> +- OneRowRelation
>
> == Physical Plan ==
> *(1) Project [1 AS test#23]
> +- *(1) Scan OneRowRelation[]
>
> scala> spark.sql("select 1 as test where 0 in ('00')").explain(true)
> == Parsed Logical Plan ==
> 'Project [1 AS test#25]
> +- 'Filter 0 IN (00)
>    +- OneRowRelation
>
> == Analyzed Logical Plan ==
> test: int
> Project [1 AS test#25]
> +- Filter cast(0 as string) IN (cast(00 as string))
>    +- OneRowRelation
>
> == Optimized Logical Plan ==
> LocalRelation <empty>, [test#25]
>
> == Physical Plan ==
> LocalTableScan <empty>, [test#25]
>  {code}

[jira] [Updated] (SPARK-43491) In expression not compatible with EqualTo Expression

2023-05-12 Thread KuijianLiu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KuijianLiu updated SPARK-43491:
---
Attachment: image-2023-05-13-13-14-55-853.png

> In expression not compatible with EqualTo Expression
> 
>
> Key: SPARK-43491
> URL: https://issues.apache.org/jira/browse/SPARK-43491
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: KuijianLiu
>Priority: Minor
> Attachments: image-2023-05-13-13-14-55-853.png
>
>
> Spark SQL 3.1.1 and Hive SQL 3.1.0 return inconsistent results for the same 
> SQL: Spark SQL evaluates {{0 in ('00')}} as false, behaving differently from 
> the {{=}} keyword, while Hive evaluates it as true. Hive 3.1.0 treats the 
> {{in}} keyword consistently with {{=}}, but Spark SQL does not.
> When the data types of the elements in an {{In}} expression are the same, it 
> should behave the same way as a BinaryComparison such as {{EqualTo}}.
> Test SQL:
> {code:java}
> scala> spark.sql("select 1 as test where 0 = '00'").show
> ++
> |test|
> ++
> |   1|
> ++
> scala> spark.sql("select 1 as test where 0 in ('00')").show
> ++
> |test|
> ++
> ++
> scala> spark.sql("select 1 as test where 0 = '00'").explain(true)
> == Parsed Logical Plan ==
> 'Project [1 AS test#23]
> +- 'Filter (0 = 00)
>    +- OneRowRelation
>
> == Analyzed Logical Plan ==
> test: int
> Project [1 AS test#23]
> +- Filter (0 = cast(00 as int))
>    +- OneRowRelation
>
> == Optimized Logical Plan ==
> Project [1 AS test#23]
> +- OneRowRelation
>
> == Physical Plan ==
> *(1) Project [1 AS test#23]
> +- *(1) Scan OneRowRelation[]
>
> scala> spark.sql("select 1 as test where 0 in ('00')").explain(true)
> == Parsed Logical Plan ==
> 'Project [1 AS test#25]
> +- 'Filter 0 IN (00)
>    +- OneRowRelation
>
> == Analyzed Logical Plan ==
> test: int
> Project [1 AS test#25]
> +- Filter cast(0 as string) IN (cast(00 as string))
>    +- OneRowRelation
>
> == Optimized Logical Plan ==
> LocalRelation <empty>, [test#25]
>
> == Physical Plan ==
> LocalTableScan <empty>, [test#25]
>  {code}
>  
>  






[jira] [Updated] (SPARK-43491) In expression not compatible with EqualTo Expression

2023-05-12 Thread KuijianLiu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KuijianLiu updated SPARK-43491:
---
Description: 
Spark SQL 3.1.1 and Hive SQL 3.1.0 return inconsistent results for the same 
SQL: Spark SQL evaluates {{0 in ('00')}} as false, behaving differently from 
the {{=}} keyword, while Hive evaluates it as true. Hive 3.1.0 treats the 
{{in}} keyword consistently with {{=}}, but Spark SQL does not.

When the data types of the elements in an {{In}} expression are the same, it 
should behave the same way as a BinaryComparison such as {{EqualTo}}.

Test SQL:
{code:java}
scala> spark.sql("select 1 as test where 0 = '00'").show
++
|test|
++
|   1|
++

scala> spark.sql("select 1 as test where 0 in ('00')").show
++
|test|
++
++{code}
!image-2023-05-13-13-15-50-685.png!

!image-2023-05-13-13-14-55-853.png!

  was:
Spark SQL 3.1.1 and Hive SQL 3.1.0 return inconsistent results for the same 
SQL: Spark SQL evaluates {{0 in ('00')}} as false, behaving differently from 
the {{=}} keyword, while Hive evaluates it as true. Hive 3.1.0 treats the 
{{in}} keyword consistently with {{=}}, but Spark SQL does not.

When the data types of the elements in an {{In}} expression are the same, it 
should behave the same way as a BinaryComparison such as {{EqualTo}}.

Test SQL:
{code:java}
scala> spark.sql("select 1 as test where 0 = '00'").show
++
|test|
++
|   1|
++

scala> spark.sql("select 1 as test where 0 in ('00')").show
++
|test|
++
++

scala> spark.sql("select 1 as test where 0 = '00'").explain(true)
== Parsed Logical Plan ==
'Project [1 AS test#23]
+- 'Filter (0 = 00)
   +- OneRowRelation

== Analyzed Logical Plan ==
test: int
Project [1 AS test#23]
+- Filter (0 = cast(00 as int))
   +- OneRowRelation

== Optimized Logical Plan ==
Project [1 AS test#23]
+- OneRowRelation

== Physical Plan ==
*(1) Project [1 AS test#23]
+- *(1) Scan OneRowRelation[]

scala> spark.sql("select 1 as test where 0 in ('00')").explain(true)
== Parsed Logical Plan ==
'Project [1 AS test#25]
+- 'Filter 0 IN (00)
   +- OneRowRelation

== Analyzed Logical Plan ==
test: int
Project [1 AS test#25]
+- Filter cast(0 as string) IN (cast(00 as string))
   +- OneRowRelation

== Optimized Logical Plan ==
LocalRelation <empty>, [test#25]

== Physical Plan ==
LocalTableScan <empty>, [test#25]
 {code}
 

!image-2023-05-13-13-14-55-853.png!


> In expression not compatible with EqualTo Expression
> 
>
> Key: SPARK-43491
> URL: https://issues.apache.org/jira/browse/SPARK-43491
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: KuijianLiu
>Priority: Minor
> Attachments: image-2023-05-13-13-14-55-853.png, 
> image-2023-05-13-13-15-50-685.png
>
>
> Spark SQL 3.1.1 and Hive SQL 3.1.0 return inconsistent results for the same 
> SQL: Spark SQL evaluates {{0 in ('00')}} as false, behaving differently from 
> the {{=}} keyword, while Hive evaluates it as true. Hive 3.1.0 treats the 
> {{in}} keyword consistently with {{=}}, but Spark SQL does not.
> When the data types of the elements in an {{In}} expression are the same, it 
> should behave the same way as a BinaryComparison such as {{EqualTo}}.
> Test SQL:
> {code:java}
> scala> spark.sql("select 1 as test where 0 = '00'").show
> ++
> |test|
> ++
> |   1|
> ++
> scala> spark.sql("select 1 as test where 0 in ('00')").show
> ++
> |test|
> ++
> ++{code}
> !image-2023-05-13-13-15-50-685.png!
> !image-2023-05-13-13-14-55-853.png!






[jira] [Updated] (SPARK-43491) In expression not compatible with EqualTo Expression

2023-05-12 Thread KuijianLiu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KuijianLiu updated SPARK-43491:
---
Attachment: image-2023-05-13-13-15-50-685.png

> In expression not compatible with EqualTo Expression
> 
>
> Key: SPARK-43491
> URL: https://issues.apache.org/jira/browse/SPARK-43491
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: KuijianLiu
>Priority: Minor
> Attachments: image-2023-05-13-13-14-55-853.png, 
> image-2023-05-13-13-15-50-685.png
>
>
> Spark SQL 3.1.1 and Hive SQL 3.1.0 return inconsistent results for the same 
> SQL: Spark SQL evaluates {{0 in ('00')}} as false, behaving differently from 
> the {{=}} keyword, while Hive evaluates it as true. Hive 3.1.0 treats the 
> {{in}} keyword consistently with {{=}}, but Spark SQL does not.
> When the data types of the elements in an {{In}} expression are the same, it 
> should behave the same way as a BinaryComparison such as {{EqualTo}}.
> Test SQL:
> {code:java}
> scala> spark.sql("select 1 as test where 0 = '00'").show
> ++
> |test|
> ++
> |   1|
> ++
> scala> spark.sql("select 1 as test where 0 in ('00')").show
> ++
> |test|
> ++
> ++
> scala> spark.sql("select 1 as test where 0 = '00'").explain(true)
> == Parsed Logical Plan ==
> 'Project [1 AS test#23]
> +- 'Filter (0 = 00)
>    +- OneRowRelation
>
> == Analyzed Logical Plan ==
> test: int
> Project [1 AS test#23]
> +- Filter (0 = cast(00 as int))
>    +- OneRowRelation
>
> == Optimized Logical Plan ==
> Project [1 AS test#23]
> +- OneRowRelation
>
> == Physical Plan ==
> *(1) Project [1 AS test#23]
> +- *(1) Scan OneRowRelation[]
>
> scala> spark.sql("select 1 as test where 0 in ('00')").explain(true)
> == Parsed Logical Plan ==
> 'Project [1 AS test#25]
> +- 'Filter 0 IN (00)
>    +- OneRowRelation
>
> == Analyzed Logical Plan ==
> test: int
> Project [1 AS test#25]
> +- Filter cast(0 as string) IN (cast(00 as string))
>    +- OneRowRelation
>
> == Optimized Logical Plan ==
> LocalRelation <empty>, [test#25]
>
> == Physical Plan ==
> LocalTableScan <empty>, [test#25]
>  {code}
>  
> !image-2023-05-13-13-14-55-853.png!


