[jira] [Assigned] (SPARK-42667) Spark Connect: newSession API

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42667:


Assignee: Rui Wang  (was: Apache Spark)

> Spark Connect: newSession API
> -
>
> Key: SPARK-42667
> URL: https://issues.apache.org/jira/browse/SPARK-42667
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>
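
For context, a hedged sketch of the API this ticket brings to the Connect client: in classic PySpark, `newSession()` returns a session that shares the underlying context but has isolated SQL state (temp views, configs). Illustrative only; assumes an active SparkSession named `spark`.

{code:python}
# Classic PySpark behavior that the Connect client API mirrors (illustrative).
new_spark = spark.newSession()

spark.range(1).createOrReplaceTempView("v")
print(spark.catalog.tableExists("v"))      # True
print(new_spark.catalog.tableExists("v"))  # False -- temp views are session-scoped
{code}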




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42667) Spark Connect: newSession API

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42667:


Assignee: Apache Spark  (was: Rui Wang)

> Spark Connect: newSession API
> -
>
> Key: SPARK-42667
> URL: https://issues.apache.org/jira/browse/SPARK-42667
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Rui Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-42662) Support `withSequenceColumn` as PySpark DataFrame internal function.

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696317#comment-17696317
 ] 

Apache Spark commented on SPARK-42662:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40270

> Support `withSequenceColumn` as PySpark DataFrame internal function.
> 
>
> Key: SPARK-42662
> URL: https://issues.apache.org/jira/browse/SPARK-42662
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark, PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Turn `withSequenceColumn` into a PySpark-internal API so that the
> distributed-sequence index of the pandas API on Spark is also supported in
> Spark Connect.
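
A hedged sketch of where this surfaces for end users (`withSequenceColumn` itself stays internal); assumes a working pandas-on-Spark setup:

{code:python}
# The distributed-sequence default index computes 0, 1, 2, ... without
# collecting data to the driver; withSequenceColumn is the internal helper
# behind it.
import pyspark.pandas as ps

ps.set_option("compute.default_index_type", "distributed-sequence")
psdf = ps.DataFrame({"a": [10, 20, 30]})
print(psdf.head())  # rows get index 0..2, computed distributedly
{code}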






[jira] [Assigned] (SPARK-42662) Support `withSequenceColumn` as PySpark DataFrame internal function.

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42662:


Assignee: (was: Apache Spark)

> Support `withSequenceColumn` as PySpark DataFrame internal function.
> 
>
> Key: SPARK-42662
> URL: https://issues.apache.org/jira/browse/SPARK-42662
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark, PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Turn `withSequenceColumn` into a PySpark-internal API so that the
> distributed-sequence index of the pandas API on Spark is also supported in
> Spark Connect.






[jira] [Commented] (SPARK-42662) Support `withSequenceColumn` as PySpark DataFrame internal function.

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696314#comment-17696314
 ] 

Apache Spark commented on SPARK-42662:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40270

> Support `withSequenceColumn` as PySpark DataFrame internal function.
> 
>
> Key: SPARK-42662
> URL: https://issues.apache.org/jira/browse/SPARK-42662
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark, PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Turn `withSequenceColumn` into a PySpark-internal API so that the
> distributed-sequence index of the pandas API on Spark is also supported in
> Spark Connect.






[jira] [Assigned] (SPARK-42662) Support `withSequenceColumn` as PySpark DataFrame internal function.

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42662:


Assignee: Apache Spark

> Support `withSequenceColumn` as PySpark DataFrame internal function.
> 
>
> Key: SPARK-42662
> URL: https://issues.apache.org/jira/browse/SPARK-42662
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark, PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> Turn `withSequenceColumn` into a PySpark-internal API so that the
> distributed-sequence index of the pandas API on Spark is also supported in
> Spark Connect.






[jira] [Assigned] (SPARK-42258) pyspark.sql.functions should not expose typing.cast

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42258:


Assignee: Apache Spark

> pyspark.sql.functions should not expose typing.cast
> ---
>
> Key: SPARK-42258
> URL: https://issues.apache.org/jira/browse/SPARK-42258
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.1
>Reporter: Furcy Pin
>Assignee: Apache Spark
>Priority: Minor
>
> In pyspark, the `pyspark.sql.functions` module imports and exposes the
> function `typing.cast`.
> This may lead to user errors that can be hard to spot.
> *Example*
> It took me a few minutes to understand why the following code:
> {code:python}
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as f
> spark = SparkSession.builder.getOrCreate()
> df = spark.sql("""SELECT 1 as a""")
> df.withColumn("a", f.cast("STRING", f.col("a"))).printSchema(){code}
> which executes without any problem, gives the following result:
> {code}
> root
>  |-- a: integer (nullable = false){code}
> This is because `f.cast` here is actually `typing.cast`, and the correct
> syntax is:
> {code:python}
> df.withColumn("a", f.col("a").cast("STRING")).printSchema(){code}
> which indeed gives:
> {code}
> root
>  |-- a: string (nullable = false){code}
> *Suggestion of solution*
> Option 1: the names imported in the `pyspark.sql.functions` module could be
> aliased so they are not exposed. For instance:
> {code:python}
> from typing import cast as _cast{code}
> Option 2: only import `typing` and replace all occurrences of `cast` with
> `typing.cast`
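
To see why the bug is silent, a minimal standalone sketch (plain Python, no Spark needed) of what `typing.cast` does at runtime:

{code:python}
# typing.cast performs no conversion at runtime; it simply returns its
# second argument, which is why the mistaken call above never fails.
from typing import cast

value = cast("STRING", 1)  # no error, no conversion
print(value, type(value))  # 1 <class 'int'>
{code}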






[jira] [Assigned] (SPARK-42258) pyspark.sql.functions should not expose typing.cast

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42258:


Assignee: (was: Apache Spark)

> pyspark.sql.functions should not expose typing.cast
> ---
>
> Key: SPARK-42258
> URL: https://issues.apache.org/jira/browse/SPARK-42258
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.1
>Reporter: Furcy Pin
>Priority: Minor
>
> In pyspark, the `pyspark.sql.functions` module imports and exposes the
> function `typing.cast`.
> This may lead to user errors that can be hard to spot.
> *Example*
> It took me a few minutes to understand why the following code:
> {code:python}
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as f
> spark = SparkSession.builder.getOrCreate()
> df = spark.sql("""SELECT 1 as a""")
> df.withColumn("a", f.cast("STRING", f.col("a"))).printSchema(){code}
> which executes without any problem, gives the following result:
> {code}
> root
>  |-- a: integer (nullable = false){code}
> This is because `f.cast` here is actually `typing.cast`, and the correct
> syntax is:
> {code:python}
> df.withColumn("a", f.col("a").cast("STRING")).printSchema(){code}
> which indeed gives:
> {code}
> root
>  |-- a: string (nullable = false){code}
> *Suggestion of solution*
> Option 1: the names imported in the `pyspark.sql.functions` module could be
> aliased so they are not exposed. For instance:
> {code:python}
> from typing import cast as _cast{code}
> Option 2: only import `typing` and replace all occurrences of `cast` with
> `typing.cast`






[jira] [Commented] (SPARK-42258) pyspark.sql.functions should not expose typing.cast

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696237#comment-17696237
 ] 

Apache Spark commented on SPARK-42258:
--

User 'FurcyPin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40271

> pyspark.sql.functions should not expose typing.cast
> ---
>
> Key: SPARK-42258
> URL: https://issues.apache.org/jira/browse/SPARK-42258
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.1
>Reporter: Furcy Pin
>Priority: Minor
>
> In pyspark, the `pyspark.sql.functions` module imports and exposes the
> function `typing.cast`.
> This may lead to user errors that can be hard to spot.
> *Example*
> It took me a few minutes to understand why the following code:
> {code:python}
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as f
> spark = SparkSession.builder.getOrCreate()
> df = spark.sql("""SELECT 1 as a""")
> df.withColumn("a", f.cast("STRING", f.col("a"))).printSchema(){code}
> which executes without any problem, gives the following result:
> {code}
> root
>  |-- a: integer (nullable = false){code}
> This is because `f.cast` here is actually `typing.cast`, and the correct
> syntax is:
> {code:python}
> df.withColumn("a", f.col("a").cast("STRING")).printSchema(){code}
> which indeed gives:
> {code}
> root
>  |-- a: string (nullable = false){code}
> *Suggestion of solution*
> Option 1: the names imported in the `pyspark.sql.functions` module could be
> aliased so they are not exposed. For instance:
> {code:python}
> from typing import cast as _cast{code}
> Option 2: only import `typing` and replace all occurrences of `cast` with
> `typing.cast`






[jira] [Assigned] (SPARK-42497) Support of pandas API on Spark for Spark Connect.

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42497:


Assignee: (was: Apache Spark)

> Support of pandas API on Spark for Spark Connect.
> -
>
> Key: SPARK-42497
> URL: https://issues.apache.org/jira/browse/SPARK-42497
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should enable `pandas API on Spark` on Spark Connect.
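
A hedged sketch of the end state this umbrella targets; the `sc://` URL is an assumption for illustration:

{code:python}
# Goal state (illustrative): pandas-on-Spark code running over Spark Connect.
from pyspark.sql import SparkSession
import pyspark.pandas as ps

spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
psdf = ps.DataFrame({"a": [1, 2, 3]})
print(psdf.sum())
{code}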






[jira] [Commented] (SPARK-42497) Support of pandas API on Spark for Spark Connect.

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696206#comment-17696206
 ] 

Apache Spark commented on SPARK-42497:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40270

> Support of pandas API on Spark for Spark Connect.
> -
>
> Key: SPARK-42497
> URL: https://issues.apache.org/jira/browse/SPARK-42497
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should enable `pandas API on Spark` on Spark Connect.






[jira] [Assigned] (SPARK-42497) Support of pandas API on Spark for Spark Connect.

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42497:


Assignee: Apache Spark

> Support of pandas API on Spark for Spark Connect.
> -
>
> Key: SPARK-42497
> URL: https://issues.apache.org/jira/browse/SPARK-42497
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> We should enable `pandas API on Spark` on Spark Connect.






[jira] [Commented] (SPARK-42500) ConstantPropagation support more cases

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696177#comment-17696177
 ] 

Apache Spark commented on SPARK-42500:
--

User 'peter-toth' has created a pull request for this issue:
https://github.com/apache/spark/pull/40268

> ConstantPropagation support more cases
> --
>
> Key: SPARK-42500
> URL: https://issues.apache.org/jira/browse/SPARK-42500
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>
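
For context, a hedged sketch of what the `ConstantPropagation` rule already does (illustrative; assumes an active SparkSession named `spark`):

{code:python}
# ConstantPropagation substitutes constants implied by equality predicates,
# e.g. `a = 1 AND b = a` becomes `a = 1 AND b = 1` in the optimized plan.
spark.range(10).selectExpr("id AS a", "id AS b").createOrReplaceTempView("t")
spark.sql("SELECT * FROM t WHERE a = 1 AND b = a").explain(True)
{code}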







[jira] [Commented] (SPARK-42500) ConstantPropagation support more cases

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696176#comment-17696176
 ] 

Apache Spark commented on SPARK-42500:
--

User 'peter-toth' has created a pull request for this issue:
https://github.com/apache/spark/pull/40268

> ConstantPropagation support more cases
> --
>
> Key: SPARK-42500
> URL: https://issues.apache.org/jira/browse/SPARK-42500
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Commented] (SPARK-42653) Artifact transfer from Scala/JVM client to Server

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696144#comment-17696144
 ] 

Apache Spark commented on SPARK-42653:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/40267

> Artifact transfer from Scala/JVM client to Server
> -
>
> Key: SPARK-42653
> URL: https://issues.apache.org/jira/browse/SPARK-42653
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Venkata Sai Akhil Gudesa
>Assignee: Venkata Sai Akhil Gudesa
>Priority: Major
> Fix For: 3.4.1
>
>
> In the decoupled client-server architecture of Spark Connect, a remote client
> may reference a local JAR or a newly defined class in a UDF that is not
> present on the server. To handle these cases of missing "artifacts", we need
> to implement a mechanism to transfer artifacts from the client side to the
> server side, following the protocol defined in
> https://github.com/apache/spark/pull/40147
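
A heavily hedged usage sketch; the client-side method name `addArtifact` is an assumption about where this work lands, not something confirmed by this ticket:

{code:python}
# Hypothetical client-side call (name assumed): ship a local JAR to the
# Connect server so that UDFs depending on it can resolve their classes.
spark.addArtifact("/path/to/udf-deps.jar")  # assumed API, illustrative only
{code}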






[jira] [Commented] (SPARK-42660) Infer filters for Join produced by IN and EXISTS clause (RewritePredicateSubquery rule)

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696059#comment-17696059
 ] 

Apache Spark commented on SPARK-42660:
--

User 'mskapilks' has created a pull request for this issue:
https://github.com/apache/spark/pull/40266

> Infer filters for Join produced by IN and EXISTS clause 
> (RewritePredicateSubquery rule)
> ---
>
> Key: SPARK-42660
> URL: https://issues.apache.org/jira/browse/SPARK-42660
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Kapil Singh
>Priority: Major
>
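
For context, a hedged sketch of the rewrite in question (illustrative; assumes an active SparkSession named `spark`): `RewritePredicateSubquery` turns the IN predicate into a left semi join, and this ticket proposes inferring filters on that join.

{code:python}
# The IN predicate below is rewritten into a LEFT SEMI JOIN by
# RewritePredicateSubquery; explain(True) shows the rewritten plan.
spark.range(10).createOrReplaceTempView("t1")
spark.range(5).createOrReplaceTempView("t2")
spark.sql("SELECT * FROM t1 WHERE id IN (SELECT id FROM t2)").explain(True)
{code}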







[jira] [Assigned] (SPARK-42660) Infer filters for Join produced by IN and EXISTS clause (RewritePredicateSubquery rule)

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42660:


Assignee: Apache Spark

> Infer filters for Join produced by IN and EXISTS clause 
> (RewritePredicateSubquery rule)
> ---
>
> Key: SPARK-42660
> URL: https://issues.apache.org/jira/browse/SPARK-42660
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Kapil Singh
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42660) Infer filters for Join produced by IN and EXISTS clause (RewritePredicateSubquery rule)

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42660:


Assignee: (was: Apache Spark)

> Infer filters for Join produced by IN and EXISTS clause 
> (RewritePredicateSubquery rule)
> ---
>
> Key: SPARK-42660
> URL: https://issues.apache.org/jira/browse/SPARK-42660
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Kapil Singh
>Priority: Major
>







[jira] [Assigned] (SPARK-42556) Dataset.colregex should link a plan_id when it only matches a single column.

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42556:


Assignee: Apache Spark

> Dataset.colregex should link a plan_id when it only matches a single column.
> 
>
> Key: SPARK-42556
> URL: https://issues.apache.org/jira/browse/SPARK-42556
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Apache Spark
>Priority: Major
>
> When colRegex returns a single column, it should link the plan's plan_id. For
> reference, here is the non-Connect Dataset code that does this:
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1512]
> This also needs to be fixed for the Python client.
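
A brief usage sketch of the behavior in question (illustrative; assumes an active SparkSession named `spark`):

{code:python}
# When the regex matches exactly one column, the returned Column should
# carry the originating DataFrame's plan_id so resolution can disambiguate.
df = spark.range(3).withColumnRenamed("id", "col1")
col = df.colRegex("`col1`")  # single match -> should link df's plan_id
df.select(col).show()
{code}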






[jira] [Assigned] (SPARK-42556) Dataset.colregex should link a plan_id when it only matches a single column.

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42556:


Assignee: (was: Apache Spark)

> Dataset.colregex should link a plan_id when it only matches a single column.
> 
>
> Key: SPARK-42556
> URL: https://issues.apache.org/jira/browse/SPARK-42556
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> When colRegex returns a single column, it should link the plan's plan_id. For
> reference, here is the non-Connect Dataset code that does this:
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1512]
> This also needs to be fixed for the Python client.






[jira] [Commented] (SPARK-42556) Dataset.colregex should link a plan_id when it only matches a single column.

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696046#comment-17696046
 ] 

Apache Spark commented on SPARK-42556:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40265

> Dataset.colregex should link a plan_id when it only matches a single column.
> 
>
> Key: SPARK-42556
> URL: https://issues.apache.org/jira/browse/SPARK-42556
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> When colRegex returns a single column, it should link the plan's plan_id. For
> reference, here is the non-Connect Dataset code that does this:
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1512]
> This also needs to be fixed for the Python client.






[jira] [Commented] (SPARK-42635) Several counter-intuitive behaviours in the TimestampAdd expression

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696042#comment-17696042
 ] 

Apache Spark commented on SPARK-42635:
--

User 'chenhao-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40264

> Several counter-intuitive behaviours in the TimestampAdd expression
> ---
>
> Key: SPARK-42635
> URL: https://issues.apache.org/jira/browse/SPARK-42635
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
>Reporter: Chenhao Li
>Assignee: Chenhao Li
>Priority: Major
> Fix For: 3.4.1
>
>
> 1. When the time is close to a daylight saving time transition, the result
> may be discontinuous and not monotonic.
> We currently have:
> {code:scala}
> scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
> scala> spark.sql("select timestampadd(second, 24 * 3600 - 1, timestamp'2011-03-12 03:00:00')").show
> +-------------------------------------------------------------------------+
> |timestampadd(second, ((24 * 3600) - 1), TIMESTAMP '2011-03-12 03:00:00')|
> +-------------------------------------------------------------------------+
> |                                                      2011-03-13 03:59:59|
> +-------------------------------------------------------------------------+
> scala> spark.sql("select timestampadd(second, 24 * 3600, timestamp'2011-03-12 03:00:00')").show
> +-------------------------------------------------------------------+
> |timestampadd(second, (24 * 3600), TIMESTAMP '2011-03-12 03:00:00')|
> +-------------------------------------------------------------------+
> |                                                2011-03-13 03:00:00|
> +-------------------------------------------------------------------+{code}
> In the second query, adding one more second sets the time back one hour
> instead. Plus, there are only {{23 * 3600}} seconds from {{2011-03-12
> 03:00:00}} to {{2011-03-13 03:00:00}}, not {{24 * 3600}} seconds, due to the
> daylight saving time transition.
> The root cause is that the Spark code at
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L790]
> wrongly assumes every day has {{MICROS_PER_DAY}} microseconds and does the
> day and time-in-day split before looking at the timezone.
> 2. Adding months, quarters, and years silently ignores Int overflow during
> unit conversion.
> The root cause is
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L1246]:
> {{quantity}} is multiplied by {{3}} or {{MONTHS_PER_YEAR}} without checking
> for overflow. Note that we do have overflow checking when adding the amount
> to the timestamp, so the behavior is inconsistent.
> This can cause counter-intuitive results like this:
> {code:scala}
> scala> spark.sql("select timestampadd(quarter, 1431655764, timestamp'1970-01-01')").show
> +-------------------------------------------------------------------+
> |timestampadd(quarter, 1431655764, TIMESTAMP '1970-01-01 00:00:00')|
> +-------------------------------------------------------------------+
> |                                                1969-09-01 00:00:00|
> +-------------------------------------------------------------------+{code}
> 3. Adding sub-month units (week, day, hour, minute, second, millisecond,
> microsecond) silently ignores Long overflow during unit conversion.
> This is similar to the previous problem:
> {code:scala}
> scala> spark.sql("select timestampadd(day, 106751992, timestamp'1970-01-01')").show(false)
> +--------------------------------------------------------------+
> |timestampadd(day, 106751992, TIMESTAMP '1970-01-01 00:00:00')|
> +--------------------------------------------------------------+
> |-290308-12-22 15:58:10.448384                                 |
> +--------------------------------------------------------------+{code}
>
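
To make the first problem concrete outside Spark, a plain-Python sketch (not Spark internals): doing the addition on the instant (epoch) timeline, instead of splitting days before applying the timezone, keeps the result monotonic across the DST transition.

{code:python}
# Adding elapsed seconds on the UTC timeline and converting back stays
# monotonic across the 2011-03-13 DST transition in America/Los_Angeles.
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

la = ZoneInfo("America/Los_Angeles")
start = datetime(2011, 3, 12, 3, 0, 0, tzinfo=la)

def add_seconds(ts, secs):
    # convert to UTC, add the elapsed time, convert back to local time
    return (ts.astimezone(timezone.utc) + timedelta(seconds=secs)).astimezone(la)

print(add_seconds(start, 24 * 3600 - 1))  # 2011-03-13 03:59:59-07:00
print(add_seconds(start, 24 * 3600))      # 2011-03-13 04:00:00-07:00
{code}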






[jira] [Assigned] (SPARK-42659) Reimplement `FPGrowthModel.transform` with dataframe operations

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42659:


Assignee: (was: Apache Spark)

> Reimplement `FPGrowthModel.transform` with dataframe operations
> ---
>
> Key: SPARK-42659
> URL: https://issues.apache.org/jira/browse/SPARK-42659
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
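
For context, a usage sketch of the transform being reimplemented (standard `pyspark.ml.fpm` API, illustrative; assumes an active SparkSession named `spark`):

{code:python}
# FPGrowthModel.transform predicts consequents from the learned association
# rules; this ticket reimplements that logic with DataFrame operations.
from pyspark.ml.fpm import FPGrowth

df = spark.createDataFrame(
    [(0, ["a", "b"]), (1, ["a", "c"]), (2, ["a"])], ["id", "items"])
model = FPGrowth(itemsCol="items", minSupport=0.5, minConfidence=0.5).fit(df)
model.transform(df).show()  # adds a "prediction" array column
{code}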







[jira] [Assigned] (SPARK-42659) Reimplement `FPGrowthModel.transform` with dataframe operations

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42659:


Assignee: Apache Spark

> Reimplement `FPGrowthModel.transform` with dataframe operations
> ---
>
> Key: SPARK-42659
> URL: https://issues.apache.org/jira/browse/SPARK-42659
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-42659) Reimplement `FPGrowthModel.transform` with dataframe operations

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696019#comment-17696019
 ] 

Apache Spark commented on SPARK-42659:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40263

> Reimplement `FPGrowthModel.transform` with dataframe operations
> ---
>
> Key: SPARK-42659
> URL: https://issues.apache.org/jira/browse/SPARK-42659
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-42659) Reimplement `FPGrowthModel.transform` with dataframe operations

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696018#comment-17696018
 ] 

Apache Spark commented on SPARK-42659:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40263

> Reimplement `FPGrowthModel.transform` with dataframe operations
> ---
>
> Key: SPARK-42659
> URL: https://issues.apache.org/jira/browse/SPARK-42659
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Assigned] (SPARK-42651) Optimize global sort to driver sort

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42651:


Assignee: (was: Apache Spark)

> Optimize global sort to driver sort
> ---
>
> Key: SPARK-42651
> URL: https://issues.apache.org/jira/browse/SPARK-42651
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Priority: Major
>
> If the size of the plan is small enough, it is more efficient to sort all
> rows on the driver side, which saves one shuffle.
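
A rough sketch of the idea from the user's side (illustrative only, not the optimizer rule itself; assumes an active SparkSession named `spark`): for a small plan, collecting and sorting on the driver is equivalent to a global sort but avoids the range-partitioning shuffle.

{code:python}
# What the proposed rewrite effectively does for a small plan: sort rows on
# the driver instead of shuffling for a global sort.
small = spark.range(100).selectExpr("id % 7 AS k")

shuffled = small.orderBy("k").collect()                        # one shuffle
driver_sorted = sorted(small.collect(), key=lambda r: r["k"])  # no shuffle

assert [r["k"] for r in shuffled] == [r["k"] for r in driver_sorted]
{code}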






[jira] [Commented] (SPARK-42651) Optimize global sort to driver sort

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695991#comment-17695991
 ] 

Apache Spark commented on SPARK-42651:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/40262

> Optimize global sort to driver sort
> ---
>
> Key: SPARK-42651
> URL: https://issues.apache.org/jira/browse/SPARK-42651
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Priority: Major
>
> If the size of the plan is small enough, it is more efficient to sort all
> rows on the driver side, which saves one shuffle.






[jira] [Assigned] (SPARK-42651) Optimize global sort to driver sort

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42651:


Assignee: Apache Spark

> Optimize global sort to driver sort
> ---
>
> Key: SPARK-42651
> URL: https://issues.apache.org/jira/browse/SPARK-42651
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Assignee: Apache Spark
>Priority: Major
>
> If the size of the plan is small enough, it is more efficient to sort all
> rows on the driver side, which saves one shuffle.






[jira] [Commented] (SPARK-42651) Optimize global sort to driver sort

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695992#comment-17695992
 ] 

Apache Spark commented on SPARK-42651:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/40262

> Optimize global sort to driver sort
> ---
>
> Key: SPARK-42651
> URL: https://issues.apache.org/jira/browse/SPARK-42651
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Priority: Major
>
> If the size of the plan is small enough, it is more efficient to sort all
> rows on the driver side, which saves one shuffle.






[jira] [Commented] (SPARK-42615) Refactor the AnalyzePlan RPC and add `session.version`

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695943#comment-17695943
 ] 

Apache Spark commented on SPARK-42615:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40261

> Refactor the AnalyzePlan RPC and add `session.version`
> --
>
> Key: SPARK-42615
> URL: https://issues.apache.org/jira/browse/SPARK-42615
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.1
>
>
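
For context, a hedged sketch of what the added field exposes; the `sc://` URL is an assumption for illustration:

{code:python}
# After this change, a Connect client can report the server's Spark version,
# served through the (refactored) AnalyzePlan RPC.
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
print(spark.version)
{code}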







[jira] [Commented] (SPARK-42630) Make `parse_data_type` use new proto message `DDLParse`

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695928#comment-17695928
 ] 

Apache Spark commented on SPARK-42630:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40260

> Make `parse_data_type` use new proto message `DDLParse`
> ---
>
> Key: SPARK-42630
> URL: https://issues.apache.org/jira/browse/SPARK-42630
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Commented] (SPARK-42630) Make `parse_data_type` use new proto message `DDLParse`

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695927#comment-17695927
 ] 

Apache Spark commented on SPARK-42630:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40260

> Make `parse_data_type` use new proto message `DDLParse`
> ---
>
> Key: SPARK-42630
> URL: https://issues.apache.org/jira/browse/SPARK-42630
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Assigned] (SPARK-42609) Add tests for grouping() and grouping_id() functions

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42609:


Assignee: Apache Spark  (was: Rui Wang)

> Add tests for grouping() and grouping_id() functions
> 
>
> Key: SPARK-42609
> URL: https://issues.apache.org/jira/browse/SPARK-42609
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Apache Spark
>Priority: Major
>
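
For context, a usage sketch of the functions under test (standard API, illustrative; assumes an active SparkSession named `spark`):

{code:python}
# grouping() flags whether a column is aggregated in a cube/rollup row;
# grouping_id() encodes the grouping level as a bit vector.
from pyspark.sql import functions as F

df = spark.createDataFrame([("a", 1), ("b", 2)], ["k", "v"])
df.cube("k").agg(F.grouping("k"), F.grouping_id(), F.sum("v")).show()
{code}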







[jira] [Assigned] (SPARK-42609) Add tests for grouping() and grouping_id() functions

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42609:


Assignee: Rui Wang  (was: Apache Spark)

> Add tests for grouping() and grouping_id() functions
> 
>
> Key: SPARK-42609
> URL: https://issues.apache.org/jira/browse/SPARK-42609
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Commented] (SPARK-42609) Add tests for grouping() and grouping_id() functions

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695912#comment-17695912
 ] 

Apache Spark commented on SPARK-42609:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40259

> Add tests for grouping() and grouping_id() functions
> 
>
> Key: SPARK-42609
> URL: https://issues.apache.org/jira/browse/SPARK-42609
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-42655) Incorrect ambiguous column reference error

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42655:


Assignee: Apache Spark

> Incorrect ambiguous column reference error
> --
>
> Key: SPARK-42655
> URL: https://issues.apache.org/jira/browse/SPARK-42655
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Shrikant Prasad
>Assignee: Apache Spark
>Priority: Major
>
> val df1 = 
> sc.parallelize(List((1,2,3,4,5),(1,2,3,4,5))).toDF("id","col2","col3","col4", 
> "col5")
> val op_cols_same_case = List("id","col2","col3","col4", "col5", "id")
> val df2 = df1.select(op_cols_same_case.head, op_cols_same_case.tail: _*)
> df2.select("id").show()
>  
> This query runs fine.
>  
> But when we change the casing of the op_cols to mix upper and lower case
> ("id" & "ID"), it throws an ambiguous column reference error:
>  
> val df1 = 
> sc.parallelize(List((1,2,3,4,5),(1,2,3,4,5))).toDF("id","col2","col3","col4", 
> "col5")
> val op_cols_mixed_case = List("id","col2","col3","col4", "col5", "ID")
> val df3 = df1.select(op_cols_mixed_case.head, op_cols_mixed_case.tail: _*)
> df3.select("id").show()
> org.apache.spark.sql.AnalysisException: Reference 'id' is ambiguous, could 
> be: id, id.
>   at 
> org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:363)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:112)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$resolveExpressionByPlanChildren$1(Analyzer.scala:1857)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$resolveExpression$2(Analyzer.scala:1787)
>   at 
> org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:60)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.innerResolve$1(Analyzer.scala:1794)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.resolveExpression(Analyzer.scala:1812)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.resolveExpressionByPlanChildren(Analyzer.scala:1863)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$17.$anonfun$applyOrElse$94(Analyzer.scala:1577)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:209)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:209)
>  
> Since Spark is case-insensitive by default, the second case should also work
> when the column list mixes upper- and lowercase column names.
> It also works fine in Spark 2.3.
>  
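
The same repro, ported to PySpark as a hedged sketch (assumes an active SparkSession named `spark`; on affected 3.2.x builds the last line raises the AnalysisException quoted above):

{code:python}
# PySpark port of the Scala repro above (illustrative).
df1 = spark.createDataFrame([(1, 2, 3, 4, 5), (1, 2, 3, 4, 5)],
                            ["id", "col2", "col3", "col4", "col5"])

df1.select("id", "col2", "col3", "col4", "col5", "id").select("id").show()   # works
df1.select("id", "col2", "col3", "col4", "col5", "ID").select("id").show()   # AnalysisException
{code}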






[jira] [Commented] (SPARK-42655) Incorrect ambiguous column reference error

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695892#comment-17695892
 ] 

Apache Spark commented on SPARK-42655:
--

User 'shrprasa' has created a pull request for this issue:
https://github.com/apache/spark/pull/40258

> Incorrect ambiguous column reference error
> --
>
> Key: SPARK-42655
> URL: https://issues.apache.org/jira/browse/SPARK-42655
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Shrikant Prasad
>Priority: Major
>
> val df1 = 
> sc.parallelize(List((1,2,3,4,5),(1,2,3,4,5))).toDF("id","col2","col3","col4", 
> "col5")
> val op_cols_same_case = List("id","col2","col3","col4", "col5", "id")
> val df2 = df1.select(op_cols_same_case.head, op_cols_same_case.tail: _*)
> df2.select("id").show()
>  
> This query runs fine.
>  
> But when we change the casing of the op_cols to mix upper and lower case
> ("id" & "ID"), it throws an ambiguous column reference error:
>  
> val df1 = 
> sc.parallelize(List((1,2,3,4,5),(1,2,3,4,5))).toDF("id","col2","col3","col4", 
> "col5")
> val op_cols_mixed_case = List("id","col2","col3","col4", "col5", "ID")
> val df3 = df1.select(op_cols_mixed_case.head, op_cols_mixed_case.tail: _*)
> df3.select("id").show()
> org.apache.spark.sql.AnalysisException: Reference 'id' is ambiguous, could 
> be: id, id.
>   at 
> org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:363)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:112)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$resolveExpressionByPlanChildren$1(Analyzer.scala:1857)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$resolveExpression$2(Analyzer.scala:1787)
>   at 
> org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:60)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.innerResolve$1(Analyzer.scala:1794)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.resolveExpression(Analyzer.scala:1812)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.resolveExpressionByPlanChildren(Analyzer.scala:1863)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$17.$anonfun$applyOrElse$94(Analyzer.scala:1577)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:209)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:209)
>  
> Since Spark is case-insensitive by default, the second case should also work
> when the column list mixes upper- and lowercase column names.
> It also works fine in Spark 2.3.
>  






[jira] [Assigned] (SPARK-42655) Incorrect ambiguous column reference error

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42655:


Assignee: (was: Apache Spark)

> Incorrect ambiguous column reference error
> --
>
> Key: SPARK-42655
> URL: https://issues.apache.org/jira/browse/SPARK-42655
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Shrikant Prasad
>Priority: Major
>
> val df1 = 
> sc.parallelize(List((1,2,3,4,5),(1,2,3,4,5))).toDF("id","col2","col3","col4", 
> "col5")
> val op_cols_same_case = List("id","col2","col3","col4", "col5", "id")
> val df2 = df1.select(op_cols_same_case.head, op_cols_same_case.tail: _*)
> df2.select("id").show()
>  
> This query runs fine.
>  
> But when we change the casing of the op_cols to mix upper and lower case
> ("id" & "ID"), it throws an ambiguous column reference error:
>  
> val df1 = 
> sc.parallelize(List((1,2,3,4,5),(1,2,3,4,5))).toDF("id","col2","col3","col4", 
> "col5")
> val op_cols_mixed_case = List("id","col2","col3","col4", "col5", "ID")
> val df3 = df1.select(op_cols_mixed_case.head, op_cols_mixed_case.tail: _*)
> df3.select("id").show()
> org.apache.spark.sql.AnalysisException: Reference 'id' is ambiguous, could 
> be: id, id.
>   at 
> org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:363)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:112)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$resolveExpressionByPlanChildren$1(Analyzer.scala:1857)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$resolveExpression$2(Analyzer.scala:1787)
>   at 
> org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:60)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.innerResolve$1(Analyzer.scala:1794)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.resolveExpression(Analyzer.scala:1812)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.resolveExpressionByPlanChildren(Analyzer.scala:1863)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$17.$anonfun$applyOrElse$94(Analyzer.scala:1577)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:209)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:209)
>  
> Since Spark is case-insensitive by default, the second case should also work
> when the column list mixes upper- and lowercase column names.
> It also works fine in Spark 2.3.
>  






[jira] [Assigned] (SPARK-42656) Spark Connect Scala Client Shell Script

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42656:


Assignee: (was: Apache Spark)

> Spark Connect Scala Client Shell Script
> ---
>
> Key: SPARK-42656
> URL: https://issues.apache.org/jira/browse/SPARK-42656
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> Add a shell script that runs the Scala client in a Scala REPL, allowing users
> to connect to Spark Connect.






[jira] [Assigned] (SPARK-42656) Spark Connect Scala Client Shell Script

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42656:


Assignee: Apache Spark

> Spark Connect Scala Client Shell Script
> ---
>
> Key: SPARK-42656
> URL: https://issues.apache.org/jira/browse/SPARK-42656
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Assignee: Apache Spark
>Priority: Major
>
> Add a shell script that runs the Scala client in a Scala REPL, allowing users
> to connect to Spark Connect.






[jira] [Commented] (SPARK-42656) Spark Connect Scala Client Shell Script

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695879#comment-17695879
 ] 

Apache Spark commented on SPARK-42656:
--

User 'zhenlineo' has created a pull request for this issue:
https://github.com/apache/spark/pull/40257

> Spark Connect Scala Client Shell Script
> ---
>
> Key: SPARK-42656
> URL: https://issues.apache.org/jira/browse/SPARK-42656
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> Add a shell script that runs the Scala client in a Scala REPL, allowing users
> to connect to Spark Connect.






[jira] [Commented] (SPARK-42653) Artifact transfer from Scala/JVM client to Server

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695820#comment-17695820
 ] 

Apache Spark commented on SPARK-42653:
--

User 'vicennial' has created a pull request for this issue:
https://github.com/apache/spark/pull/40256

> Artifact transfer from Scala/JVM client to Server
> -
>
> Key: SPARK-42653
> URL: https://issues.apache.org/jira/browse/SPARK-42653
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> In the decoupled client-server architecture of Spark Connect, a remote client
> may reference a local JAR or a newly defined class in a UDF that is not
> present on the server. To handle these cases of missing "artifacts", we need
> to implement a mechanism to transfer artifacts from the client side to the
> server side, following the protocol defined in
> https://github.com/apache/spark/pull/40147






[jira] [Assigned] (SPARK-42653) Artifact transfer from Scala/JVM client to Server

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42653:


Assignee: (was: Apache Spark)

> Artifact transfer from Scala/JVM client to Server
> -
>
> Key: SPARK-42653
> URL: https://issues.apache.org/jira/browse/SPARK-42653
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> In the decoupled client-server architecture of Spark Connect, a remote client
> may reference a local JAR or a newly defined class in a UDF that is not
> present on the server. To handle these cases of missing "artifacts", we need
> to implement a mechanism to transfer artifacts from the client side to the
> server side, following the protocol defined in
> https://github.com/apache/spark/pull/40147






[jira] [Assigned] (SPARK-42653) Artifact transfer from Scala/JVM client to Server

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42653:


Assignee: Apache Spark

> Artifact transfer from Scala/JVM client to Server
> -
>
> Key: SPARK-42653
> URL: https://issues.apache.org/jira/browse/SPARK-42653
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Venkata Sai Akhil Gudesa
>Assignee: Apache Spark
>Priority: Major
>
> In the decoupled client-server architecture of Spark Connect, a remote client 
> may use a local JAR or a new class in their UDF that may not be present on 
> the server. To handle these cases of missing "artifacts", we need to 
> implement a mechanism to transfer artifacts from the client side over to the 
> server side as per the protocol defined in 
> https://github.com/apache/spark/pull/40147 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42653) Artifact transfer from Scala/JVM client to Server

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695819#comment-17695819
 ] 

Apache Spark commented on SPARK-42653:
--

User 'vicennial' has created a pull request for this issue:
https://github.com/apache/spark/pull/40256

> Artifact transfer from Scala/JVM client to Server
> -
>
> Key: SPARK-42653
> URL: https://issues.apache.org/jira/browse/SPARK-42653
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Venkata Sai Akhil Gudesa
>Priority: Major
>
> In the decoupled client-server architecture of Spark Connect, a remote client 
> may use a local JAR or a new class in their UDF that may not be present on 
> the server. To handle these cases of missing "artifacts", we need to 
> implement a mechanism to transfer artifacts from the client side over to the 
> server side as per the protocol defined in 
> https://github.com/apache/spark/pull/40147 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42654) Upgrade dropwizard metrics 4.2.17

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42654:


Assignee: Apache Spark

> Upgrade dropwizard metrics 4.2.17
> -
>
> Key: SPARK-42654
> URL: https://issues.apache.org/jira/browse/SPARK-42654
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> * [https://github.com/dropwizard/metrics/releases/tag/v4.2.16]
>  * [https://github.com/dropwizard/metrics/releases/tag/v4.2.17]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42558) Implement DataFrameStatFunctions

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42558:


Assignee: Apache Spark

> Implement DataFrameStatFunctions
> 
>
> Key: SPARK-42558
> URL: https://issues.apache.org/jira/browse/SPARK-42558
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Apache Spark
>Priority: Major
>
> Implement DataFrameStatFunctions for connect, and hook it up to Dataset.
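
Once hooked up, usage on the connect client should mirror the existing
DataFrameStatFunctions surface, along these lines (column names are
illustrative):

{code:scala}
// Sketch of the existing stat functions the connect Dataset is expected
// to expose via df.stat.
val df = spark.range(1000).selectExpr("id", "id % 10 AS bucket")

val medianApprox = df.stat.approxQuantile("id", Array(0.5), 0.01)
val correlation  = df.stat.corr("id", "bucket")
val contingency  = df.stat.crosstab("bucket", "bucket")
{code}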



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42558) Implement DataFrameStatFunctions

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695816#comment-17695816
 ] 

Apache Spark commented on SPARK-42558:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40255

> Implement DataFrameStatFunctions
> 
>
> Key: SPARK-42558
> URL: https://issues.apache.org/jira/browse/SPARK-42558
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> Implement DataFrameStatFunctions for connect, and hook it up to Dataset.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42654) Upgrade dropwizard metrics 4.2.17

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695814#comment-17695814
 ] 

Apache Spark commented on SPARK-42654:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40254

> Upgrade dropwizard metrics 4.2.17
> -
>
> Key: SPARK-42654
> URL: https://issues.apache.org/jira/browse/SPARK-42654
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> * [https://github.com/dropwizard/metrics/releases/tag/v4.2.16]
>  * [https://github.com/dropwizard/metrics/releases/tag/v4.2.17]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42558) Implement DataFrameStatFunctions

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42558:


Assignee: (was: Apache Spark)

> Implement DataFrameStatFunctions
> 
>
> Key: SPARK-42558
> URL: https://issues.apache.org/jira/browse/SPARK-42558
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> Implement DataFrameStatFunctions for connect, and hook it up to Dataset.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42654) Upgrade dropwizard metrics 4.2.17

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42654:


Assignee: (was: Apache Spark)

> Upgrade dropwizard metrics 4.2.17
> -
>
> Key: SPARK-42654
> URL: https://issues.apache.org/jira/browse/SPARK-42654
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> * [https://github.com/dropwizard/metrics/releases/tag/v4.2.16]
>  * [https://github.com/dropwizard/metrics/releases/tag/v4.2.17]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42553) NonReserved keyword "interval" can't be column name

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695696#comment-17695696
 ] 

Apache Spark commented on SPARK-42553:
--

User 'jiang13021' has created a pull request for this issue:
https://github.com/apache/spark/pull/40253

> NonReserved keyword "interval" can't be column name
> ---
>
> Key: SPARK-42553
> URL: https://issues.apache.org/jira/browse/SPARK-42553
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.3.1, 3.2.3, 3.3.2
> Environment: Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 
> 1.8.0_345)
> Spark version 3.2.3-SNAPSHOT
>Reporter: jiang13021
>Assignee: jiang13021
>Priority: Major
> Fix For: 3.4.1
>
>
> INTERVAL is a non-reserved keyword in Spark. Non-reserved keywords have a 
> special meaning in particular contexts and can be used as identifiers in 
> other contexts. So, by design, interval can be used as a column name.
> {code:java}
> scala> spark.sql("select interval from mytable")
> org.apache.spark.sql.catalyst.parser.ParseException:
> at least one time unit should be given for interval literal(line 1, pos 7)
>
> == SQL ==
> select interval from mytable
> ---^^^
>
>   at 
> org.apache.spark.sql.errors.QueryParsingErrors$.invalidIntervalLiteralError(QueryParsingErrors.scala:196)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$parseIntervalLiteral$1(AstBuilder.scala:2481)
>   at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.parseIntervalLiteral(AstBuilder.scala:2466)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitInterval$1(AstBuilder.scala:2432)
>   at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:2431)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:57)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$IntervalContext.accept(SqlBaseParser.java:17308)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitChildren(AstBuilder.scala:71)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseBaseVisitor.visitIntervalLiteral(SqlBaseBaseVisitor.java:1581)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$IntervalLiteralContext.accept(SqlBaseParser.java:16929)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitChildren(AstBuilder.scala:71)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseBaseVisitor.visitConstantDefault(SqlBaseBaseVisitor.java:1511)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$ConstantDefaultContext.accept(SqlBaseParser.java:15905)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitChildren(AstBuilder.scala:71)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseBaseVisitor.visitValueExpressionDefault(SqlBaseBaseVisitor.java:1392)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$ValueExpressionDefaultContext.accept(SqlBaseParser.java:15298)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(AstBuilder.scala:61)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.expression(AstBuilder.scala:1412)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitPredicated$1(AstBuilder.scala:1548)
>   at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitPredicated(AstBuilder.scala:1547)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitPredicated(AstBuilder.scala:57)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$PredicatedContext.accept(SqlBaseParser.java:14745)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitChildren(AstBuilder.scala:71)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseBaseVisitor.visitExpression(SqlBaseBaseVisitor.java:1343)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$ExpressionContext.accept(SqlBaseParser.java:14606)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(AstBuilder.scala:61)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.expression(AstBuilder.scala:1412)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitNamedExpression$1(AstBuilder.scala:1434)
>   at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitNamedExpression(AstBuilder.scala:1433)
>   at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitNamedExpression(AstBuilder.scala:57)
>   at 
> 
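
The stack trace above is truncated in the archive. Until the parser is fixed,
one workaround is to backtick-quote the identifier, since delimited
identifiers are always parsed as identifiers rather than keywords:

{code:scala}
// Workaround sketch: quoting makes the parser treat "interval" as a column
// name instead of the start of an interval literal.
spark.sql("select `interval` from mytable").show()
{code}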

[jira] [Commented] (SPARK-42555) Add JDBC to DataFrameReader

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695691#comment-17695691
 ] 

Apache Spark commented on SPARK-42555:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40252

> Add JDBC to DataFrameReader
> ---
>
> Key: SPARK-42555
> URL: https://issues.apache.org/jira/browse/SPARK-42555
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
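
Assuming the connect client mirrors the existing DataFrameReader.jdbc
signature, usage would look like the sketch below; the URL, table name, and
credentials are placeholders:

{code:scala}
// Sketch based on the existing DataFrameReader.jdbc(url, table, properties).
import java.util.Properties

val props = new Properties()
props.setProperty("user", "username")    // placeholder credentials
props.setProperty("password", "password")

val df = spark.read.jdbc(
  "jdbc:postgresql://db-host:5432/mydb", // placeholder JDBC URL
  "public.my_table",                     // placeholder table name
  props)
{code}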




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42555) Add JDBC to DataFrameReader

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42555:


Assignee: Apache Spark

> Add JDBC to DataFrameReader
> ---
>
> Key: SPARK-42555
> URL: https://issues.apache.org/jira/browse/SPARK-42555
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42555) Add JDBC to DataFrameReader

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42555:


Assignee: (was: Apache Spark)

> Add JDBC to DataFrameReader
> ---
>
> Key: SPARK-42555
> URL: https://issues.apache.org/jira/browse/SPARK-42555
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41725) Remove the workaround of sql(...).collect back in PySpark tests

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695643#comment-17695643
 ] 

Apache Spark commented on SPARK-41725:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/40251

> Remove the workaround of sql(...).collect back in PySpark tests
> ---
>
> Key: SPARK-41725
> URL: https://issues.apache.org/jira/browse/SPARK-41725
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> See https://github.com/apache/spark/pull/39224/files#r1057436437
> We don't have to call `collect` after every `sql` in regular PySpark, but 
> Spark Connect required it. We should remove these workarounds.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42649) Remove the standard Apache License header from the top of third-party source files

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42649:


Assignee: Apache Spark

> Remove the standard Apache License header from the top of third-party source 
> files
> --
>
> Key: SPARK-42649
> URL: https://issues.apache.org/jira/browse/SPARK-42649
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.2, 1.3.1, 1.4.1, 1.5.2, 1.6.3, 2.1.3, 2.2.3, 
> 2.3.4, 2.4.8, 3.0.3, 3.1.3, 3.2.3, 3.3.2, 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42649) Remove the standard Apache License header from the top of third-party source files

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42649:


Assignee: (was: Apache Spark)

> Remove the standard Apache License header from the top of third-party source 
> files
> --
>
> Key: SPARK-42649
> URL: https://issues.apache.org/jira/browse/SPARK-42649
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.2, 1.3.1, 1.4.1, 1.5.2, 1.6.3, 2.1.3, 2.2.3, 
> 2.3.4, 2.4.8, 3.0.3, 3.1.3, 3.2.3, 3.3.2, 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42648) Upgrade versions-maven-plugin to 2.15.0

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42648:


Assignee: Apache Spark

> Upgrade versions-maven-plugin to 2.15.0
> ---
>
> Key: SPARK-42648
> URL: https://issues.apache.org/jira/browse/SPARK-42648
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> https://github.com/mojohaus/versions/releases/tag/2.15.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42649) Remove the standard Apache License header from the top of third-party source files

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695547#comment-17695547
 ] 

Apache Spark commented on SPARK-42649:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40249

> Remove the standard Apache License header from the top of third-party source 
> files
> --
>
> Key: SPARK-42649
> URL: https://issues.apache.org/jira/browse/SPARK-42649
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.1, 1.2.2, 1.3.1, 1.4.1, 1.5.2, 1.6.3, 2.1.3, 2.2.3, 
> 2.3.4, 2.4.8, 3.0.3, 3.1.3, 3.2.3, 3.3.2, 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42648) Upgrade versions-maven-plugin to 2.15.0

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42648:


Assignee: (was: Apache Spark)

> Upgrade versions-maven-plugin to 2.15.0
> ---
>
> Key: SPARK-42648
> URL: https://issues.apache.org/jira/browse/SPARK-42648
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> https://github.com/mojohaus/versions/releases/tag/2.15.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42648) Upgrade versions-maven-plugin to 2.15.0

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695545#comment-17695545
 ] 

Apache Spark commented on SPARK-42648:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40248

> Upgrade versions-maven-plugin to 2.15.0
> ---
>
> Key: SPARK-42648
> URL: https://issues.apache.org/jira/browse/SPARK-42648
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> https://github.com/mojohaus/versions/releases/tag/2.15.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42642) Make Python the first code example tab in the Spark documentation

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42642:


Assignee: (was: Apache Spark)

> Make Python the first code example tab in the Spark documentation
> -
>
> Key: SPARK-42642
> URL: https://issues.apache.org/jira/browse/SPARK-42642
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Allan Folting
>Priority: Major
> Attachments: Screenshot 2023-03-01 at 8.10.08 PM.png, Screenshot 
> 2023-03-01 at 8.10.22 PM.png
>
>
> Python is the most approachable and most popular language, so it should be 
> the default language in code examples. This change makes Python the first 
> code example tab consistently across the documentation, where applicable.
> This is continuing the work started with:
> https://issues.apache.org/jira/browse/SPARK-42493
> where these two pages were updated:
> [https://spark.apache.org/docs/latest/sql-getting-started.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html]
>  
> Pages being updated now:
> [https://spark.apache.org/docs/latest/ml-classification-regression.html]
> [https://spark.apache.org/docs/latest/ml-clustering.html]
> [https://spark.apache.org/docs/latest/ml-collaborative-filtering.html]
> [https://spark.apache.org/docs/latest/ml-datasource.html]
> [https://spark.apache.org/docs/latest/ml-features.html]
> [https://spark.apache.org/docs/latest/ml-frequent-pattern-mining.html]
> [https://spark.apache.org/docs/latest/ml-migration-guide.html]
> [https://spark.apache.org/docs/latest/ml-pipeline.html]
> [https://spark.apache.org/docs/latest/ml-statistics.html]
> [https://spark.apache.org/docs/latest/ml-tuning.html]
>  
> [https://spark.apache.org/docs/latest/mllib-clustering.html]
> [https://spark.apache.org/docs/latest/mllib-collaborative-filtering.html]
> [https://spark.apache.org/docs/latest/mllib-data-types.html]
> [https://spark.apache.org/docs/latest/mllib-decision-tree.html]
> [https://spark.apache.org/docs/latest/mllib-dimensionality-reduction.html]
> [https://spark.apache.org/docs/latest/mllib-ensembles.html]
> [https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html]
> [https://spark.apache.org/docs/latest/mllib-feature-extraction.html]
> [https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html]
> [https://spark.apache.org/docs/latest/mllib-isotonic-regression.html]
> [https://spark.apache.org/docs/latest/mllib-linear-methods.html]
> [https://spark.apache.org/docs/latest/mllib-naive-bayes.html]
> [https://spark.apache.org/docs/latest/mllib-statistics.html]
>  
> [https://spark.apache.org/docs/latest/quick-start.html]
>  
> [https://spark.apache.org/docs/latest/rdd-programming-guide.html]
>  
> [https://spark.apache.org/docs/latest/sql-data-sources-avro.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-csv.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-json.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-parquet.html]
> sql-data-sources-protobuf.html
> [https://spark.apache.org/docs/latest/sql-data-sources-text.html]
> [https://spark.apache.org/docs/latest/sql-migration-guide.html]
> [https://spark.apache.org/docs/latest/sql-performance-tuning.html]
> [https://spark.apache.org/docs/latest/sql-ref-datatypes.html]
>  
> [https://spark.apache.org/docs/latest/streaming-kinesis-integration.html]
> [https://spark.apache.org/docs/latest/streaming-programming-guide.html]
>  
> [https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html]
> [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42642) Make Python the first code example tab in the Spark documentation

2023-03-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42642:


Assignee: Apache Spark

> Make Python the first code example tab in the Spark documentation
> -
>
> Key: SPARK-42642
> URL: https://issues.apache.org/jira/browse/SPARK-42642
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Allan Folting
>Assignee: Apache Spark
>Priority: Major
> Attachments: Screenshot 2023-03-01 at 8.10.08 PM.png, Screenshot 
> 2023-03-01 at 8.10.22 PM.png
>
>
> Python is the most approachable and most popular language, so it should be 
> the default language in code examples. This change makes Python the first 
> code example tab consistently across the documentation, where applicable.
> This is continuing the work started with:
> https://issues.apache.org/jira/browse/SPARK-42493
> where these two pages were updated:
> [https://spark.apache.org/docs/latest/sql-getting-started.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html]
>  
> Pages being updated now:
> [https://spark.apache.org/docs/latest/ml-classification-regression.html]
> [https://spark.apache.org/docs/latest/ml-clustering.html]
> [https://spark.apache.org/docs/latest/ml-collaborative-filtering.html]
> [https://spark.apache.org/docs/latest/ml-datasource.html]
> [https://spark.apache.org/docs/latest/ml-features.html]
> [https://spark.apache.org/docs/latest/ml-frequent-pattern-mining.html]
> [https://spark.apache.org/docs/latest/ml-migration-guide.html]
> [https://spark.apache.org/docs/latest/ml-pipeline.html]
> [https://spark.apache.org/docs/latest/ml-statistics.html]
> [https://spark.apache.org/docs/latest/ml-tuning.html]
>  
> [https://spark.apache.org/docs/latest/mllib-clustering.html]
> [https://spark.apache.org/docs/latest/mllib-collaborative-filtering.html]
> [https://spark.apache.org/docs/latest/mllib-data-types.html]
> [https://spark.apache.org/docs/latest/mllib-decision-tree.html]
> [https://spark.apache.org/docs/latest/mllib-dimensionality-reduction.html]
> [https://spark.apache.org/docs/latest/mllib-ensembles.html]
> [https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html]
> [https://spark.apache.org/docs/latest/mllib-feature-extraction.html]
> [https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html]
> [https://spark.apache.org/docs/latest/mllib-isotonic-regression.html]
> [https://spark.apache.org/docs/latest/mllib-linear-methods.html]
> [https://spark.apache.org/docs/latest/mllib-naive-bayes.html]
> [https://spark.apache.org/docs/latest/mllib-statistics.html]
>  
> [https://spark.apache.org/docs/latest/quick-start.html]
>  
> [https://spark.apache.org/docs/latest/rdd-programming-guide.html]
>  
> [https://spark.apache.org/docs/latest/sql-data-sources-avro.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-csv.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-json.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-parquet.html]
> sql-data-sources-protobuf.html
> [https://spark.apache.org/docs/latest/sql-data-sources-text.html]
> [https://spark.apache.org/docs/latest/sql-migration-guide.html]
> [https://spark.apache.org/docs/latest/sql-performance-tuning.html]
> [https://spark.apache.org/docs/latest/sql-ref-datatypes.html]
>  
> [https://spark.apache.org/docs/latest/streaming-kinesis-integration.html]
> [https://spark.apache.org/docs/latest/streaming-programming-guide.html]
>  
> [https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html]
> [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42642) Make Python the first code example tab in the Spark documentation

2023-03-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695544#comment-17695544
 ] 

Apache Spark commented on SPARK-42642:
--

User 'allanf-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40250

> Make Python the first code example tab in the Spark documentation
> -
>
> Key: SPARK-42642
> URL: https://issues.apache.org/jira/browse/SPARK-42642
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Allan Folting
>Priority: Major
> Attachments: Screenshot 2023-03-01 at 8.10.08 PM.png, Screenshot 
> 2023-03-01 at 8.10.22 PM.png
>
>
> Python is the most approachable and most popular language, so it should be 
> the default language in code examples. This change makes Python the first 
> code example tab consistently across the documentation, where applicable.
> This is continuing the work started with:
> https://issues.apache.org/jira/browse/SPARK-42493
> where these two pages were updated:
> [https://spark.apache.org/docs/latest/sql-getting-started.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html]
>  
> Pages being updated now:
> [https://spark.apache.org/docs/latest/ml-classification-regression.html]
> [https://spark.apache.org/docs/latest/ml-clustering.html]
> [https://spark.apache.org/docs/latest/ml-collaborative-filtering.html]
> [https://spark.apache.org/docs/latest/ml-datasource.html]
> [https://spark.apache.org/docs/latest/ml-features.html]
> [https://spark.apache.org/docs/latest/ml-frequent-pattern-mining.html]
> [https://spark.apache.org/docs/latest/ml-migration-guide.html]
> [https://spark.apache.org/docs/latest/ml-pipeline.html]
> [https://spark.apache.org/docs/latest/ml-statistics.html]
> [https://spark.apache.org/docs/latest/ml-tuning.html]
>  
> [https://spark.apache.org/docs/latest/mllib-clustering.html]
> [https://spark.apache.org/docs/latest/mllib-collaborative-filtering.html]
> [https://spark.apache.org/docs/latest/mllib-data-types.html]
> [https://spark.apache.org/docs/latest/mllib-decision-tree.html]
> [https://spark.apache.org/docs/latest/mllib-dimensionality-reduction.html]
> [https://spark.apache.org/docs/latest/mllib-ensembles.html]
> [https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html]
> [https://spark.apache.org/docs/latest/mllib-feature-extraction.html]
> [https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html]
> [https://spark.apache.org/docs/latest/mllib-isotonic-regression.html]
> [https://spark.apache.org/docs/latest/mllib-linear-methods.html]
> [https://spark.apache.org/docs/latest/mllib-naive-bayes.html]
> [https://spark.apache.org/docs/latest/mllib-statistics.html]
>  
> [https://spark.apache.org/docs/latest/quick-start.html]
>  
> [https://spark.apache.org/docs/latest/rdd-programming-guide.html]
>  
> [https://spark.apache.org/docs/latest/sql-data-sources-avro.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-csv.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-json.html]
> [https://spark.apache.org/docs/latest/sql-data-sources-parquet.html]
> sql-data-sources-protobuf.html
> [https://spark.apache.org/docs/latest/sql-data-sources-text.html]
> [https://spark.apache.org/docs/latest/sql-migration-guide.html]
> [https://spark.apache.org/docs/latest/sql-performance-tuning.html]
> [https://spark.apache.org/docs/latest/sql-ref-datatypes.html]
>  
> [https://spark.apache.org/docs/latest/streaming-kinesis-integration.html]
> [https://spark.apache.org/docs/latest/streaming-programming-guide.html]
>  
> [https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html]
> [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42647) Remove aliases from deprecated numpy data types

2023-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42647:


Assignee: (was: Apache Spark)

> Remove aliases from deprecated numpy data types
> ---
>
> Key: SPARK-42647
> URL: https://issues.apache.org/jira/browse/SPARK-42647
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0, 3.3.1, 3.3.3, 3.3.2, 3.4.0, 3.4.1
>Reporter: Aimilios Tsouvelekakis
>Priority: Major
>
> Numpy has started changing the aliases of some of its data types. This means 
> that users with the latest version of numpy will face either warnings or 
> errors, depending on the type they are using. This affects all users 
> running numpy > 1.20.0. One of the types was fixed back in September with 
> this [pull|https://github.com/apache/spark/pull/37817] request.
> The problem can be split into 2 types:
> [numpy 1.24.0|https://github.com/numpy/numpy/pull/22607]: The scalar type 
> aliases ending in a 0 bit size: np.object0, np.str0, np.bytes0, np.void0, 
> np.int0, np.uint0 as well as np.bool8 are now deprecated and will eventually 
> be removed. As of numpy 1.25.0 they emit a warning.
> [numpy 1.20.0|https://github.com/numpy/numpy/pull/14882]: Using the aliases 
> of builtin types like np.int is deprecated since numpy 1.20.0 and removed in 
> numpy 1.24.0.
> The changes are needed so pyspark can be compatible with the latest numpy 
> and avoid
>  * attribute errors on data types deprecated in version 1.20.0: 
> [https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations]
>  * warnings on deprecated data types from version 1.24.0: 
> [https://numpy.org/devdocs/release/1.24.0-notes.html#deprecations]
>  
> From my research I see the following:
> The only functional changes are related to the conversion.py file. The rest 
> of the changes are inside tests in the user_guide or in some docstrings 
> describing specific functions. Since I am not an expert in these tests, I 
> defer to the reviewers and people with more experience in the pyspark code.
> These types are aliases for classic python types, so they should work with 
> all numpy versions 
> [1|https://numpy.org/devdocs/release/1.20.0-notes.html], 
> [2|https://stackoverflow.com/questions/74844262/how-can-i-solve-error-module-numpy-has-no-attribute-float-in-python].
>  The error or warning comes from the call into numpy.
>  
> For the affected versions I chose 3.3 and onwards, but 3.2 is also still in 
> the 18-month maintenance cadence, as it was released in October 2021.
>  
> The pull request: [https://github.com/apache/spark/pull/40220]
> Best Regards,
> Aimilios



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42647) Remove aliases from deprecated numpy data types

2023-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695520#comment-17695520
 ] 

Apache Spark commented on SPARK-42647:
--

User 'aimtsou' has created a pull request for this issue:
https://github.com/apache/spark/pull/40220

> Remove aliases from deprecated numpy data types
> ---
>
> Key: SPARK-42647
> URL: https://issues.apache.org/jira/browse/SPARK-42647
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0, 3.3.1, 3.3.3, 3.3.2, 3.4.0, 3.4.1
>Reporter: Aimilios Tsouvelekakis
>Priority: Major
>
> Numpy has started changing the aliases of some of its data types. This means 
> that users with the latest version of numpy will face either warnings or 
> errors, depending on the type they are using. This affects all users 
> running numpy > 1.20.0. One of the types was fixed back in September with 
> this [pull|https://github.com/apache/spark/pull/37817] request.
> The problem can be split into 2 types:
> [numpy 1.24.0|https://github.com/numpy/numpy/pull/22607]: The scalar type 
> aliases ending in a 0 bit size: np.object0, np.str0, np.bytes0, np.void0, 
> np.int0, np.uint0 as well as np.bool8 are now deprecated and will eventually 
> be removed. As of numpy 1.25.0 they emit a warning.
> [numpy 1.20.0|https://github.com/numpy/numpy/pull/14882]: Using the aliases 
> of builtin types like np.int is deprecated since numpy 1.20.0 and removed in 
> numpy 1.24.0.
> The changes are needed so pyspark can be compatible with the latest numpy 
> and avoid
>  * attribute errors on data types deprecated in version 1.20.0: 
> [https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations]
>  * warnings on deprecated data types from version 1.24.0: 
> [https://numpy.org/devdocs/release/1.24.0-notes.html#deprecations]
>  
> From my research I see the following:
> The only functional changes are related to the conversion.py file. The rest 
> of the changes are inside tests in the user_guide or in some docstrings 
> describing specific functions. Since I am not an expert in these tests, I 
> defer to the reviewers and people with more experience in the pyspark code.
> These types are aliases for classic python types, so they should work with 
> all numpy versions 
> [1|https://numpy.org/devdocs/release/1.20.0-notes.html], 
> [2|https://stackoverflow.com/questions/74844262/how-can-i-solve-error-module-numpy-has-no-attribute-float-in-python].
>  The error or warning comes from the call into numpy.
>  
> For the affected versions I chose 3.3 and onwards, but 3.2 is also still in 
> the 18-month maintenance cadence, as it was released in October 2021.
>  
> The pull request: [https://github.com/apache/spark/pull/40220]
> Best Regards,
> Aimilios



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42647) Remove aliases from deprecated numpy data types

2023-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42647:


Assignee: Apache Spark

> Remove aliases from deprecated numpy data types
> ---
>
> Key: SPARK-42647
> URL: https://issues.apache.org/jira/browse/SPARK-42647
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0, 3.3.1, 3.3.3, 3.3.2, 3.4.0, 3.4.1
>Reporter: Aimilios Tsouvelekakis
>Assignee: Apache Spark
>Priority: Major
>
> Numpy has started changing the aliases of some of its data types. This means 
> that users with the latest version of numpy will face either warnings or 
> errors, depending on the type they are using. This affects all users 
> running numpy > 1.20.0. One of the types was fixed back in September with 
> this [pull|https://github.com/apache/spark/pull/37817] request.
> The problem can be split into 2 types:
> [numpy 1.24.0|https://github.com/numpy/numpy/pull/22607]: The scalar type 
> aliases ending in a 0 bit size: np.object0, np.str0, np.bytes0, np.void0, 
> np.int0, np.uint0 as well as np.bool8 are now deprecated and will eventually 
> be removed. As of numpy 1.25.0 they emit a warning.
> [numpy 1.20.0|https://github.com/numpy/numpy/pull/14882]: Using the aliases 
> of builtin types like np.int is deprecated since numpy 1.20.0 and removed in 
> numpy 1.24.0.
> The changes are needed so pyspark can be compatible with the latest numpy 
> and avoid
>  * attribute errors on data types deprecated in version 1.20.0: 
> [https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations]
>  * warnings on deprecated data types from version 1.24.0: 
> [https://numpy.org/devdocs/release/1.24.0-notes.html#deprecations]
>  
> From my research I see the following:
> The only functional changes are related to the conversion.py file. The rest 
> of the changes are inside tests in the user_guide or in some docstrings 
> describing specific functions. Since I am not an expert in these tests, I 
> defer to the reviewers and people with more experience in the pyspark code.
> These types are aliases for classic python types, so they should work with 
> all numpy versions 
> [1|https://numpy.org/devdocs/release/1.20.0-notes.html], 
> [2|https://stackoverflow.com/questions/74844262/how-can-i-solve-error-module-numpy-has-no-attribute-float-in-python].
>  The error or warning comes from the call into numpy.
>  
> For the affected versions I chose 3.3 and onwards, but 3.2 is also still in 
> the 18-month maintenance cadence, as it was released in October 2021.
>  
> The pull request: [https://github.com/apache/spark/pull/40220]
> Best Regards,
> Aimilios



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42646) Upgrade cyclonedx from 2.7.3 to 2.7.5

2023-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695465#comment-17695465
 ] 

Apache Spark commented on SPARK-42646:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40247

> Upgrade cyclonedx from 2.7.3 to 2.7.5
> 
>
> Key: SPARK-42646
> URL: https://issues.apache.org/jira/browse/SPARK-42646
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>
> !https://user-images.githubusercontent.com/15246973/222338040-d7c8d595-be0b-40bb-af49-6b260dc0c425.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42646) Upgrade cyclonedx from 2.7.3 to 2.7.5

2023-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42646:


Assignee: Apache Spark

> Upgrade cyclonedx from 2.7.3 to 2.7.5
> 
>
> Key: SPARK-42646
> URL: https://issues.apache.org/jira/browse/SPARK-42646
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
>
> !https://user-images.githubusercontent.com/15246973/222338040-d7c8d595-be0b-40bb-af49-6b260dc0c425.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42646) Upgrade cyclonedx from 2.7.3 to 2.7.5

2023-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42646:


Assignee: (was: Apache Spark)

> Upgrade cyclonedx from 2.7.3 to 2.7.5
> 
>
> Key: SPARK-42646
> URL: https://issues.apache.org/jira/browse/SPARK-42646
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>
> !https://user-images.githubusercontent.com/15246973/222338040-d7c8d595-be0b-40bb-af49-6b260dc0c425.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42644) Add `hive` dependency to `connect` module

2023-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42644:


Assignee: (was: Apache Spark)

> Add `hive` dependency to `connect` module
> -
>
> Key: SPARK-42644
> URL: https://issues.apache.org/jira/browse/SPARK-42644
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42644) Add `hive` dependency to `connect` module

2023-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42644:


Assignee: Apache Spark

> Add `hive` dependency to `connect` module
> -
>
> Key: SPARK-42644
> URL: https://issues.apache.org/jira/browse/SPARK-42644
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42644) Add `hive` dependency to `connect` module

2023-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695464#comment-17695464
 ] 

Apache Spark commented on SPARK-42644:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40246

> Add `hive` dependency to `connect` module
> -
>
> Key: SPARK-42644
> URL: https://issues.apache.org/jira/browse/SPARK-42644
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42643) Implement `spark.udf.registerJavaFunction`

2023-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695429#comment-17695429
 ] 

Apache Spark commented on SPARK-42643:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40244

> Implement `spark.udf.registerJavaFunction`
> --
>
> Key: SPARK-42643
> URL: https://issues.apache.org/jira/browse/SPARK-42643
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement `spark.udf.registerJavaFunction`.
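
The PySpark API delegates to the JVM-side UDFRegistration, which registers a
UDF implemented as a Java class by its class name. A Scala sketch of the
equivalent JVM-side registration (the class name is a placeholder, assumed to
implement UDF1[String, String]):

{code:scala}
// registerJava is the JVM-side counterpart that PySpark's
// registerJavaFunction calls into; "com.example.MyUpper" is a placeholder.
import org.apache.spark.sql.types.StringType

spark.udf.registerJava("my_upper", "com.example.MyUpper", StringType)
spark.sql("SELECT my_upper('hello')").show()
{code}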



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42643) Implement `spark.udf.registerJavaFunction`

2023-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42643:


Assignee: Apache Spark

> Implement `spark.udf.registerJavaFunction`
> --
>
> Key: SPARK-42643
> URL: https://issues.apache.org/jira/browse/SPARK-42643
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Implement `spark.udf.registerJavaFunction`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42643) Implement `spark.udf.registerJavaFunction`

2023-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42643:


Assignee: (was: Apache Spark)

> Implement `spark.udf.registerJavaFunction`
> --
>
> Key: SPARK-42643
> URL: https://issues.apache.org/jira/browse/SPARK-42643
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement `spark.udf.registerJavaFunction`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41823) DataFrame.join creating ambiguous column names

2023-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695428#comment-17695428
 ] 

Apache Spark commented on SPARK-41823:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/40245

> DataFrame.join creating ambiguous column names
> --
>
> Key: SPARK-41823
> URL: https://issues.apache.org/jira/browse/SPARK-41823
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Ruifeng Zheng
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 254, in pyspark.sql.connect.dataframe.DataFrame.drop
> Failed example:
>     df.join(df2, df.name == df2.name, 'inner').drop('name').show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         df.join(df2, df.name == df2.name, 'inner').drop('name').show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, 
> `name`].
>     Plan: {code}
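
A common way to sidestep the ambiguity, independent of the eventual fix, is
to drop by the originating DataFrame's column reference instead of by name; a
Scala sketch with DataFrames analogous to the df/df2 in the doctest above:

{code:scala}
// drop(df2("name")) removes only the right-hand column, whereas the
// string-based drop("name") cannot tell the two "name" columns apart.
val joined = df.join(df2, df("name") === df2("name"), "inner")
val result = joined.drop(df2("name"))
result.show()
{code}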



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42641) Upgrade buf to v1.15.0

2023-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42641:


Assignee: (was: Apache Spark)

> Upgrade buf to v1.15.0
> --
>
> Key: SPARK-42641
> URL: https://issues.apache.org/jira/browse/SPARK-42641
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42641) Upgrade buf to v1.15.0

2023-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42641:


Assignee: Apache Spark

> Upgrade buf to v1.15.0
> --
>
> Key: SPARK-42641
> URL: https://issues.apache.org/jira/browse/SPARK-42641
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42641) Upgrade buf to v1.15.0

2023-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695404#comment-17695404
 ] 

Apache Spark commented on SPARK-42641:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40243

> Upgrade buf to v1.15.0
> --
>
> Key: SPARK-42641
> URL: https://issues.apache.org/jira/browse/SPARK-42641
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42640) Remove stale entries from the excluding rules for CompatibilitySuite

2023-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42640:


Assignee: Apache Spark  (was: Rui Wang)

> Remove stale entries from the excluding rules for CompatibilitySuite
> --
>
> Key: SPARK-42640
> URL: https://issues.apache.org/jira/browse/SPARK-42640
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Rui Wang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42640) Remove stale entries from the excluding rules for CompabilitySuite

2023-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42640:


Assignee: Rui Wang  (was: Apache Spark)

> Remove stale entries from the excluding rules for CompabilitySuite
> --
>
> Key: SPARK-42640
> URL: https://issues.apache.org/jira/browse/SPARK-42640
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42640) Remove stale entries from the excluding rules for CompabilitySuite

2023-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695387#comment-17695387
 ] 

Apache Spark commented on SPARK-42640:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40241

> Remove stale entries from the excluding rules for CompabilitySuite
> --
>
> Key: SPARK-42640
> URL: https://issues.apache.org/jira/browse/SPARK-42640
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42639) Add createDataFrame/createDataset to SparkSession

2023-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695386#comment-17695386
 ] 

Apache Spark commented on SPARK-42639:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/40242

> Add createDataFrame/createDataset to SparkSession
> -
>
> Key: SPARK-42639
> URL: https://issues.apache.org/jira/browse/SPARK-42639
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>
> Add createDataFrame/createDataset to SparkSession



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42639) Add createDataFrame/createDataset to SparkSession

2023-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42639:


Assignee: Apache Spark  (was: Herman van Hövell)

> Add createDataFrame/createDataset to SparkSession
> -
>
> Key: SPARK-42639
> URL: https://issues.apache.org/jira/browse/SPARK-42639
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Apache Spark
>Priority: Major
>
> Add createDataFrame/createDataset to SparkSession



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42639) Add createDataFrame/createDataset to SparkSession

2023-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42639:


Assignee: Herman van Hövell  (was: Apache Spark)

> Add createDataFrame/createDataset to SparkSession
> -
>
> Key: SPARK-42639
> URL: https://issues.apache.org/jira/browse/SPARK-42639
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>
> Add createDataFrame/createDataset to SparkSession



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42458) createDataFrame should support DDL string as schema

2023-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42458:


Assignee: Apache Spark

> createDataFrame should support DDL string as schema
> ---
>
> Key: SPARK-42458
> URL: https://issues.apache.org/jira/browse/SPARK-42458
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>
> {code:python}
> File "/.../python/pyspark/sql/connect/readwriter.py", line 393, in 
> pyspark.sql.connect.readwriter.DataFrameWriter.option
> Failed example:
> with tempfile.TemporaryDirectory() as d:
> # Write a DataFrame into a CSV file with 'nullValue' option set to 
> 'Hyukjin Kwon'.
> df = spark.createDataFrame([(100, None)], "age INT, name STRING")
> df.write.option("nullValue", "Hyukjin 
> Kwon").mode("overwrite").format("csv").save(d)
> # Read the CSV file as a DataFrame.
> spark.read.schema(df.schema).format('csv').load(d).show()
> Exception raised:
> Traceback (most recent call last):
>   File "/.../lib/python3.9/doctest.py", line 1334, in __run
> exec(compile(example.source, filename, "single",
>   File " pyspark.sql.connect.readwriter.DataFrameWriter.option[2]>", line 3, in 
> 
> df = spark.createDataFrame([(100, None)], "age INT, name STRING")
>   File "/.../python/pyspark/sql/connect/session.py", line 312, in 
> createDataFrame
> raise ValueError(
> ValueError: Some of types cannot be determined after inferring, a 
> StructType Schema is required in this case
> {code}
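
A sketch of the requested behavior: parse the DDL string into an explicit schema up front, so type inference is never consulted and the {{None}} value above is harmless. {{StructType.fromDDL}} is an existing Spark helper; whether the actual patch routes through it is an assumption:

{code:scala}
import org.apache.spark.sql.types.StructType

// Turn the DDL string into a concrete StructType. With an explicit schema,
// createDataFrame no longer needs to infer types from the data rows.
val schema: StructType = StructType.fromDDL("age INT, name STRING")
schema.fieldNames // Array(age, name)
{code}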



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42458) createDataFrame should support DDL string as schema

2023-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695351#comment-17695351
 ] 

Apache Spark commented on SPARK-42458:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40240

> createDataFrame should support DDL string as schema
> ---
>
> Key: SPARK-42458
> URL: https://issues.apache.org/jira/browse/SPARK-42458
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> {code:python}
> File "/.../python/pyspark/sql/connect/readwriter.py", line 393, in 
> pyspark.sql.connect.readwriter.DataFrameWriter.option
> Failed example:
> with tempfile.TemporaryDirectory() as d:
> # Write a DataFrame into a CSV file with 'nullValue' option set to 
> 'Hyukjin Kwon'.
> df = spark.createDataFrame([(100, None)], "age INT, name STRING")
> df.write.option("nullValue", "Hyukjin 
> Kwon").mode("overwrite").format("csv").save(d)
> # Read the CSV file as a DataFrame.
> spark.read.schema(df.schema).format('csv').load(d).show()
> Exception raised:
> Traceback (most recent call last):
>   File "/.../lib/python3.9/doctest.py", line 1334, in __run
> exec(compile(example.source, filename, "single",
>   File " pyspark.sql.connect.readwriter.DataFrameWriter.option[2]>", line 3, in 
> 
> df = spark.createDataFrame([(100, None)], "age INT, name STRING")
>   File "/.../python/pyspark/sql/connect/session.py", line 312, in 
> createDataFrame
> raise ValueError(
> ValueError: Some of types cannot be determined after inferring, a 
> StructType Schema is required in this case
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42458) createDataFrame should support DDL string as schema

2023-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42458:


Assignee: (was: Apache Spark)

> createDataFrame should support DDL string as schema
> ---
>
> Key: SPARK-42458
> URL: https://issues.apache.org/jira/browse/SPARK-42458
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> {code:python}
> File "/.../python/pyspark/sql/connect/readwriter.py", line 393, in 
> pyspark.sql.connect.readwriter.DataFrameWriter.option
> Failed example:
> with tempfile.TemporaryDirectory() as d:
> # Write a DataFrame into a CSV file with 'nullValue' option set to 
> 'Hyukjin Kwon'.
> df = spark.createDataFrame([(100, None)], "age INT, name STRING")
> df.write.option("nullValue", "Hyukjin 
> Kwon").mode("overwrite").format("csv").save(d)
> # Read the CSV file as a DataFrame.
> spark.read.schema(df.schema).format('csv').load(d).show()
> Exception raised:
> Traceback (most recent call last):
>   File "/.../lib/python3.9/doctest.py", line 1334, in __run
> exec(compile(example.source, filename, "single",
>   File " pyspark.sql.connect.readwriter.DataFrameWriter.option[2]>", line 3, in 
> 
> df = spark.createDataFrame([(100, None)], "age INT, name STRING")
>   File "/.../python/pyspark/sql/connect/session.py", line 312, in 
> createDataFrame
> raise ValueError(
> ValueError: Some of types cannot be determined after inferring, a 
> StructType Schema is required in this case
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42633) Use the actual schema in a LocalRelation

2023-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695280#comment-17695280
 ] 

Apache Spark commented on SPARK-42633:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/40238

> Use the actual schema in a LocalRelation
> 
>
> Key: SPARK-42633
> URL: https://issues.apache.org/jira/browse/SPARK-42633
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>
> Make the LocalRelation proto take an actual schema message instead of a 
> string.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42637) Add SparkSession.stop

2023-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695279#comment-17695279
 ] 

Apache Spark commented on SPARK-42637:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/40239

> Add SparkSession.stop
> -
>
> Key: SPARK-42637
> URL: https://issues.apache.org/jira/browse/SPARK-42637
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>
> Add SparkSession.stop()



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42635) Several counter-intuitive behaviours in the TimestampAdd expression

2023-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42635:


Assignee: (was: Apache Spark)

> Several counter-intuitive behaviours in the TimestampAdd expression
> ---
>
> Key: SPARK-42635
> URL: https://issues.apache.org/jira/browse/SPARK-42635
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
>Reporter: Chenhao Li
>Priority: Major
>
> 1. When the time is close to a daylight saving time transition, the result may 
> be discontinuous and not monotonic.
> We currently have:
> {code:scala}
> scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
> scala> spark.sql("select timestampadd(second, 24 * 3600 - 1, 
> timestamp'2011-03-12 03:00:00')").show
> +------------------------------------------------------------------------+
> |timestampadd(second, ((24 * 3600) - 1), TIMESTAMP '2011-03-12 03:00:00')|
> +------------------------------------------------------------------------+
> |                                                     2011-03-13 03:59:59|
> +------------------------------------------------------------------------+
> scala> spark.sql("select timestampadd(second, 24 * 3600, timestamp'2011-03-12 
> 03:00:00')").show
> +------------------------------------------------------------------+
> |timestampadd(second, (24 * 3600), TIMESTAMP '2011-03-12 03:00:00')|
> +------------------------------------------------------------------+
> |                                               2011-03-13 03:00:00|
> +------------------------------------------------------------------+
> {code}
>  
> In the second query, adding one more second will set the time back one hour 
> instead. Plus, there are only {{23 * 3600}} seconds from {{2011-03-12 
> 03:00:00}} to {{2011-03-13 03:00:00}}, instead of {{24 * 3600}} seconds, due 
> to the daylight saving time transition.
> The root cause of the problem is the Spark code at 
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L790]
>  wrongly assumes every day has {{MICROS_PER_DAY}} microseconds, and does the day 
> and time-in-day split before looking at the timezone.
> 2. Adding month, quarter, and year units silently ignores Int overflow during unit 
> conversion.
> The root cause is 
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L1246].
>  {{quantity}} is multiplied by {{3}} or {{MONTHS_PER_YEAR}} without checking 
> overflow. Note that we do have overflow checking in adding the amount to the 
> timestamp, so the behavior is inconsistent.
> This can cause counter-intuitive results like this:
> {code:scala}
> scala> spark.sql("select timestampadd(quarter, 1431655764, 
> timestamp'1970-01-01')").show
> +------------------------------------------------------------------+
> |timestampadd(quarter, 1431655764, TIMESTAMP '1970-01-01 00:00:00')|
> +------------------------------------------------------------------+
> |                                               1969-09-01 00:00:00|
> +------------------------------------------------------------------+
> {code}
> 3. Adding sub-month units (week, day, hour, minute, second, millisecond, 
> microsecond) silently ignores Long overflow during unit conversion.
> This is similar to the previous problem:
> {code:scala}
>  scala> spark.sql("select timestampadd(day, 106751992, 
> timestamp'1970-01-01')").show(false)
> +-------------------------------------------------------------+
> |timestampadd(day, 106751992, TIMESTAMP '1970-01-01 00:00:00')|
> +-------------------------------------------------------------+
> |-290308-12-22 15:58:10.448384                                |
> +-------------------------------------------------------------+
> {code}
>  
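
A sketch of the overflow-checked unit conversion that problems 2 and 3 call for; this is an assumption about the shape of the fix, not the actual patch:

{code:scala}
// Fail loudly on Int overflow instead of silently wrapping when converting
// quarters to months; the ArithmeticException can then be surfaced as a
// proper Spark error.
val MONTHS_PER_QUARTER = 3

def quartersToMonths(quantity: Int): Int =
  Math.multiplyExact(quantity, MONTHS_PER_QUARTER)

quartersToMonths(4)          // 12
quartersToMonths(1431655764) // throws ArithmeticException: integer overflow
{code}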



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42635) Several counter-intuitive behaviours in the TimestampAdd expression

2023-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695278#comment-17695278
 ] 

Apache Spark commented on SPARK-42635:
--

User 'chenhao-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40237

> Several counter-intuitive behaviours in the TimestampAdd expression
> ---
>
> Key: SPARK-42635
> URL: https://issues.apache.org/jira/browse/SPARK-42635
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
>Reporter: Chenhao Li
>Priority: Major
>
> 1. When the time is close to a daylight saving time transition, the result may 
> be discontinuous and not monotonic.
> We currently have:
> {code:scala}
> scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
> scala> spark.sql("select timestampadd(second, 24 * 3600 - 1, 
> timestamp'2011-03-12 03:00:00')").show
> +------------------------------------------------------------------------+
> |timestampadd(second, ((24 * 3600) - 1), TIMESTAMP '2011-03-12 03:00:00')|
> +------------------------------------------------------------------------+
> |                                                     2011-03-13 03:59:59|
> +------------------------------------------------------------------------+
> scala> spark.sql("select timestampadd(second, 24 * 3600, timestamp'2011-03-12 
> 03:00:00')").show
> +------------------------------------------------------------------+
> |timestampadd(second, (24 * 3600), TIMESTAMP '2011-03-12 03:00:00')|
> +------------------------------------------------------------------+
> |                                               2011-03-13 03:00:00|
> +------------------------------------------------------------------+
> {code}
>  
> In the second query, adding one more second will set the time back one hour 
> instead. Plus, there are only {{23 * 3600}} seconds from {{2011-03-12 
> 03:00:00}} to {{2011-03-13 03:00:00}}, instead of {{24 * 3600}} seconds, due 
> to the daylight saving time transition.
> The root cause of the problem is the Spark code at 
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L790]
>  wrongly assumes every day has {{MICROS_PER_DAY}} microseconds, and does the day 
> and time-in-day split before looking at the timezone.
> 2. Adding month, quarter, and year units silently ignores Int overflow during unit 
> conversion.
> The root cause is 
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L1246].
>  {{quantity}} is multiplied by {{3}} or {{MONTHS_PER_YEAR}} without checking 
> overflow. Note that we do have overflow checking in adding the amount to the 
> timestamp, so the behavior is inconsistent.
> This can cause counter-intuitive results like this:
> {code:scala}
> scala> spark.sql("select timestampadd(quarter, 1431655764, 
> timestamp'1970-01-01')").show
> +------------------------------------------------------------------+
> |timestampadd(quarter, 1431655764, TIMESTAMP '1970-01-01 00:00:00')|
> +------------------------------------------------------------------+
> |                                               1969-09-01 00:00:00|
> +------------------------------------------------------------------+
> {code}
> 3. Adding sub-month units (week, day, hour, minute, second, millisecond, 
> microsecond) silently ignores Long overflow during unit conversion.
> This is similar to the previous problem:
> {code:scala}
>  scala> spark.sql("select timestampadd(day, 106751992, 
> timestamp'1970-01-01')").show(false)
> +-------------------------------------------------------------+
> |timestampadd(day, 106751992, TIMESTAMP '1970-01-01 00:00:00')|
> +-------------------------------------------------------------+
> |-290308-12-22 15:58:10.448384                                |
> +-------------------------------------------------------------+
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42635) Several counter-intuitive behaviours in the TimestampAdd expression

2023-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42635:


Assignee: Apache Spark

> Several counter-intuitive behaviours in the TimestampAdd expression
> ---
>
> Key: SPARK-42635
> URL: https://issues.apache.org/jira/browse/SPARK-42635
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
>Reporter: Chenhao Li
>Assignee: Apache Spark
>Priority: Major
>
> 1. When the time is close to a daylight saving time transition, the result may 
> be discontinuous and not monotonic.
> We currently have:
> {code:scala}
> scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
> scala> spark.sql("select timestampadd(second, 24 * 3600 - 1, 
> timestamp'2011-03-12 03:00:00')").show
> +------------------------------------------------------------------------+
> |timestampadd(second, ((24 * 3600) - 1), TIMESTAMP '2011-03-12 03:00:00')|
> +------------------------------------------------------------------------+
> |                                                     2011-03-13 03:59:59|
> +------------------------------------------------------------------------+
> scala> spark.sql("select timestampadd(second, 24 * 3600, timestamp'2011-03-12 
> 03:00:00')").show
> +------------------------------------------------------------------+
> |timestampadd(second, (24 * 3600), TIMESTAMP '2011-03-12 03:00:00')|
> +------------------------------------------------------------------+
> |                                               2011-03-13 03:00:00|
> +------------------------------------------------------------------+
> {code}
>  
> In the second query, adding one more second will set the time back one hour 
> instead. Plus, there are only {{23 * 3600}} seconds from {{2011-03-12 
> 03:00:00}} to {{2011-03-13 03:00:00}}, instead of {{24 * 3600}} seconds, due 
> to the daylight saving time transition.
> The root cause of the problem is the Spark code at 
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L790]
>  wrongly assumes every day has {{MICROS_PER_DAY}} microseconds, and does the day 
> and time-in-day split before looking at the timezone.
> 2. Adding month, quarter, and year units silently ignores Int overflow during unit 
> conversion.
> The root cause is 
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L1246].
>  {{quantity}} is multiplied by {{3}} or {{MONTHS_PER_YEAR}} without checking 
> overflow. Note that we do have overflow checking in adding the amount to the 
> timestamp, so the behavior is inconsistent.
> This can cause counter-intuitive results like this:
> {code:scala}
> scala> spark.sql("select timestampadd(quarter, 1431655764, 
> timestamp'1970-01-01')").show
> +------------------------------------------------------------------+
> |timestampadd(quarter, 1431655764, TIMESTAMP '1970-01-01 00:00:00')|
> +------------------------------------------------------------------+
> |                                               1969-09-01 00:00:00|
> +------------------------------------------------------------------+
> {code}
> 3. Adding sub-month units (week, day, hour, minute, second, millisecond, 
> microsecond) silently ignores Long overflow during unit conversion.
> This is similar to the previous problem:
> {code:scala}
>  scala> spark.sql("select timestampadd(day, 106751992, 
> timestamp'1970-01-01')").show(false)
> +-------------------------------------------------------------+
> |timestampadd(day, 106751992, TIMESTAMP '1970-01-01 00:00:00')|
> +-------------------------------------------------------------+
> |-290308-12-22 15:58:10.448384                                |
> +-------------------------------------------------------------+
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38735) Test the error class: INTERNAL_ERROR

2023-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695259#comment-17695259
 ] 

Apache Spark commented on SPARK-38735:
--

User 'the8thC' has created a pull request for this issue:
https://github.com/apache/spark/pull/40236

> Test the error class: INTERNAL_ERROR
> 
>
> Key: SPARK-38735
> URL: https://issues.apache.org/jira/browse/SPARK-38735
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Add tests for the error class *INTERNAL_ERROR* to QueryExecutionErrorsSuite. 
> The tests should cover the exceptions thrown in QueryExecutionErrors:
> {code:scala}
>   def logicalHintOperatorNotRemovedDuringAnalysisError(): Throwable = {
> new SparkIllegalStateException(errorClass = "INTERNAL_ERROR",
>   messageParameters = Array(
> "Internal error: logical hint operator should have been removed 
> during analysis"))
>   }
>   def cannotEvaluateExpressionError(expression: Expression): Throwable = {
> new SparkUnsupportedOperationException(errorClass = "INTERNAL_ERROR",
>   messageParameters = Array(s"Cannot evaluate expression: $expression"))
>   }
>   def cannotGenerateCodeForExpressionError(expression: Expression): Throwable 
> = {
> new SparkUnsupportedOperationException(errorClass = "INTERNAL_ERROR",
>   messageParameters = Array(s"Cannot generate code for expression: 
> $expression"))
>   }
>   def cannotTerminateGeneratorError(generator: UnresolvedGenerator): 
> Throwable = {
> new SparkUnsupportedOperationException(errorClass = "INTERNAL_ERROR",
>   messageParameters = Array(s"Cannot terminate expression: $generator"))
>   }
>   def methodNotDeclaredError(name: String): Throwable = {
> new SparkNoSuchMethodException(errorClass = "INTERNAL_ERROR",
>   messageParameters = Array(
> s"""A method named "$name" is not declared in any enclosing class nor 
> any supertype"""))
>   }
> {code}
> For example, here is a test for the error class *UNSUPPORTED_FEATURE*: 
> https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170
> +The test must have a check of:+
> # the entire error message
> # sqlState if it is defined in the error-classes.json file
> # the error class
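
A sketch of what one such check could look like; raising the error via a direct call into {{QueryExecutionErrors}} is a shortcut taken here for brevity, and the suite wiring is assumed:

{code:scala}
import org.apache.spark.SparkUnsupportedOperationException
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.errors.QueryExecutionErrors

test("INTERNAL_ERROR: cannot evaluate expression") {
  // Raise the exception through the factory method quoted above, then
  // check the error class and the message text.
  val e = intercept[SparkUnsupportedOperationException] {
    throw QueryExecutionErrors.cannotEvaluateExpressionError(Literal(1))
  }
  assert(e.getErrorClass == "INTERNAL_ERROR")
  assert(e.getMessage.contains("Cannot evaluate expression"))
}
{code}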



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38735) Test the error class: INTERNAL_ERROR

2023-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38735:


Assignee: (was: Apache Spark)

> Test the error class: INTERNAL_ERROR
> 
>
> Key: SPARK-38735
> URL: https://issues.apache.org/jira/browse/SPARK-38735
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Add tests for the error class *INTERNAL_ERROR* to QueryExecutionErrorsSuite. 
> The tests should cover the exceptions thrown in QueryExecutionErrors:
> {code:scala}
>   def logicalHintOperatorNotRemovedDuringAnalysisError(): Throwable = {
> new SparkIllegalStateException(errorClass = "INTERNAL_ERROR",
>   messageParameters = Array(
> "Internal error: logical hint operator should have been removed 
> during analysis"))
>   }
>   def cannotEvaluateExpressionError(expression: Expression): Throwable = {
> new SparkUnsupportedOperationException(errorClass = "INTERNAL_ERROR",
>   messageParameters = Array(s"Cannot evaluate expression: $expression"))
>   }
>   def cannotGenerateCodeForExpressionError(expression: Expression): Throwable 
> = {
> new SparkUnsupportedOperationException(errorClass = "INTERNAL_ERROR",
>   messageParameters = Array(s"Cannot generate code for expression: 
> $expression"))
>   }
>   def cannotTerminateGeneratorError(generator: UnresolvedGenerator): 
> Throwable = {
> new SparkUnsupportedOperationException(errorClass = "INTERNAL_ERROR",
>   messageParameters = Array(s"Cannot terminate expression: $generator"))
>   }
>   def methodNotDeclaredError(name: String): Throwable = {
> new SparkNoSuchMethodException(errorClass = "INTERNAL_ERROR",
>   messageParameters = Array(
> s"""A method named "$name" is not declared in any enclosing class nor 
> any supertype"""))
>   }
> {code}
> For example, here is a test for the error class *UNSUPPORTED_FEATURE*: 
> https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170
> +The test must have a check of:+
> # the entire error message
> # sqlState if it is defined in the error-classes.json file
> # the error class



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38735) Test the error class: INTERNAL_ERROR

2023-03-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695261#comment-17695261
 ] 

Apache Spark commented on SPARK-38735:
--

User 'the8thC' has created a pull request for this issue:
https://github.com/apache/spark/pull/40236

> Test the error class: INTERNAL_ERROR
> 
>
> Key: SPARK-38735
> URL: https://issues.apache.org/jira/browse/SPARK-38735
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Add tests for the error class *INTERNAL_ERROR* to QueryExecutionErrorsSuite. 
> The tests should cover the exceptions thrown in QueryExecutionErrors:
> {code:scala}
>   def logicalHintOperatorNotRemovedDuringAnalysisError(): Throwable = {
> new SparkIllegalStateException(errorClass = "INTERNAL_ERROR",
>   messageParameters = Array(
> "Internal error: logical hint operator should have been removed 
> during analysis"))
>   }
>   def cannotEvaluateExpressionError(expression: Expression): Throwable = {
> new SparkUnsupportedOperationException(errorClass = "INTERNAL_ERROR",
>   messageParameters = Array(s"Cannot evaluate expression: $expression"))
>   }
>   def cannotGenerateCodeForExpressionError(expression: Expression): Throwable 
> = {
> new SparkUnsupportedOperationException(errorClass = "INTERNAL_ERROR",
>   messageParameters = Array(s"Cannot generate code for expression: 
> $expression"))
>   }
>   def cannotTerminateGeneratorError(generator: UnresolvedGenerator): 
> Throwable = {
> new SparkUnsupportedOperationException(errorClass = "INTERNAL_ERROR",
>   messageParameters = Array(s"Cannot terminate expression: $generator"))
>   }
>   def methodNotDeclaredError(name: String): Throwable = {
> new SparkNoSuchMethodException(errorClass = "INTERNAL_ERROR",
>   messageParameters = Array(
> s"""A method named "$name" is not declared in any enclosing class nor 
> any supertype"""))
>   }
> {code}
> For example, here is a test for the error class *UNSUPPORTED_FEATURE*: 
> https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170
> +The test must have a check of:+
> # the entire error message
> # sqlState if it is defined in the error-classes.json file
> # the error class



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38735) Test the error class: INTERNAL_ERROR

2023-03-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38735:


Assignee: Apache Spark

> Test the error class: INTERNAL_ERROR
> 
>
> Key: SPARK-38735
> URL: https://issues.apache.org/jira/browse/SPARK-38735
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Minor
>  Labels: starter
>
> Add tests for the error class *INTERNAL_ERROR* to QueryExecutionErrorsSuite. 
> The tests should cover the exceptions thrown in QueryExecutionErrors:
> {code:scala}
>   def logicalHintOperatorNotRemovedDuringAnalysisError(): Throwable = {
> new SparkIllegalStateException(errorClass = "INTERNAL_ERROR",
>   messageParameters = Array(
> "Internal error: logical hint operator should have been removed 
> during analysis"))
>   }
>   def cannotEvaluateExpressionError(expression: Expression): Throwable = {
> new SparkUnsupportedOperationException(errorClass = "INTERNAL_ERROR",
>   messageParameters = Array(s"Cannot evaluate expression: $expression"))
>   }
>   def cannotGenerateCodeForExpressionError(expression: Expression): Throwable 
> = {
> new SparkUnsupportedOperationException(errorClass = "INTERNAL_ERROR",
>   messageParameters = Array(s"Cannot generate code for expression: 
> $expression"))
>   }
>   def cannotTerminateGeneratorError(generator: UnresolvedGenerator): 
> Throwable = {
> new SparkUnsupportedOperationException(errorClass = "INTERNAL_ERROR",
>   messageParameters = Array(s"Cannot terminate expression: $generator"))
>   }
>   def methodNotDeclaredError(name: String): Throwable = {
> new SparkNoSuchMethodException(errorClass = "INTERNAL_ERROR",
>   messageParameters = Array(
> s"""A method named "$name" is not declared in any enclosing class nor 
> any supertype"""))
>   }
> {code}
> For example, here is a test for the error class *UNSUPPORTED_FEATURE*: 
> https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170
> +The test must have a check of:+
> # the entire error message
> # sqlState if it is defined in the error-classes.json file
> # the error class



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


