[jira] [Resolved] (SPARK-44837) Improve error message for ALTER TABLE ALTER COLUMN on partition columns in non-delta tables
[ https://issues.apache.org/jira/browse/SPARK-44837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-44837. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42524 [https://github.com/apache/spark/pull/42524] > Improve error message for ALTER TABLE ALTER COLUMN on partition columns in > non-delta tables > --- > > Key: SPARK-44837 > URL: https://issues.apache.org/jira/browse/SPARK-44837 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.3, 3.1.3, 3.2.4, 3.3.2, 3.4.1, 4.0.0 >Reporter: Michael Zhang >Assignee: Michael Zhang >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > > {code:java} > -- hive table > sql("create table some_table (x int, y int, z int) using parquet PARTITIONED > BY (x, y) " + > "location '/Users/someone/runtime/tmp-data/some_table'") > sql("alter table some_table alter column x comment 'some-comment'").collect() > Can't find column `x` given table data columns [`z`].{code} > Improve error message to indicate to users that the command is not supported > on partition columns in non-delta tables. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
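A hedged aside on the repro above: the confusing wording comes from ALTER COLUMN only seeing the non-partition ("data") columns of the table. A quick way to confirm that `x` is a partition column in that session (a sketch, not part of the ticket):

{code:scala}
// Assumes the spark-shell session and the `some_table` created in the repro above.
spark.sql("DESCRIBE TABLE some_table").show(truncate = false)
// The output's "# Partition Information" section lists `x` and `y`, so the only
// data column the command sees is `z` -- which is exactly what the old error
// message ("given table data columns [`z`]") is complaining about.
{code}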
[jira] [Commented] (SPARK-45608) Migrate SchemaColumnConvertNotSupportedException onto DATATYPE_MISMATCH error classes
[ https://issues.apache.org/jira/browse/SPARK-45608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1574#comment-1574 ] Max Gekk commented on SPARK-45608: -- The ticket came from https://github.com/apache/spark/pull/43451#discussion_r1365683194 > Migrate SchemaColumnConvertNotSupportedException onto DATATYPE_MISMATCH error > classes > - > > Key: SPARK-45608 > URL: https://issues.apache.org/jira/browse/SPARK-45608 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Zamil Majdy >Priority: Major > > SchemaColumnConvertNotSupportedException is not currently part of > SparkThrowable. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45366) Remove productHash from TreeNode
[ https://issues.apache.org/jira/browse/SPARK-45366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan resolved SPARK-45366. - Resolution: Duplicate > Remove productHash from TreeNode > > > Key: SPARK-45366 > URL: https://issues.apache.org/jira/browse/SPARK-45366 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44753) XML: Add Python and sparkR binding including Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-44753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44753: --- Labels: pull-request-available (was: ) > XML: Add Python and sparkR binding including Spark Connect > -- > > Key: SPARK-44753 > URL: https://issues.apache.org/jira/browse/SPARK-44753 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45289) ClassCastException when reading Delta table on AWS S3
[ https://issues.apache.org/jira/browse/SPARK-45289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanawat Panmongkol resolved SPARK-45289. Resolution: Fixed > ClassCastException when reading Delta table on AWS S3 > - > > Key: SPARK-45289 > URL: https://issues.apache.org/jira/browse/SPARK-45289 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.5.0 > Environment: Spark version: 3.5.0 > Deployment mode: spark-shell > OS: Ubuntu (Docker image) > Java/JVM version: OpenJDK 11 > Packages: hadoop-aws:3.3.4, delta-core_2.12:2.4.0 >Reporter: Tanawat Panmongkol >Priority: Major > > When attempting to read a Delta table from S3 using version 3.5.0, a > _*{{ClassCastException}}*_ occurs involving > {{_*org.apache.hadoop.fs.s3a.S3AFileStatus*_}} and > {_}*{{org.apache.spark.sql.execution.datasources.FileStatusWithMetadata}}*{_}. > The error appears to be related to the new feature SPARK-43039. > _*Steps to Reproduce:*_ > {code:java} > export AWS_ACCESS_KEY_ID='' > export AWS_SECRET_ACCESS_KEY='' > export AWS_REGION='' > docker run --rm -it apache/spark:3.5.0-scala2.12-java11-ubuntu > /opt/spark/bin/spark-shell \ > --packages > 'org.apache.hadoop:hadoop-aws:3.3.4,io.delta:delta-core_2.12:2.4.0' \ > --conf > "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" > \ > --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \ > --conf "spark.hadoop.aws.region=$AWS_REGION" \ > --conf "spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID" \ > --conf "spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY" \ > --conf "spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem" \ > --conf "spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem" \ > --conf "spark.hadoop.fs.s3a.path.style.access=true" \ > --conf "spark.hadoop.fs.s3a.connection.ssl.enabled=true" \ > --conf "spark.jars.ivy=/tmp/ivy/cache"{code} > {code:java} > scala> > spark.read.format("delta").load("s3:").show() > {code} > *Logs:* > {code:java} > java.lang.ClassCastException: class org.apache.hadoop.fs.s3a.S3AFileStatus > cannot be cast to class > org.apache.spark.sql.execution.datasources.FileStatusWithMetadata > (org.apache.hadoop.fs.s3a.S3AFileStatus is in unnamed module of loader > scala.reflect.internal.util.ScalaClassLoader$URLClassLoader @4552f905; > org.apache.spark.sql.execution.datasources.FileStatusWithMetadata is in > unnamed module of loader 'app') > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.TraversableLike.map(TraversableLike.scala:286) > at scala.collection.TraversableLike.map$(TraversableLike.scala:279) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.execution.FileSourceScanLike.$anonfun$setFilesNumAndSizeMetric$2(DataSourceScanExec.scala:466) > at > org.apache.spark.sql.execution.FileSourceScanLike.$anonfun$setFilesNumAndSizeMetric$2$adapted(DataSourceScanExec.scala:466) > at scala.collection.immutable.List.map(List.scala:293) > at > org.apache.spark.sql.execution.FileSourceScanLike.setFilesNumAndSizeMetric(DataSourceScanExec.scala:466) > at > org.apache.spark.sql.execution.FileSourceScanLike.selectedPartitions(DataSourceScanExec.scala:257) > at > 
org.apache.spark.sql.execution.FileSourceScanLike.selectedPartitions$(DataSourceScanExec.scala:251) > at > org.apache.spark.sql.execution.FileSourceScanExec.selectedPartitions$lzycompute(DataSourceScanExec.scala:506) > at > org.apache.spark.sql.execution.FileSourceScanExec.selectedPartitions(DataSourceScanExec.scala:506) > at > org.apache.spark.sql.execution.FileSourceScanLike.dynamicallySelectedPartitions(DataSourceScanExec.scala:286) > at > org.apache.spark.sql.execution.FileSourceScanLike.dynamicallySelectedPartitions$(DataSourceScanExec.scala:267) > at > org.apache.spark.sql.execution.FileSourceScanExec.dynamicallySelectedPartitions$lzycompute(DataSourceScanExec.scala:506) > at > org.apache.spark.sql.execution.FileSourceScanExec.dynamicallySelectedPartitions(DataSourceScanExec.scala:506) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:553) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:537) > at > org.apache.spark.sql.execution.FileSourceScanExec.doExecute(DataSourceScanExec.scala:575) > at >
[jira] [Created] (SPARK-45614) Assign name to _LEGACY_ERROR_TEMP_215[6,7,8]
Deng Ziming created SPARK-45614: --- Summary: Assign name to _LEGACY_ERROR_TEMP_215[6,7,8] Key: SPARK-45614 URL: https://issues.apache.org/jira/browse/SPARK-45614 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.5.0 Reporter: Deng Ziming Assignee: Deng Ziming Fix For: 4.0.0 Choose proper names for the error classes *_LEGACY_ERROR_TEMP_215[6,7,8]* defined in {*}core/src/main/resources/error/error-classes.json{*}. The names should be short but complete (look at the examples in error-classes.json). Add a test which triggers the error from user code if such a test doesn't already exist. Check exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error text message, so tech editors can modify the error format in error-classes.json without worrying about breaking Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). If you cannot reproduce the error from user space (using a SQL query), replace the error by an internal error, see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is not clear. Propose to users how to avoid and fix such errors. Please look at the PRs below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
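For anyone picking this up, the checkError() flow described above (a helper in Spark's SparkFunSuite/QueryTest test utilities) usually looks like the sketch below; the triggering query, the new error class name and the parameters are placeholders, not the values this ticket will settle on:

{code:scala}
// Sketch only, inside a QueryTest-based suite: names and parameters are hypothetical.
checkError(
  exception = intercept[org.apache.spark.sql.AnalysisException] {
    sql("SELECT ...")  // a user-space query that used to raise _LEGACY_ERROR_TEMP_215x
  },
  errorClass = "NEWLY_ASSIGNED_NAME",
  parameters = Map("objectName" -> "`some_object`"))
{code}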
[jira] [Resolved] (SPARK-45591) Upgrade ASM to 9.6
[ https://issues.apache.org/jira/browse/SPARK-45591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-45591. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43431 [https://github.com/apache/spark/pull/43431] > Upgrade ASM to 9.6 > -- > > Key: SPARK-45591 > URL: https://issues.apache.org/jira/browse/SPARK-45591 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-45610) Handle "Auto-application to `()` is deprecated."
[ https://issues.apache.org/jira/browse/SPARK-45610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1539#comment-1539 ] Yang Jie edited comment on SPARK-45610 at 10/20/23 2:31 AM: Okay, I can start preparing this PR. was (Author: luciferyang): Okay, I can start preparing this PR. > Handle "Auto-application to `()` is deprecated." > > > Key: SPARK-45610 > URL: https://issues.apache.org/jira/browse/SPARK-45610 > Project: Spark > Issue Type: Sub-task > Components: GraphX, MLlib, Spark Core, SQL, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > For the following case, a compile warning will be issued in Scala 2.13: > > {code:java} > Welcome to Scala 2.13.12 (OpenJDK 64-Bit Server VM, Java 17.0.8). > Type in expressions for evaluation. Or try :help. > scala> class Foo { > | def isEmpty(): Boolean = true > | def isTrue(x: Boolean): Boolean = x > | } > class Foo > scala> val foo = new Foo > val foo: Foo = Foo@7061622 > scala> val ret = foo.isEmpty > ^ > warning: Auto-application to `()` is deprecated. Supply the empty > argument list `()` explicitly to invoke method isEmpty, > or remove the empty argument list from its definition (Java-defined > methods are exempt). > In Scala 3, an unapplied method like this will be eta-expanded into a > function. [quickfixable] > val ret: Boolean = true {code} > But for Scala 3, it is a compile error: > {code:java} > Welcome to Scala 3.3.1 (17.0.8, Java OpenJDK 64-Bit Server VM). > Type in expressions for evaluation. Or try :help. > > > > > scala> class Foo { > | def isEmpty(): Boolean = true > | def isTrue(x: Boolean): Boolean = x > | } > // defined class Foo > > > > > scala> val foo = new Foo > val foo: Foo = Foo@591f6f83 > > > > > scala> val ret = foo.isEmpty > -- [E100] Syntax Error: > > 1 |val ret = foo.isEmpty > | ^^^ > | method isEmpty in class Foo must be called with () argument > | > | longer explanation available when compiling with `-explain` > 1 error found {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45610) Handle "Auto-application to `()` is deprecated."
[ https://issues.apache.org/jira/browse/SPARK-45610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1539#comment-1539 ] Yang Jie commented on SPARK-45610: -- Okay, I can start preparing this PR. > Handle "Auto-application to `()` is deprecated." > > > Key: SPARK-45610 > URL: https://issues.apache.org/jira/browse/SPARK-45610 > Project: Spark > Issue Type: Sub-task > Components: GraphX, MLlib, Spark Core, SQL, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > For the following case, a compile warning will be issued in Scala 2.13: > > {code:java} > Welcome to Scala 2.13.12 (OpenJDK 64-Bit Server VM, Java 17.0.8). > Type in expressions for evaluation. Or try :help. > scala> class Foo { > | def isEmpty(): Boolean = true > | def isTrue(x: Boolean): Boolean = x > | } > class Foo > scala> val foo = new Foo > val foo: Foo = Foo@7061622 > scala> val ret = foo.isEmpty > ^ > warning: Auto-application to `()` is deprecated. Supply the empty > argument list `()` explicitly to invoke method isEmpty, > or remove the empty argument list from its definition (Java-defined > methods are exempt). > In Scala 3, an unapplied method like this will be eta-expanded into a > function. [quickfixable] > val ret: Boolean = true {code} > But for Scala 3, it is a compile error: > {code:java} > Welcome to Scala 3.3.1 (17.0.8, Java OpenJDK 64-Bit Server VM). > Type in expressions for evaluation. Or try :help. > > > > > scala> class Foo { > | def isEmpty(): Boolean = true > | def isTrue(x: Boolean): Boolean = x > | } > // defined class Foo > > > > > scala> val foo = new Foo > val foo: Foo = Foo@591f6f83 > > > > > scala> val ret = foo.isEmpty > -- [E100] Syntax Error: > > 1 |val ret = foo.isEmpty > | ^^^ > | method isEmpty in class Foo must be called with () argument > | > | longer explanation available when compiling with `-explain` > 1 error found {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
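As the warning text itself suggests, the clean-up is mechanical: either supply the empty argument list at the call site or drop it from the definition. A small sketch of both options, reusing the Foo example from the description:

{code:scala}
class Foo {
  def isEmpty(): Boolean = true
}
val foo = new Foo

// Option 1: supply the empty argument list explicitly at the call site.
val ret1 = foo.isEmpty()

// Option 2: remove the empty argument list from the definition instead,
// so existing parameterless call sites keep compiling unchanged.
class Bar {
  def isEmpty: Boolean = true
}
val ret2 = (new Bar).isEmpty
{code}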
[jira] [Commented] (SPARK-44405) Reduce code duplication in group-based DELETE and MERGE tests
[ https://issues.apache.org/jira/browse/SPARK-44405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1538#comment-1538 ] Min Zhao commented on SPARK-44405: -- Hello, are you working on it and would you like me to try it? > Reduce code duplication in group-based DELETE and MERGE tests > - > > Key: SPARK-44405 > URL: https://issues.apache.org/jira/browse/SPARK-44405 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Anton Okolnychyi >Priority: Major > > See [this|https://github.com/apache/spark/pull/41600#discussion_r1230014119] > discussion. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45601) stackoverflow when executing rule ExtractWindowExpressions
[ https://issues.apache.org/jira/browse/SPARK-45601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JacobZheng resolved SPARK-45601. Fix Version/s: 3.3.0 Resolution: Resolved > stackoverflow when executing rule ExtractWindowExpressions > -- > > Key: SPARK-45601 > URL: https://issues.apache.org/jira/browse/SPARK-45601 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.3 >Reporter: JacobZheng >Priority: Major > Fix For: 3.3.0 > > > I am encountering stackoverflow errors while executing the following test > case. Looking at the source code, ExtractWindowExpressions does not extract > the window correctly and gets stuck in a dead loop at > resolveOperatorsDownWithPruning, which causes the overflow. > {code:scala} > test("agg filter contains window") { > val src = Seq((1, "b", "c")).toDF("col1", "col2", "col3") > .withColumn("test", > expr("count(col1) filter (where min(col1) over(partition by col2 > order by col3)>1)")) > src.show() > } > {code} > Now my question is: is a window function inside an agg filter the correct > usage? Or should I add a check, like Spark SQL does for the WHERE clause, and > throw an error "It is not allowed to use window functions inside WHERE > clause"? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
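Until the analyzer either supports or explicitly rejects this pattern, one equivalent formulation (a hedged sketch, not from the ticket) is to materialize the window expression as an ordinary column first, so the aggregate FILTER clause no longer contains a window function:

{code:scala}
import org.apache.spark.sql.functions.expr
import spark.implicits._  // assumes a spark-shell / SparkSession named `spark`

val src = Seq((1, "b", "c")).toDF("col1", "col2", "col3")

// Compute the window expression as a plain column first...
val withMin = src.withColumn(
  "min_col1",
  expr("min(col1) over (partition by col2 order by col3)"))

// ...then the aggregate FILTER clause only references that column, so
// ExtractWindowExpressions never has to look inside the filter.
withMin.selectExpr("count(col1) filter (where min_col1 > 1) as test").show()
{code}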
[jira] [Commented] (SPARK-45601) stackoverflow when executing rule ExtractWindowExpressions
[ https://issues.apache.org/jira/browse/SPARK-45601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1536#comment-1536 ] JacobZheng commented on SPARK-45601: Got it, Thanks [~bersprockets] > stackoverflow when executing rule ExtractWindowExpressions > -- > > Key: SPARK-45601 > URL: https://issues.apache.org/jira/browse/SPARK-45601 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.3 >Reporter: JacobZheng >Priority: Major > > I am encountering stackoverflow errors while executing the following test > case. I looked at the source code and it is ExtractWindowExpressions not > extracting the window correctly and encountering a dead loop at > resolveOperatorsDownWithPruning that is causing it. > {code:scala} > test("agg filter contains window") { > val src = Seq((1, "b", "c")).toDF("col1", "col2", "col3") > .withColumn("test", > expr("count(col1) filter (where min(col1) over(partition by col2 > order by col3)>1)")) > src.show() > } > {code} > Now my question is this kind of in agg filter (window) is the correct usage? > Or should I add a check like spark sql and throw an error "It is not allowed > to use window functions inside WHERE clause"? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45613) Expose DeterministicLevel as a DeveloperApi
[ https://issues.apache.org/jira/browse/SPARK-45613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45613: --- Labels: pull-request-available (was: ) > Expose DeterministicLevel as a DeveloperApi > --- > > Key: SPARK-45613 > URL: https://issues.apache.org/jira/browse/SPARK-45613 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0, 3.5.0, 4.0.0 >Reporter: Mridul Muralidharan >Priority: Major > Labels: pull-request-available > > {{RDD.getOutputDeterministicLevel}} is a {{DeveloperApi}} which users can > override to specify the {{DeterministicLevel}} of the {{RDD}}. > Unfortunately, {{DeterministicLevel}} itself is {{private[spark]}}. > Expose {{DeterministicLevel}} to allow users to use this method. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45613) Expose DeterministicLevel as a DeveloperApi
Mridul Muralidharan created SPARK-45613: --- Summary: Expose DeterministicLevel as a DeveloperApi Key: SPARK-45613 URL: https://issues.apache.org/jira/browse/SPARK-45613 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.5.0, 3.4.0, 4.0.0 Reporter: Mridul Muralidharan {{RDD.getOutputDeterministicLevel}} is a {{DeveloperApi}} which users can override to specify the {{DeterministicLevel}} of the {{RDD}}. Unfortunately, {{DeterministicLevel}} itself is {{private[spark]}}. Expose {{DeterministicLevel}} to allow users to use this method. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
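Once the enum is public, user code would presumably look something like the sketch below; it is a hypothetical custom RDD, and today it does not compile outside org.apache.spark precisely because DeterministicLevel is private[spark]:

{code:scala}
import scala.reflect.ClassTag
import org.apache.spark.{Partition, TaskContext}
import org.apache.spark.rdd.{DeterministicLevel, RDD}

// Hypothetical RDD whose output changes on re-computation (random sampling),
// so Spark must not assume its output is reproducible when stages are retried.
class RandomSampleRDD[T: ClassTag](parent: RDD[T], fraction: Double)
    extends RDD[T](parent) {

  override protected def getPartitions: Array[Partition] = firstParent[T].partitions

  override def compute(split: Partition, context: TaskContext): Iterator[T] =
    firstParent[T].iterator(split, context)
      .filter(_ => scala.util.Random.nextDouble() < fraction)

  // The DeveloperApi hook this ticket is about: declare that re-running this
  // RDD may produce different data, not merely a different ordering.
  override protected def getOutputDeterministicLevel: DeterministicLevel.Value =
    DeterministicLevel.INDETERMINATE
}
{code}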
[jira] [Resolved] (SPARK-45603) merge_spark_pr shall notice us about GITHUB_OAUTH_KEY expiry
[ https://issues.apache.org/jira/browse/SPARK-45603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-45603. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43447 [https://github.com/apache/spark/pull/43447] > merge_spark_pr shall notice us about GITHUB_OAUTH_KEY expiry > > > Key: SPARK-45603 > URL: https://issues.apache.org/jira/browse/SPARK-45603 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45603) merge_spark_pr shall notice us about GITHUB_OAUTH_KEY expiry
[ https://issues.apache.org/jira/browse/SPARK-45603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-45603: - Assignee: Kent Yao > merge_spark_pr shall notice us about GITHUB_OAUTH_KEY expiry > > > Key: SPARK-45603 > URL: https://issues.apache.org/jira/browse/SPARK-45603 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45612) Allow cached RDDs to be migrated to fallback storage during decommission
[ https://issues.apache.org/jira/browse/SPARK-45612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45612: --- Labels: pull-request-available (was: ) > Allow cached RDDs to be migrated to fallback storage during decommission > > > Key: SPARK-45612 > URL: https://issues.apache.org/jira/browse/SPARK-45612 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Frank Yin >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45584) Execution fails when there are subqueries in TakeOrderedAndProjectExec
[ https://issues.apache.org/jira/browse/SPARK-45584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-45584: --- Assignee: Allison Wang > Execution fails when there are subqueries in TakeOrderedAndProjectExec > -- > > Key: SPARK-45584 > URL: https://issues.apache.org/jira/browse/SPARK-45584 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > When there are subqueries in TakeOrderedAndProjectExec, the query can throw > this exception: > java.lang.IllegalArgumentException: requirement failed: Subquery > subquery#242, [id=#109|#109] has not finished > This is because TakeOrderedAndProjectExec does not wait for subquery > execution. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45584) Execution fails when there are subqueries in TakeOrderedAndProjectExec
[ https://issues.apache.org/jira/browse/SPARK-45584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-45584. - Fix Version/s: 3.5.1 4.0.0 Resolution: Fixed Issue resolved by pull request 43419 [https://github.com/apache/spark/pull/43419] > Execution fails when there are subqueries in TakeOrderedAndProjectExec > -- > > Key: SPARK-45584 > URL: https://issues.apache.org/jira/browse/SPARK-45584 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 3.5.1, 4.0.0 > > > When there are subqueries in TakeOrderedAndProjectExec, the query can throw > this exception: > java.lang.IllegalArgumentException: requirement failed: Subquery > subquery#242, [id=#109|#109] has not finished > This is because TakeOrderedAndProjectExec does not wait for subquery > execution. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
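For readers trying to reproduce this, the failing shape is an ORDER BY ... LIMIT query (planned as TakeOrderedAndProjectExec) whose project list contains a subquery; a hypothetical repro along those lines, not taken from the PR:

{code:scala}
// Hypothetical repro shape: a scalar subquery in the projection of an
// ORDER BY ... LIMIT query, which Spark plans as TakeOrderedAndProjectExec.
spark.range(100).createOrReplaceTempView("t")
spark.sql(
  """SELECT id, (SELECT max(id) FROM t) AS max_id
    |FROM t
    |ORDER BY id
    |LIMIT 3""".stripMargin).collect()
{code}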
[jira] [Resolved] (SPARK-45611) spark.python.pyspark.sql.functions Typo at date_format Function
[ https://issues.apache.org/jira/browse/SPARK-45611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45611. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43442 [https://github.com/apache/spark/pull/43442] > spark.python.pyspark.sql.functions Typo at date_format Function > --- > > Key: SPARK-45611 > URL: https://issues.apache.org/jira/browse/SPARK-45611 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Mete Can Akar >Assignee: Mete Can Akar >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: image-2023-10-19-19-46-22-918.png > > > In the spark.python.pyspark.sql.functions module, at the {{date_format}} > method's doctest, there is a typo in the year format. > Instead of {{'MM/dd/yyy'}}, it should be {{'MM/dd/yyyy'}}, as the expected > output {{[Row(date='04/08/2015')]}} indicates the four-digit year format > {{"MM/dd/yyyy"}}. > > From the official documentation: > !image-2023-10-19-19-46-22-918.png|width=633,height=365! > {code:java} > df = spark.createDataFrame([('2015-04-08',)], ['dt']) > df.select(date_format('dt', 'MM/dd/yyy').alias('date')).collect() > [Row(date='04/08/2015')] > {code} > > As a solution, I proposed the PR [https://github.com/apache/spark/pull/43442]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
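For reference, the corrected four-letter year pattern can also be verified from a Scala session (a sketch equivalent to the fixed Python doctest):

{code:scala}
import org.apache.spark.sql.functions.{col, date_format}
import spark.implicits._  // assumes a spark-shell / SparkSession named `spark`

val df = Seq("2015-04-08").toDF("dt")
// 'MM/dd/yyyy' (four-letter year) matches the documented output 04/08/2015;
// the doctest previously showed the three-letter 'MM/dd/yyy' by mistake.
df.select(date_format(col("dt"), "MM/dd/yyyy").as("date")).show()
// +----------+
// |      date|
// +----------+
// |04/08/2015|
// +----------+
{code}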
[jira] [Assigned] (SPARK-45611) spark.python.pyspark.sql.functions Typo at date_format Function
[ https://issues.apache.org/jira/browse/SPARK-45611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45611: Assignee: Mete Can Akar > spark.python.pyspark.sql.functions Typo at date_format Function > --- > > Key: SPARK-45611 > URL: https://issues.apache.org/jira/browse/SPARK-45611 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Mete Can Akar >Assignee: Mete Can Akar >Priority: Minor > Labels: pull-request-available > Attachments: image-2023-10-19-19-46-22-918.png > > > In the spark.python.pyspark.sql.functions module, at the {{date_format}} > method's doctest, there is a typo in the year format. > Instead of '{{{}MM/dd/yyy'{}}}, it should be {{'MM/dd/'}} as the expected > output{{{}[Row(date='04/08/2015')]{}}} indicates the following format > {{"MM/dd/".}} > > From the official documentation: > !image-2023-10-19-19-46-22-918.png|width=633,height=365! > {code:java} > df = spark.createDataFrame([('2015-04-08',)], ['dt']) > df.select(date_format('dt', 'MM/dd/yyy').alias('date')).collect() > [Row(date='04/08/2015')] > {code} > > As a solution, I proposed the PR [https://github.com/apache/spark/pull/43442]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45428) Add Matomo analytics to all released docs pages
[ https://issues.apache.org/jira/browse/SPARK-45428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45428. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43454 [https://github.com/apache/spark/pull/43454] > Add Matomo analytics to all released docs pages > --- > > Key: SPARK-45428 > URL: https://issues.apache.org/jira/browse/SPARK-45428 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Allison Wang >Assignee: BingKun Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Matomo analytics has been added to some pages of the Spark website. Here is > Sean's initial PR: > [https://github.com/apache/spark-website/pull/479.|https://www.google.com/url?q=https://github.com/apache/spark-website/pull/479=D=docs=1696544881650480=AOvVaw11SNfWcd4UJzlO8EJvzdoe] > You can find analytics for Spark website here: https://analytics.apache.org > We need to add this to all API pages. This is very important for us to > prioritize documentation improvements and search engine optimization. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45611) spark.python.pyspark.sql.functions Typo at date_format Function
[ https://issues.apache.org/jira/browse/SPARK-45611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45611: --- Labels: pull-request-available (was: ) > spark.python.pyspark.sql.functions Typo at date_format Function > --- > > Key: SPARK-45611 > URL: https://issues.apache.org/jira/browse/SPARK-45611 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Mete Can Akar >Priority: Minor > Labels: pull-request-available > Attachments: image-2023-10-19-19-46-22-918.png > > > In the spark.python.pyspark.sql.functions module, at the {{date_format}} > method's doctest, there is a typo in the year format. > Instead of '{{{}MM/dd/yyy'{}}}, it should be {{'MM/dd/'}} as the expected > output{{{}[Row(date='04/08/2015')]{}}} indicates the following format > {{"MM/dd/".}} > > From the official documentation: > !image-2023-10-19-19-46-22-918.png|width=633,height=365! > {code:java} > df = spark.createDataFrame([('2015-04-08',)], ['dt']) > df.select(date_format('dt', 'MM/dd/yyy').alias('date')).collect() > [Row(date='04/08/2015')] > {code} > > As a solution, I proposed the PR [https://github.com/apache/spark/pull/43442]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45612) Allow cached RDDs to be migrated to fallback storage during decommission
Frank Yin created SPARK-45612: - Summary: Allow cached RDDs to be migrated to fallback storage during decommission Key: SPARK-45612 URL: https://issues.apache.org/jira/browse/SPARK-45612 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.5.0 Reporter: Frank Yin -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45583) Spark SQL returning incorrect values for full outer join on keys with the same name.
[ https://issues.apache.org/jira/browse/SPARK-45583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1502#comment-1502 ] Huw commented on SPARK-45583: - Ahh, apologies, it looks like I was running 3.4.1 when I found this issue. Testing in 3.5 it does appear to be resolved. > Spark SQL returning incorrect values for full outer join on keys with the > same name. > > > Key: SPARK-45583 > URL: https://issues.apache.org/jira/browse/SPARK-45583 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1 >Reporter: Huw >Priority: Major > Fix For: 3.5.0 > > > {{The following query gives the wrong results.}} > > {{WITH people as (}} > {{ SELECT * FROM (VALUES }} > {{ (1, 'Peter'), }} > {{ (2, 'Homer'), }} > {{ (3, 'Ned'),}} > {{ (3, 'Jenny')}} > {{ ) AS Idiots(id, FirstName)}} > {{{}){}}}{{{}, location as ({}}} > {{ SELECT * FROM (VALUES}} > {{ (1, 'sample0'),}} > {{ (1, 'sample1'),}} > {{ (2, 'sample2') }} > {{ ) as Locations(id, address)}} > {{{}){}}}{{{}SELECT{}}} > {{ *}} > {{FROM}} > {{ people}} > {{FULL OUTER JOIN}} > {{ location}} > {{ON}} > {{ people.id = location.id}} > {{We find the following table:}} > ||id: integer||FirstName: string||id: integer||address: string|| > |2|Homer|2|sample2| > |null|Ned|null|null| > |null|Jenny|null|null| > |1|Peter|1|sample0| > |1|Peter|1|sample1| > {{But clearly the first `id` column is wrong, the nulls should be 3.}} > If we rename the id column in (only) the person table to pid we get the > correct results: > ||pid: integer||FirstName: string||id: integer||address: string|| > |2|Homer|2|sample2| > |3|Ned|null|null| > |3|Jenny|null|null| > |1|Peter|1|sample0| > |1|Peter|1|sample1| -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45583) Spark SQL returning incorrect values for full outer join on keys with the same name.
[ https://issues.apache.org/jira/browse/SPARK-45583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huw updated SPARK-45583: Affects Version/s: 3.4.1 (was: 3.5.0) > Spark SQL returning incorrect values for full outer join on keys with the > same name. > > > Key: SPARK-45583 > URL: https://issues.apache.org/jira/browse/SPARK-45583 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1 >Reporter: Huw >Priority: Major > > {{The following query gives the wrong results.}} > > {{WITH people as (}} > {{ SELECT * FROM (VALUES }} > {{ (1, 'Peter'), }} > {{ (2, 'Homer'), }} > {{ (3, 'Ned'),}} > {{ (3, 'Jenny')}} > {{ ) AS Idiots(id, FirstName)}} > {{{}){}}}{{{}, location as ({}}} > {{ SELECT * FROM (VALUES}} > {{ (1, 'sample0'),}} > {{ (1, 'sample1'),}} > {{ (2, 'sample2') }} > {{ ) as Locations(id, address)}} > {{{}){}}}{{{}SELECT{}}} > {{ *}} > {{FROM}} > {{ people}} > {{FULL OUTER JOIN}} > {{ location}} > {{ON}} > {{ people.id = location.id}} > {{We find the following table:}} > ||id: integer||FirstName: string||id: integer||address: string|| > |2|Homer|2|sample2| > |null|Ned|null|null| > |null|Jenny|null|null| > |1|Peter|1|sample0| > |1|Peter|1|sample1| > {{But clearly the first `id` column is wrong, the nulls should be 3.}} > If we rename the id column in (only) the person table to pid we get the > correct results: > ||pid: integer||FirstName: string||id: integer||address: string|| > |2|Homer|2|sample2| > |3|Ned|null|null| > |3|Jenny|null|null| > |1|Peter|1|sample0| > |1|Peter|1|sample1| -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45583) Spark SQL returning incorrect values for full outer join on keys with the same name.
[ https://issues.apache.org/jira/browse/SPARK-45583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huw updated SPARK-45583: Fix Version/s: 3.5.0 > Spark SQL returning incorrect values for full outer join on keys with the > same name. > > > Key: SPARK-45583 > URL: https://issues.apache.org/jira/browse/SPARK-45583 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1 >Reporter: Huw >Priority: Major > Fix For: 3.5.0 > > > {{The following query gives the wrong results.}} > > {{WITH people as (}} > {{ SELECT * FROM (VALUES }} > {{ (1, 'Peter'), }} > {{ (2, 'Homer'), }} > {{ (3, 'Ned'),}} > {{ (3, 'Jenny')}} > {{ ) AS Idiots(id, FirstName)}} > {{{}){}}}{{{}, location as ({}}} > {{ SELECT * FROM (VALUES}} > {{ (1, 'sample0'),}} > {{ (1, 'sample1'),}} > {{ (2, 'sample2') }} > {{ ) as Locations(id, address)}} > {{{}){}}}{{{}SELECT{}}} > {{ *}} > {{FROM}} > {{ people}} > {{FULL OUTER JOIN}} > {{ location}} > {{ON}} > {{ people.id = location.id}} > {{We find the following table:}} > ||id: integer||FirstName: string||id: integer||address: string|| > |2|Homer|2|sample2| > |null|Ned|null|null| > |null|Jenny|null|null| > |1|Peter|1|sample0| > |1|Peter|1|sample1| > {{But clearly the first `id` column is wrong, the nulls should be 3.}} > If we rename the id column in (only) the person table to pid we get the > correct results: > ||pid: integer||FirstName: string||id: integer||address: string|| > |2|Homer|2|sample2| > |3|Ned|null|null| > |3|Jenny|null|null| > |1|Peter|1|sample0| > |1|Peter|1|sample1| -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45604) Converting timestamp_ntz to array can cause NPE or SEGFAULT on parquet vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-45604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zamil Majdy updated SPARK-45604: Description: Repro: ``` val path = "/tmp/sample_parquet_file" spark.sql("SELECT CAST('2019-01-01' AS TIMESTAMP_NTZ) AS field").write.parquet(path) spark.read.schema("field ARRAY").parquet(path).collect() ``` Depending on the memory mode, it will throw an NPE on OnHeap mode and SEGFAULT on OffHeap mode. was: Repro: val path = "/tmp/sample_parquet_file" spark.sql("SELECT CAST('2019-01-01' AS TIMESTAMP_NTZ) AS field").write.parquet(path) spark.read.schema("field ARRAY").parquet(path).collect() Depending on the memory mode, it will throw an NPE on OnHeap mode and SEGFAULT on OffHeap mode. > Converting timestamp_ntz to array can cause NPE or SEGFAULT on > parquet vectorized reader > --- > > Key: SPARK-45604 > URL: https://issues.apache.org/jira/browse/SPARK-45604 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Zamil Majdy >Priority: Major > Labels: pull-request-available > > Repro: > ``` > val path = "/tmp/sample_parquet_file" > spark.sql("SELECT CAST('2019-01-01' AS TIMESTAMP_NTZ) AS > field").write.parquet(path) > spark.read.schema("field ARRAY").parquet(path).collect() > ``` > Depending on the memory mode, it will throw an NPE on OnHeap mode and > SEGFAULT on OffHeap mode. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
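A hedged note on the repro above: the "memory mode" the description refers to is the column-vector allocation mode of the vectorized Parquet reader, and the crash can be sidestepped (not fixed) by falling back to the row-based reader. The config keys below exist in Spark; whether they fully avoid the issue is an assumption, not something stated in the ticket:

{code:scala}
// spark.sql.columnVector.offheap.enabled = false -> on-heap vectors (reported NPE)
// spark.sql.columnVector.offheap.enabled = true  -> off-heap vectors (reported SEGFAULT)
// Possible workaround until the fix: skip the vectorized Parquet reader entirely.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
{code}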
[jira] [Updated] (SPARK-45604) Converting timestamp_ntz to array can cause NPE or SEGFAULT on parquet vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-45604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zamil Majdy updated SPARK-45604: Description: Repro: val path = "/tmp/sample_parquet_file" spark.sql("SELECT CAST('2019-01-01' AS TIMESTAMP_NTZ) AS field").write.parquet(path) spark.read.schema("field ARRAY").parquet(path).collect() Depending on the memory mode, it will throw an NPE on OnHeap mode and SEGFAULT on OffHeap mode. was: Repro: {{{}{}}}``` val path = "/tmp/zamil/timestamp" spark.sql("SELECT CAST('2019-01-01' AS TIMESTAMP_NTZ) AS field").write.parquet(path) spark.read.schema("field ARRAY").parquet(path).collect() ``` Depending on the memory mode, it will throw an NPE on OnHeap mode and SEGFAULT on OffHeap mode. > Converting timestamp_ntz to array can cause NPE or SEGFAULT on > parquet vectorized reader > --- > > Key: SPARK-45604 > URL: https://issues.apache.org/jira/browse/SPARK-45604 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Zamil Majdy >Priority: Major > Labels: pull-request-available > > Repro: > val path = "/tmp/sample_parquet_file" > spark.sql("SELECT CAST('2019-01-01' AS TIMESTAMP_NTZ) AS > field").write.parquet(path) > spark.read.schema("field ARRAY").parquet(path).collect() > Depending on the memory mode, it will throw an NPE on OnHeap mode and > SEGFAULT on OffHeap mode. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45604) Converting timestamp_ntz to array can cause NPE or SEGFAULT on parquet vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-45604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zamil Majdy updated SPARK-45604: Description: Repro: {{{}{}}}``` val path = "/tmp/zamil/timestamp" spark.sql("SELECT CAST('2019-01-01' AS TIMESTAMP_NTZ) AS field").write.parquet(path) spark.read.schema("field ARRAY").parquet(path).collect() ``` Depending on the memory mode, it will throw an NPE on OnHeap mode and SEGFAULT on OffHeap mode. was: Repro: {{val path = "/tmp/someparquetfile"}} {{spark.sql("SELECT CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ) AS field").write.mode("overwrite").parquet(path)}} {{spark.read.schema("field array").parquet(path).collect()}} Depending on the memory mode, it will throw an NPE on OnHeap mode and SEGFAULT on OffHeap mode. > Converting timestamp_ntz to array can cause NPE or SEGFAULT on > parquet vectorized reader > --- > > Key: SPARK-45604 > URL: https://issues.apache.org/jira/browse/SPARK-45604 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Zamil Majdy >Priority: Major > Labels: pull-request-available > > Repro: > {{{}{}}}``` > val path = "/tmp/zamil/timestamp" > spark.sql("SELECT CAST('2019-01-01' AS TIMESTAMP_NTZ) AS > field").write.parquet(path) > spark.read.schema("field ARRAY").parquet(path).collect() > ``` > Depending on the memory mode, it will throw an NPE on OnHeap mode and > SEGFAULT on OffHeap mode. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45604) Converting timestamp_ntz to array can cause NPE or SEGFAULT on parquet vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-45604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zamil Majdy updated SPARK-45604: Description: Repro: {{val path = "/tmp/someparquetfile"}} {{spark.sql("SELECT CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ) AS field").write.mode("overwrite").parquet(path)}} {{spark.read.schema("field array").parquet(path).collect()}} Depending on the memory mode, it will throw an NPE on OnHeap mode and SEGFAULT on OffHeap mode. was: Repro: {{val path = "/tmp/someparquetfile" spark.sql("SELECT CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ) AS field").write.mode("overwrite").parquet(path) spark.read.schema("field array").parquet(path).collect()}} Depending on the memory mode, it will throw an NPE on OnHeap mode and SEGFAULT on OffHeap mode. > Converting timestamp_ntz to array can cause NPE or SEGFAULT on > parquet vectorized reader > --- > > Key: SPARK-45604 > URL: https://issues.apache.org/jira/browse/SPARK-45604 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Zamil Majdy >Priority: Major > Labels: pull-request-available > > Repro: > {{val path = "/tmp/someparquetfile"}} > {{spark.sql("SELECT CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ) AS > field").write.mode("overwrite").parquet(path)}} > {{spark.read.schema("field array").parquet(path).collect()}} > Depending on the memory mode, it will throw an NPE on OnHeap mode and > SEGFAULT on OffHeap mode. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45604) Converting timestamp_ntz to array can cause NPE or SEGFAULT on parquet vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-45604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zamil Majdy updated SPARK-45604: Description: Repro: {{val path = "/tmp/someparquetfile" spark.sql("SELECT CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ) AS field").write.mode("overwrite").parquet(path) spark.read.schema("field array").parquet(path).collect()}} Depending on the memory mode, it will throw an NPE on OnHeap mode and SEGFAULT on OffHeap mode. was: Repro: ``` val path = "/tmp/someparquetfile" val df = sql("SELECT MAP('key', CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ)) AS field") df.write.mode("overwrite").parquet(path) spark.read.schema("field map>").parquet(path).collect() ``` Depending on the memory mode is used, it will produced NPE on on-heap mode, and segfault on off-heap > Converting timestamp_ntz to array can cause NPE or SEGFAULT on > parquet vectorized reader > --- > > Key: SPARK-45604 > URL: https://issues.apache.org/jira/browse/SPARK-45604 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Zamil Majdy >Priority: Major > Labels: pull-request-available > > Repro: > {{val path = "/tmp/someparquetfile" > spark.sql("SELECT CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ) AS > field").write.mode("overwrite").parquet(path) > spark.read.schema("field array").parquet(path).collect()}} > Depending on the memory mode, it will throw an NPE on OnHeap mode and > SEGFAULT on OffHeap mode. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45611) spark.python.pyspark.sql.functions Typo at date_format Function
[ https://issues.apache.org/jira/browse/SPARK-45611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mete Can Akar updated SPARK-45611: -- Description: In the spark.python.pyspark.sql.functions module, at the {{date_format}} method's doctest, there is a typo in the year format. Instead of '{{{}MM/dd/yyy'{}}}, it should be {{'MM/dd/'}} as the expected output{{{}[Row(date='04/08/2015')]{}}} indicates the following format {{"MM/dd/".}} >From the official documentation: !image-2023-10-19-19-46-22-918.png|width=633,height=365! {code:java} df = spark.createDataFrame([('2015-04-08',)], ['dt']) df.select(date_format('dt', 'MM/dd/yyy').alias('date')).collect() [Row(date='04/08/2015')] {code} As a solution, I proposed the PR [https://github.com/apache/spark/pull/43442]. was: In the spark.python.pyspark.sql.functions module, at the {{date_format}} method's doctest, there is a typo in the year format. Instead of '{{{}MM/dd/yyy'{}}}, it should be {{'MM/dd/'}} as the expected output{{{}[Row(date='04/08/2015')]{}}} indicates the following format {{"MM/dd/".}} !image-2023-10-19-19-46-22-918.png|width=633,height=365! {code:java} df = spark.createDataFrame([('2015-04-08',)], ['dt']) df.select(date_format('dt', 'MM/dd/yyy').alias('date')).collect() [Row(date='04/08/2015')] {code} As a solution, I proposed the PR [https://github.com/apache/spark/pull/43442]. > spark.python.pyspark.sql.functions Typo at date_format Function > --- > > Key: SPARK-45611 > URL: https://issues.apache.org/jira/browse/SPARK-45611 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Mete Can Akar >Priority: Minor > Attachments: image-2023-10-19-19-46-22-918.png > > > In the spark.python.pyspark.sql.functions module, at the {{date_format}} > method's doctest, there is a typo in the year format. > Instead of '{{{}MM/dd/yyy'{}}}, it should be {{'MM/dd/'}} as the expected > output{{{}[Row(date='04/08/2015')]{}}} indicates the following format > {{"MM/dd/".}} > > From the official documentation: > !image-2023-10-19-19-46-22-918.png|width=633,height=365! > {code:java} > df = spark.createDataFrame([('2015-04-08',)], ['dt']) > df.select(date_format('dt', 'MM/dd/yyy').alias('date')).collect() > [Row(date='04/08/2015')] > {code} > > As a solution, I proposed the PR [https://github.com/apache/spark/pull/43442]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45611) spark.python.pyspark.sql.functions Typo at date_format Function
[ https://issues.apache.org/jira/browse/SPARK-45611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mete Can Akar updated SPARK-45611: -- Description: In the spark.python.pyspark.sql.functions module, at the {{date_format}} method's doctest, there is a typo in the year format. Instead of '{{{}MM/dd/yyy'{}}}, it should be {{'MM/dd/'}} as the expected output{{{}[Row(date='04/08/2015')]{}}} indicates the following format {{"MM/dd/".}} !image-2023-10-19-19-46-22-918.png|width=633,height=365! {code:java} df = spark.createDataFrame([('2015-04-08',)], ['dt']) df.select(date_format('dt', 'MM/dd/yyy').alias('date')).collect() [Row(date='04/08/2015')] {code} As a solution, I proposed the PR [https://github.com/apache/spark/pull/43442]. was: In the spark.python.pyspark.sql.functions module, at the {{date_format}} method's doctest, there is a typo in the year format. Instead of '{{{}MM/dd/yyy'{}}}, it should be {{'MM/dd/'}} as the expected output{{{}[Row(date='04/08/2015')]{}}} indicates the following format {{"MM/dd/".}} {code:java} df = spark.createDataFrame([('2015-04-08',)], ['dt']) df.select(date_format('dt', 'MM/dd/yyy').alias('date')).collect() [Row(date='04/08/2015')] {code} As a solution, I proposed the PR [https://github.com/apache/spark/pull/43442]. > spark.python.pyspark.sql.functions Typo at date_format Function > --- > > Key: SPARK-45611 > URL: https://issues.apache.org/jira/browse/SPARK-45611 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Mete Can Akar >Priority: Minor > Attachments: image-2023-10-19-19-46-22-918.png > > > In the spark.python.pyspark.sql.functions module, at the {{date_format}} > method's doctest, there is a typo in the year format. > Instead of '{{{}MM/dd/yyy'{}}}, it should be {{'MM/dd/'}} as the expected > output{{{}[Row(date='04/08/2015')]{}}} indicates the following format > {{"MM/dd/".}} > > !image-2023-10-19-19-46-22-918.png|width=633,height=365! > {code:java} > df = spark.createDataFrame([('2015-04-08',)], ['dt']) > df.select(date_format('dt', 'MM/dd/yyy').alias('date')).collect() > [Row(date='04/08/2015')] > {code} > > As a solution, I proposed the PR [https://github.com/apache/spark/pull/43442]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45611) spark.python.pyspark.sql.functions Typo at date_format Function
[ https://issues.apache.org/jira/browse/SPARK-45611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mete Can Akar updated SPARK-45611: -- Attachment: image-2023-10-19-19-46-22-918.png > spark.python.pyspark.sql.functions Typo at date_format Function > --- > > Key: SPARK-45611 > URL: https://issues.apache.org/jira/browse/SPARK-45611 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Mete Can Akar >Priority: Minor > Attachments: image-2023-10-19-19-46-22-918.png > > > In the spark.python.pyspark.sql.functions module, at the {{date_format}} > method's doctest, there is a typo in the year format. > Instead of '{{{}MM/dd/yyy'{}}}, it should be {{'MM/dd/'}} as the expected > output{{{}[Row(date='04/08/2015')]{}}} indicates the following format > {{"MM/dd/".}} > > {code:java} > df = spark.createDataFrame([('2015-04-08',)], ['dt']) > df.select(date_format('dt', 'MM/dd/yyy').alias('date')).collect() > [Row(date='04/08/2015')] > {code} > > As a solution, I proposed the PR [https://github.com/apache/spark/pull/43442]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45611) spark.python.pyspark.sql.functions Typo at date_format Function
[ https://issues.apache.org/jira/browse/SPARK-45611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mete Can Akar updated SPARK-45611: -- Description: In the spark.python.pyspark.sql.functions module, at the {{date_format}} method's doctest, there is a typo in the year format. Instead of '{{{}MM/dd/yyy'{}}}, it should be {{'MM/dd/'}} as the expected output{{{}[Row(date='04/08/2015')]{}}} indicates the following format {{"MM/dd/".}} {code:java} df = spark.createDataFrame([('2015-04-08',)], ['dt']) df.select(date_format('dt', 'MM/dd/yyy').alias('date')).collect() [Row(date='04/08/2015')] {code} As a solution, I proposed the PR [https://github.com/apache/spark/pull/43442].}}{}}}}}{}}} was: In the spark.python.pyspark.sql.functions module, at the {{date_format}} method's doctest, there is a typo in the year format. Instead of '{{{}MM/dd/yyy'{}}}, it should be {{'MM/dd/'}} as the expected output{{{}[Row(date='04/08/2015')]{}}} indicates the following format {{"MM/dd/".}} {{ {code:java} df = spark.createDataFrame([('2015-04-08',)], ['dt']) df.select(date_format('dt', 'MM/dd/yyy').alias('date')).collect() [Row(date='04/08/2015')] {code} }} As a solution, I proposed the PR [https://github.com/apache/spark/pull/43442].}}{}}}}}{}}} > spark.python.pyspark.sql.functions Typo at date_format Function > --- > > Key: SPARK-45611 > URL: https://issues.apache.org/jira/browse/SPARK-45611 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Mete Can Akar >Priority: Minor > > In the spark.python.pyspark.sql.functions module, at the {{date_format}} > method's doctest, there is a typo in the year format. > Instead of '{{{}MM/dd/yyy'{}}}, it should be {{'MM/dd/'}} as the expected > output{{{}[Row(date='04/08/2015')]{}}} indicates the following format > {{"MM/dd/".}} > > {code:java} > df = spark.createDataFrame([('2015-04-08',)], ['dt']) > df.select(date_format('dt', 'MM/dd/yyy').alias('date')).collect() > [Row(date='04/08/2015')] > {code} > > As a solution, I proposed the PR > [https://github.com/apache/spark/pull/43442].}}{}}}}}{}}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45611) spark.python.pyspark.sql.functions Typo at date_format Function
[ https://issues.apache.org/jira/browse/SPARK-45611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mete Can Akar updated SPARK-45611: -- Description: In the spark.python.pyspark.sql.functions module, at the {{date_format}} method's doctest, there is a typo in the year format. Instead of '{{{}MM/dd/yyy'{}}}, it should be {{'MM/dd/'}} as the expected output{{{}[Row(date='04/08/2015')]{}}} indicates the following format {{"MM/dd/".}} {{ {code:java} df = spark.createDataFrame([('2015-04-08',)], ['dt']) df.select(date_format('dt', 'MM/dd/yyy').alias('date')).collect() [Row(date='04/08/2015')] {code} }} As a solution, I proposed the PR [https://github.com/apache/spark/pull/43442].}}{}}}}}{}}} was: In the spark.python.pyspark.sql.functions module, at the {{date_format}} method's doctest, there is a typo in the year format. Instead of '{{{}MM/dd/yyy'{}}}, it should be {{'MM/dd/'}} as the expected output{{{}[Row(date='04/08/2015')]{}}} indicates the following format {{"MM/dd/".}} {{}} As a solution, I proposed the PR https://github.com/apache/spark/pull/43442.{{{}{}}}{{{}{}}} > spark.python.pyspark.sql.functions Typo at date_format Function > --- > > Key: SPARK-45611 > URL: https://issues.apache.org/jira/browse/SPARK-45611 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Mete Can Akar >Priority: Minor > > In the spark.python.pyspark.sql.functions module, at the {{date_format}} > method's doctest, there is a typo in the year format. > Instead of '{{{}MM/dd/yyy'{}}}, it should be {{'MM/dd/'}} as the expected > output{{{}[Row(date='04/08/2015')]{}}} indicates the following format > {{"MM/dd/".}} > {{ > {code:java} > df = spark.createDataFrame([('2015-04-08',)], ['dt']) > df.select(date_format('dt', 'MM/dd/yyy').alias('date')).collect() > [Row(date='04/08/2015')] > {code} > }} > As a solution, I proposed the PR > [https://github.com/apache/spark/pull/43442].}}{}}}}}{}}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45611) spark.python.pyspark.sql.functions Typo at date_format Function
[ https://issues.apache.org/jira/browse/SPARK-45611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mete Can Akar updated SPARK-45611: -- Description: In the spark.python.pyspark.sql.functions module, at the {{date_format}} method's doctest, there is a typo in the year format. Instead of {{'MM/dd/yyy'}}, it should be {{'MM/dd/yyyy'}}, as the expected output {{[Row(date='04/08/2015')]}} indicates the format {{'MM/dd/yyyy'}}. {code:java} df = spark.createDataFrame([('2015-04-08',)], ['dt']) df.select(date_format('dt', 'MM/dd/yyy').alias('date')).collect() [Row(date='04/08/2015')] {code} As a solution, I proposed the PR [https://github.com/apache/spark/pull/43442]. was: In the spark.python.pyspark.sql.functions module, at the {{date_format}} method's doctest, there is a typo in the year format. Instead of {{'MM/dd/yyy'}}, it should be {{'MM/dd/yyyy'}}, as the expected output {{[Row(date='04/08/2015')]}} indicates the format {{'MM/dd/yyyy'}}. {code:java} df = spark.createDataFrame([('2015-04-08',)], ['dt']) df.select(date_format('dt', 'MM/dd/yyy').alias('date')).collect() [Row(date='04/08/2015')] {code} As a solution, I proposed the PR [https://github.com/apache/spark/pull/43442]. > spark.python.pyspark.sql.functions Typo at date_format Function > --- > > Key: SPARK-45611 > URL: https://issues.apache.org/jira/browse/SPARK-45611 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Mete Can Akar >Priority: Minor > > In the spark.python.pyspark.sql.functions module, at the {{date_format}} > method's doctest, there is a typo in the year format. > Instead of {{'MM/dd/yyy'}}, it should be {{'MM/dd/yyyy'}} as the expected > output {{[Row(date='04/08/2015')]}} indicates the format {{'MM/dd/yyyy'}}. > > {code:java} > df = spark.createDataFrame([('2015-04-08',)], ['dt']) > df.select(date_format('dt', 'MM/dd/yyy').alias('date')).collect() > [Row(date='04/08/2015')] > {code} > > As a solution, I proposed the PR [https://github.com/apache/spark/pull/43442]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45611) spark.python.pyspark.sql.functions Typo at date_format Function
Mete Can Akar created SPARK-45611: - Summary: spark.python.pyspark.sql.functions Typo at date_format Function Key: SPARK-45611 URL: https://issues.apache.org/jira/browse/SPARK-45611 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.5.0 Reporter: Mete Can Akar In the spark.python.pyspark.sql.functions module, at the {{date_format}} method's doctest, there is a typo in the year format. Instead of {{'MM/dd/yyy'}}, it should be {{'MM/dd/yyyy'}}, as the expected output {{[Row(date='04/08/2015')]}} indicates the format {{'MM/dd/yyyy'}}. As a solution, I proposed the PR https://github.com/apache/spark/pull/43442. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
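For reference, a minimal sketch of the corrected pattern, written against the Scala API rather than the PySpark doctest; the local SparkSession setup and column name are illustrative only:
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.date_format

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq("2015-04-08").toDF("dt")

// 'yyyy' (four letters) matches the four-digit year in the expected output
// [Row(date='04/08/2015')]; the doctest's 'MM/dd/yyy' is the typo being fixed.
df.select(date_format($"dt", "MM/dd/yyyy").as("date")).show()
// +----------+
// |      date|
// +----------+
// |04/08/2015|
// +----------+
{code}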
[jira] [Commented] (SPARK-45610) Handle "Auto-application to `()` is deprecated."
[ https://issues.apache.org/jira/browse/SPARK-45610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1405#comment-1405 ] Sean R. Owen commented on SPARK-45610: -- I think it's better to make big changes at major version boundaries. I'd expect we support Scala 3 at some point for Spark 4.x. Therefore I think it'd be OK to proceed with these changes now for 4.0. > Handle "Auto-application to `()` is deprecated." > > > Key: SPARK-45610 > URL: https://issues.apache.org/jira/browse/SPARK-45610 > Project: Spark > Issue Type: Sub-task > Components: GraphX, MLlib, Spark Core, SQL, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > For the following case, a compile warning will be issued in Scala 2.13: > > {code:java} > Welcome to Scala 2.13.12 (OpenJDK 64-Bit Server VM, Java 17.0.8). > Type in expressions for evaluation. Or try :help. > scala> class Foo { > | def isEmpty(): Boolean = true > | def isTrue(x: Boolean): Boolean = x > | } > class Foo > scala> val foo = new Foo > val foo: Foo = Foo@7061622 > scala> val ret = foo.isEmpty > ^ > warning: Auto-application to `()` is deprecated. Supply the empty > argument list `()` explicitly to invoke method isEmpty, > or remove the empty argument list from its definition (Java-defined > methods are exempt). > In Scala 3, an unapplied method like this will be eta-expanded into a > function. [quickfixable] > val ret: Boolean = true {code} > But for Scala 3, it is a compile error: > {code:java} > Welcome to Scala 3.3.1 (17.0.8, Java OpenJDK 64-Bit Server VM). > Type in expressions for evaluation. Or try :help. > > > > > scala> class Foo { > | def isEmpty(): Boolean = true > | def isTrue(x: Boolean): Boolean = x > | } > // defined class Foo > > > > > scala> val foo = new Foo > val foo: Foo = Foo@591f6f83 > > > > > scala> val ret = foo.isEmpty > -- [E100] Syntax Error: > > 1 |val ret = foo.isEmpty > | ^^^ > | method isEmpty in class Foo must be called with () argument > | > | longer explanation available when compiling with `-explain` > 1 error found {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
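The mechanical fix implied by the ticket is simply to supply the empty argument list at each call site; a tiny sketch (not an actual Spark diff) using the example from the description:
{code:scala}
class Foo {
  def isEmpty(): Boolean = true
  def isTrue(x: Boolean): Boolean = x
}

val foo = new Foo

// Deprecated in Scala 2.13 and a compile error in Scala 3:
//   val ret = foo.isEmpty
// Fixed by invoking the method with its declared empty argument list:
val ret = foo.isEmpty()
{code}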
[jira] [Commented] (SPARK-45610) Handle "Auto-application to `()` is deprecated."
[ https://issues.apache.org/jira/browse/SPARK-45610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1396#comment-1396 ] Yang Jie commented on SPARK-45610: -- In Spark, this involves a massive amount of cases. Since this is a compile error for Scala 3, it seems that we will have to fix this when we prepare to support Scala 3. As the plan to support Scala 3 is not clear at the moment, should we wait until the schedule for supporting Scala 3 is confirmed before we proceed with the fix? I would like to know your thoughts. [~srowen] [~dongjoon] [~gurwls223] > Handle "Auto-application to `()` is deprecated." > > > Key: SPARK-45610 > URL: https://issues.apache.org/jira/browse/SPARK-45610 > Project: Spark > Issue Type: Sub-task > Components: GraphX, MLlib, Spark Core, SQL, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > For the following case, a compile warning will be issued in Scala 2.13: > > {code:java} > Welcome to Scala 2.13.12 (OpenJDK 64-Bit Server VM, Java 17.0.8). > Type in expressions for evaluation. Or try :help. > scala> class Foo { > | def isEmpty(): Boolean = true > | def isTrue(x: Boolean): Boolean = x > | } > class Foo > scala> val foo = new Foo > val foo: Foo = Foo@7061622 > scala> val ret = foo.isEmpty > ^ > warning: Auto-application to `()` is deprecated. Supply the empty > argument list `()` explicitly to invoke method isEmpty, > or remove the empty argument list from its definition (Java-defined > methods are exempt). > In Scala 3, an unapplied method like this will be eta-expanded into a > function. [quickfixable] > val ret: Boolean = true {code} > But for Scala 3, it is a compile error: > {code:java} > Welcome to Scala 3.3.1 (17.0.8, Java OpenJDK 64-Bit Server VM). > Type in expressions for evaluation. Or try :help. > > > > > scala> class Foo { > | def isEmpty(): Boolean = true > | def isTrue(x: Boolean): Boolean = x > | } > // defined class Foo > > > > > scala> val foo = new Foo > val foo: Foo = Foo@591f6f83 > > > > > scala> val ret = foo.isEmpty > -- [E100] Syntax Error: > > 1 |val ret = foo.isEmpty > | ^^^ > | method isEmpty in class Foo must be called with () argument > | > | longer explanation available when compiling with `-explain` > 1 error found {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45610) Handle "Auto-application to `()` is deprecated."
[ https://issues.apache.org/jira/browse/SPARK-45610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45610: - Description: For the following case, a compile warning will be issued in Scala 2.13: {code:java} Welcome to Scala 2.13.12 (OpenJDK 64-Bit Server VM, Java 17.0.8). Type in expressions for evaluation. Or try :help. scala> class Foo { | def isEmpty(): Boolean = true | def isTrue(x: Boolean): Boolean = x | } class Foo scala> val foo = new Foo val foo: Foo = Foo@7061622 scala> val ret = foo.isEmpty ^ warning: Auto-application to `()` is deprecated. Supply the empty argument list `()` explicitly to invoke method isEmpty, or remove the empty argument list from its definition (Java-defined methods are exempt). In Scala 3, an unapplied method like this will be eta-expanded into a function. [quickfixable] val ret: Boolean = true {code} But for Scala 3, it is a compile error: {code:java} Welcome to Scala 3.3.1 (17.0.8, Java OpenJDK 64-Bit Server VM). Type in expressions for evaluation. Or try :help. scala> class Foo { | def isEmpty(): Boolean = true | def isTrue(x: Boolean): Boolean = x | } // defined class Foo scala> val foo = new Foo val foo: Foo = Foo@591f6f83 scala> val ret = foo.isEmpty -- [E100] Syntax Error: 1 |val ret = foo.isEmpty | ^^^ | method isEmpty in class Foo must be called with () argument | | longer explanation available when compiling with `-explain` 1 error found {code} > Handle "Auto-application to `()` is deprecated." > > > Key: SPARK-45610 > URL: https://issues.apache.org/jira/browse/SPARK-45610 > Project: Spark > Issue Type: Sub-task > Components: GraphX, MLlib, Spark Core, SQL, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > For the following case, a compile warning will be issued in Scala 2.13: > > {code:java} > Welcome to Scala 2.13.12 (OpenJDK 64-Bit Server VM, Java 17.0.8). > Type in expressions for evaluation. Or try :help. > scala> class Foo { > | def isEmpty(): Boolean = true > | def isTrue(x: Boolean): Boolean = x > | } > class Foo > scala> val foo = new Foo > val foo: Foo = Foo@7061622 > scala> val ret = foo.isEmpty > ^ > warning: Auto-application to `()` is deprecated. Supply the empty > argument list `()` explicitly to invoke method isEmpty, > or remove the empty argument list from its definition (Java-defined > methods are exempt). > In Scala 3, an unapplied method like this will be eta-expanded into a > function. [quickfixable] > val ret: Boolean = true {code} > But for Scala 3, it is a compile error: > {code:java} > Welcome to Scala 3.3.1 (17.0.8, Java OpenJDK 64-Bit Server VM). > Type in expressions for evaluation. Or try :help. > > > > > scala> class Foo { > | def isEmpty(): Boolean = true > | def isTrue(x: Boolean): Boolean = x > | } > // defined class Foo > > > > > scala> val foo = new Foo > val foo: Foo = Foo@591f6f83 > > > > > scala> val ret = foo.isEmpty > -- [E100] Syntax Error: > > 1 |val ret =
[jira] [Created] (SPARK-45610) Handle "Auto-application to `()` is deprecated."
Yang Jie created SPARK-45610: Summary: Handle "Auto-application to `()` is deprecated." Key: SPARK-45610 URL: https://issues.apache.org/jira/browse/SPARK-45610 Project: Spark Issue Type: Sub-task Components: GraphX, MLlib, Spark Core, SQL, Structured Streaming Affects Versions: 4.0.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25078) Standalone does not work with spark.authenticate.secret and deploy-mode=cluster
[ https://issues.apache.org/jira/browse/SPARK-25078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1377#comment-1377 ] Yaroslav commented on SPARK-25078: -- Hi, this issue is still reproducible. In SPARK-8129 they changed the way Worker sends "spark.authenticate.secret" value to Driver from Java options to environment variable to be more secure (because other processes can freely view this java option while only the process owner can see its environment variables). So the sender should [add|https://github.com/apache/spark/blob/v3.5.0/core/src/main/scala/org/apache/spark/deploy/worker/CommandUtils.scala#L89-L92] the value to environment and the receiver should take it from there, not from spark config. They have created this universal method [getSecretKey |https://github.com/apache/spark/blob/v3.5.0/core/src/main/scala/org/apache/spark/SecurityManager.scala#L282-L307]which can get the value either from config or from env. But for some reason inside initializeAuth() they still [search|https://github.com/apache/spark/blob/v3.5.0/core/src/main/scala/org/apache/spark/SecurityManager.scala#L337] this key in spark config, which fails and throws such error. Doing such change would fix that and I suppose getSecretKey method was created exactly for such kind of use: {code:java} - require(sparkConf.contains(SPARK_AUTH_SECRET_CONF), + require(getSecretKey() != null, {code} I guess it won't affect anything since even if key is in the config and not in the environment, this method will still try to search there and return the value. Whilst searching only in config does not cover all cases. So [~irashid] , [~maropu] could you please review status of this issue since it's Marked as Resolved (Incomplete) while the error is still easily reproducible and easily fixable as well? Thanks! > Standalone does not work with spark.authenticate.secret and > deploy-mode=cluster > --- > > Key: SPARK-25078 > URL: https://issues.apache.org/jira/browse/SPARK-25078 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 2.4.0 >Reporter: Imran Rashid >Priority: Major > Labels: bulk-closed > > When running a spark standalone cluster with spark.authenticate.secret setup, > you cannot submit a program in cluster mode, even with the right secret. The > driver fails with: > {noformat} > 18/08/09 08:17:21 INFO SecurityManager: SecurityManager: authentication > enabled; ui acls disabled; users with view permissions: Set(systest); groups > with view permissions: Set(); users with modify permissions: Set(systest); > groups with modify permissions: Set() > 18/08/09 08:17:21 ERROR SparkContext: Error initializing SparkContext. > java.lang.IllegalArgumentException: requirement failed: A secret key must be > specified via the spark.authenticate.secret config. > at scala.Predef$.require(Predef.scala:224) > at > org.apache.spark.SecurityManager.initializeAuth(SecurityManager.scala:361) > at org.apache.spark.SparkEnv$.create(SparkEnv.scala:238) > at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175) > at > org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257) > at org.apache.spark.SparkContext.(SparkContext.scala:424) > ... > {noformat} > but its actually doing the wrong check in > {{SecurityManager.initializeAuth()}}. The secret is there, its just in an > environment variable {{_SPARK_AUTH_SECRET}} (so its not visible to another > process). > *Workaround*: In your program, you can pass in a dummy secret to your spark > conf. 
It doesn't matter what it is at all, later it'll be ignored and when > establishing connections, the secret from the env variable will be used. Eg. > {noformat} > val conf = new SparkConf() > conf.setIfMissing("spark.authenticate.secret", "doesn't matter") > val sc = new SparkContext(conf) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
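A minimal sketch of the env-then-config lookup described in the comment above; this is illustrative only, not Spark's actual SecurityManager code, and the helper name is hypothetical (the env variable and config key come from the comment):
{code:scala}
// Hypothetical helper mirroring what getSecretKey() is described to do:
// prefer the secret the Worker injected via the environment, and fall back
// to the spark.authenticate.secret config entry only if the env var is absent.
def resolveAuthSecret(conf: Map[String, String], env: Map[String, String]): Option[String] =
  env.get("_SPARK_AUTH_SECRET").orElse(conf.get("spark.authenticate.secret"))

// The check in initializeAuth() would then assert on the resolved value rather
// than on the config alone, e.g.:
//   require(resolveAuthSecret(conf, sys.env).isDefined,
//     "A secret key must be specified via the spark.authenticate.secret config.")
{code}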
[jira] [Updated] (SPARK-45609) Include SqlState in SparkThrowable proto message
[ https://issues.apache.org/jira/browse/SPARK-45609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45609: --- Labels: pull-request-available (was: ) > Include SqlState in SparkThrowable proto message > > > Key: SPARK-45609 > URL: https://issues.apache.org/jira/browse/SPARK-45609 > Project: Spark > Issue Type: Test > Components: Connect >Affects Versions: 4.0.0 >Reporter: Yihong He >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45609) Include SqlState in SparkThrowable proto message
Yihong He created SPARK-45609: - Summary: Include SqlState in SparkThrowable proto message Key: SPARK-45609 URL: https://issues.apache.org/jira/browse/SPARK-45609 Project: Spark Issue Type: Test Components: Connect Affects Versions: 4.0.0 Reporter: Yihong He -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-45598) Delta table 3.0.0 not working with Spark Connect 3.5.0
[ https://issues.apache.org/jira/browse/SPARK-45598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1246#comment-1246 ] Faiz Halde edited comment on SPARK-45598 at 10/19/23 4:04 PM: -- Hi [~sdaberdaku] , corrected the title. I tested it with 3.0.0 delta. What I meant was, delta table does not work with {*}spark connect{*}. It does work with vanilla spark 3.5.0 otherwise was (Author: JIRAUSER300204): Hi [~sdaberdaku] , corrected the title. I tested it with 3.0.0 delta. What I meant was, delta table does not work with {*}spark connect{*}. It does work with spark 3.5.0 otherwise > Delta table 3.0.0 not working with Spark Connect 3.5.0 > -- > > Key: SPARK-45598 > URL: https://issues.apache.org/jira/browse/SPARK-45598 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Faiz Halde >Priority: Major > > Spark version 3.5.0 > Spark Connect version 3.5.0 > Delta table 3.0-rc2 > Spark connect server was started using > *{{./sbin/start-connect-server.sh --master spark://localhost:7077 --packages > org.apache.spark:spark-connect_2.12:3.5.0,io.delta:delta-spark_2.12:3.0.0rc2 > --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf > "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" > --conf > 'spark.jars.repositories=[https://oss.sonatype.org/content/repositories/iodelta-1120']}}* > {{Connect client depends on}} > *libraryDependencies += "io.delta" %% "delta-spark" % "3.0.0rc2"* > *and the connect libraries* > > When trying to run a simple job that writes to a delta table > {{val spark = SparkSession.builder().remote("sc://localhost").getOrCreate()}} > {{val data = spark.read.json("profiles.json")}} > {{data.write.format("delta").save("/tmp/delta")}} > > {{Error log in connect client}} > {{Exception in thread "main" org.apache.spark.SparkException: > io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to stage failure: > Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in > stage 1.0 (TID 4) (172.23.128.15 executor 0): java.lang.ClassCastException: > cannot assign instance of java.lang.invoke.SerializedLambda to field > org.apache.spark.sql.catalyst.expressions.ScalaUDF.f of type scala.Function1 > in instance of org.apache.spark.sql.catalyst.expressions.ScalaUDF}} > {{ at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)}} > {{ at > java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{ at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{ at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{...}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toThrowable(GrpcExceptionConverter.scala:110)}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:41)}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:49)}} > {{ at scala.collection.Iterator.foreach(Iterator.scala:943)}} > {{ at scala.collection.Iterator.foreach$(Iterator.scala:943)}} > {{ at >
[jira] [Updated] (SPARK-45368) Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal
[ https://issues.apache.org/jira/browse/SPARK-45368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45368: --- Labels: pull-request-available (was: ) > Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal > --- > > Key: SPARK-45368 > URL: https://issues.apache.org/jira/browse/SPARK-45368 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45601) stackoverflow when executing rule ExtractWindowExpressions
[ https://issues.apache.org/jira/browse/SPARK-45601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1304#comment-1304 ] Bruce Robbins commented on SPARK-45601: --- Possibly SPARK-38666 > stackoverflow when executing rule ExtractWindowExpressions > -- > > Key: SPARK-45601 > URL: https://issues.apache.org/jira/browse/SPARK-45601 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.3 >Reporter: JacobZheng >Priority: Major > > I am encountering stack overflow errors while executing the following test > case. Looking at the source code, ExtractWindowExpressions does not extract > the window correctly, and the plan gets stuck in an endless loop at > resolveOperatorsDownWithPruning. > {code:scala} > test("agg filter contains window") { > val src = Seq((1, "b", "c")).toDF("col1", "col2", "col3") > .withColumn("test", > expr("count(col1) filter (where min(col1) over(partition by col2 > order by col3)>1)")) > src.show() > } > {code} > Now my question is: is a window function inside an aggregate filter valid > usage at all? Or should I add a check, similar to what Spark SQL does for WHERE > clauses, and throw an error such as "It is not allowed to use window functions > inside WHERE clause"? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
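Until the analyzer handles (or explicitly rejects) this pattern, one way to express the same intent is to materialize the window expression as an ordinary column first and reference it from the aggregate. A hedged sketch of that rewrite follows; it assumes an existing SparkSession named spark and reuses the column names from the test case above:
{code:scala}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, count, min, when}

import spark.implicits._  // assumes a SparkSession named `spark` is in scope

val src = Seq((1, "b", "c")).toDF("col1", "col2", "col3")
val w = Window.partitionBy("col2").orderBy("col3")

// Compute the window expression as a regular column, then aggregate.
// count(when(cond, x)) counts only the rows where cond holds, which mirrors
// the FILTER (WHERE ...) semantics of the original expression.
val result = src
  .withColumn("min_col1_over_w", min(col("col1")).over(w))
  .agg(count(when(col("min_col1_over_w") > 1, col("col1"))).as("test"))
{code}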
[jira] [Created] (SPARK-45608) Migrate SchemaColumnConvertNotSupportedException onto DATATYPE_MISMATCH error classes
Zamil Majdy created SPARK-45608: --- Summary: Migrate SchemaColumnConvertNotSupportedException onto DATATYPE_MISMATCH error classes Key: SPARK-45608 URL: https://issues.apache.org/jira/browse/SPARK-45608 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 3.5.0 Reporter: Zamil Majdy SchemaColumnConvertNotSupportedException is not currently part of SparkThrowable. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45569) Assign name to _LEGACY_ERROR_TEMP_2152
[ https://issues.apache.org/jira/browse/SPARK-45569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deng Ziming updated SPARK-45569: Description: Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2152* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the example in error-classes.json). Add a test which triggers the error from user code if such test still doesn't exist. Check exception fields by using {*}checkError(){*}. The last function checks valuable error fields only, and avoids dependencies from error text message. In this way, tech editors can modify error format in error-classes.json, and don't worry of Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). If you cannot reproduce the error from user space (using SQL query), replace the error by an internal error, see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current is not clear. Propose a solution to users how to avoid and fix such kind of errors. Please, look at the PR below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490] was: in DatasetSuite test("CLASS_UNSUPPORTED_BY_MAP_OBJECTS when creating dataset") , we are using _LEGACY_ERROR_TEMP_2151, We should use proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`. *NOTE:* Please reply to this ticket before start working on it, to avoid working on same ticket at a time > Assign name to _LEGACY_ERROR_TEMP_2152 > -- > > Key: SPARK-45569 > URL: https://issues.apache.org/jira/browse/SPARK-45569 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Deng Ziming >Assignee: Deng Ziming >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2152* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45573) Assign name to _LEGACY_ERROR_TEMP_2153
[ https://issues.apache.org/jira/browse/SPARK-45573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deng Ziming updated SPARK-45573: Description: Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2153* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the example in error-classes.json). Add a test which triggers the error from user code if such test still doesn't exist. Check exception fields by using {*}checkError(){*}. The last function checks valuable error fields only, and avoids dependencies from error text message. In this way, tech editors can modify error format in error-classes.json, and don't worry of Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). If you cannot reproduce the error from user space (using SQL query), replace the error by an internal error, see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current is not clear. Propose a solution to users how to avoid and fix such kind of errors. Please, look at the PR below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490] was: in DatasetSuite test("CLASS_UNSUPPORTED_BY_MAP_OBJECTS when creating dataset") , we are using _LEGACY_ERROR_TEMP_2151, We should use proper error class name rather than `_LEGACY_ERROR_TEMP_xxx`. *NOTE:* Please reply to this ticket before start working on it, to avoid working on same ticket at a time > Assign name to _LEGACY_ERROR_TEMP_2153 > -- > > Key: SPARK-45573 > URL: https://issues.apache.org/jira/browse/SPARK-45573 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Deng Ziming >Assignee: Deng Ziming >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2153* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
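For anyone picking up these tickets, the shape of a {{checkError()}} assertion in Spark's test suites looks roughly like the sketch below; the intercepted exception type, the triggering query, the error class name, and the parameters are all placeholders, not the names actually chosen for these legacy error classes:
{code:scala}
// Inside a suite that extends SparkFunSuite (which provides checkError):
val e = intercept[org.apache.spark.SparkRuntimeException] {
  // placeholder: whatever user-facing query or API call triggers the error
  spark.sql("SELECT ...").collect()
}
checkError(
  exception = e,
  errorClass = "SOME_DESCRIPTIVE_NAME",            // placeholder error class
  parameters = Map("objectName" -> "`someObject`") // placeholder parameters
)
{code}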
[jira] [Commented] (SPARK-45598) Delta table 3.0.0 not working with Spark Connect 3.5.0
[ https://issues.apache.org/jira/browse/SPARK-45598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1246#comment-1246 ] Faiz Halde commented on SPARK-45598: Hi [~sdaberdaku] , corrected the title. I tested it with 3.0.0 delta. What I meant was, delta table does not work with {*}spark connect{*}. It does work with spark 3.5.0 otherwise > Delta table 3.0.0 not working with Spark Connect 3.5.0 > -- > > Key: SPARK-45598 > URL: https://issues.apache.org/jira/browse/SPARK-45598 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Faiz Halde >Priority: Major > > Spark version 3.5.0 > Spark Connect version 3.5.0 > Delta table 3.0-rc2 > Spark connect server was started using > *{{./sbin/start-connect-server.sh --master spark://localhost:7077 --packages > org.apache.spark:spark-connect_2.12:3.5.0,io.delta:delta-spark_2.12:3.0.0rc2 > --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf > "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" > --conf > 'spark.jars.repositories=[https://oss.sonatype.org/content/repositories/iodelta-1120']}}* > {{Connect client depends on}} > *libraryDependencies += "io.delta" %% "delta-spark" % "3.0.0rc2"* > *and the connect libraries* > > When trying to run a simple job that writes to a delta table > {{val spark = SparkSession.builder().remote("sc://localhost").getOrCreate()}} > {{val data = spark.read.json("profiles.json")}} > {{data.write.format("delta").save("/tmp/delta")}} > > {{Error log in connect client}} > {{Exception in thread "main" org.apache.spark.SparkException: > io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to stage failure: > Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in > stage 1.0 (TID 4) (172.23.128.15 executor 0): java.lang.ClassCastException: > cannot assign instance of java.lang.invoke.SerializedLambda to field > org.apache.spark.sql.catalyst.expressions.ScalaUDF.f of type scala.Function1 > in instance of org.apache.spark.sql.catalyst.expressions.ScalaUDF}} > {{ at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)}} > {{ at > java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{ at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{ at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)}} > {{ at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{...}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toThrowable(GrpcExceptionConverter.scala:110)}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:41)}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:49)}} > {{ at scala.collection.Iterator.foreach(Iterator.scala:943)}} > {{ at scala.collection.Iterator.foreach$(Iterator.scala:943)}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.foreach(GrpcExceptionConverter.scala:46)}} > {{ at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)}} > {{ at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)}} > {{ at
[jira] [Updated] (SPARK-45598) Delta table 3.0.0 not working with Spark Connect 3.5.0
[ https://issues.apache.org/jira/browse/SPARK-45598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Faiz Halde updated SPARK-45598: --- Summary: Delta table 3.0.0 not working with Spark Connect 3.5.0 (was: Delta table 3.0-rc2 not working with Spark Connect 3.5.0) > Delta table 3.0.0 not working with Spark Connect 3.5.0 > -- > > Key: SPARK-45598 > URL: https://issues.apache.org/jira/browse/SPARK-45598 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Faiz Halde >Priority: Major > > Spark version 3.5.0 > Spark Connect version 3.5.0 > Delta table 3.0-rc2 > Spark connect server was started using > *{{./sbin/start-connect-server.sh --master spark://localhost:7077 --packages > org.apache.spark:spark-connect_2.12:3.5.0,io.delta:delta-spark_2.12:3.0.0rc2 > --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf > "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" > --conf > 'spark.jars.repositories=[https://oss.sonatype.org/content/repositories/iodelta-1120']}}* > {{Connect client depends on}} > *libraryDependencies += "io.delta" %% "delta-spark" % "3.0.0rc2"* > *and the connect libraries* > > When trying to run a simple job that writes to a delta table > {{val spark = SparkSession.builder().remote("sc://localhost").getOrCreate()}} > {{val data = spark.read.json("profiles.json")}} > {{data.write.format("delta").save("/tmp/delta")}} > > {{Error log in connect client}} > {{Exception in thread "main" org.apache.spark.SparkException: > io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to stage failure: > Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in > stage 1.0 (TID 4) (172.23.128.15 executor 0): java.lang.ClassCastException: > cannot assign instance of java.lang.invoke.SerializedLambda to field > org.apache.spark.sql.catalyst.expressions.ScalaUDF.f of type scala.Function1 > in instance of org.apache.spark.sql.catalyst.expressions.ScalaUDF}} > {{ at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)}} > {{ at > java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{ at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{ at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)}} > {{ at > 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{...}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toThrowable(GrpcExceptionConverter.scala:110)}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:41)}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:49)}} > {{ at scala.collection.Iterator.foreach(Iterator.scala:943)}} > {{ at scala.collection.Iterator.foreach$(Iterator.scala:943)}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.foreach(GrpcExceptionConverter.scala:46)}} > {{ at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)}} > {{ at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)}} > {{ at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)}} > {{ at
[jira] [Resolved] (SPARK-45573) Assign name to _LEGACY_ERROR_TEMP_2153
[ https://issues.apache.org/jira/browse/SPARK-45573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-45573. -- Resolution: Fixed > Assign name to _LEGACY_ERROR_TEMP_2153 > -- > > Key: SPARK-45573 > URL: https://issues.apache.org/jira/browse/SPARK-45573 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Deng Ziming >Assignee: Deng Ziming >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > in DatasetSuite test("CLASS_UNSUPPORTED_BY_MAP_OBJECTS when creating > dataset") , we are using _LEGACY_ERROR_TEMP_2151, We should use proper error > class name rather than `_LEGACY_ERROR_TEMP_xxx`. > > *NOTE:* Please reply to this ticket before start working on it, to avoid > working on same ticket at a time -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45573) Assign name to _LEGACY_ERROR_TEMP_2153
[ https://issues.apache.org/jira/browse/SPARK-45573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1238#comment-1238 ] Max Gekk commented on SPARK-45573: -- Resolved by https://github.com/apache/spark/pull/43414 > Assign name to _LEGACY_ERROR_TEMP_2153 > -- > > Key: SPARK-45573 > URL: https://issues.apache.org/jira/browse/SPARK-45573 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Deng Ziming >Assignee: Deng Ziming >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > in DatasetSuite test("CLASS_UNSUPPORTED_BY_MAP_OBJECTS when creating > dataset") , we are using _LEGACY_ERROR_TEMP_2151, We should use proper error > class name rather than `_LEGACY_ERROR_TEMP_xxx`. > > *NOTE:* Please reply to this ticket before start working on it, to avoid > working on same ticket at a time -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45543) InferWindowGroupLimit causes bug if the other window functions haven't the same window frame as the rank-like functions
[ https://issues.apache.org/jira/browse/SPARK-45543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng resolved SPARK-45543. Fix Version/s: 3.5.1 4.0.0 Resolution: Fixed Issue resolved by pull request 43385 [https://github.com/apache/spark/pull/43385] > InferWindowGroupLimit causes bug if the other window functions haven't the > same window frame as the rank-like functions > --- > > Key: SPARK-45543 > URL: https://issues.apache.org/jira/browse/SPARK-45543 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core, SQL >Affects Versions: 3.5.0 >Reporter: Ron Serruya >Assignee: Jiaan Geng >Priority: Critical > Labels: correctness, data-loss, pull-request-available > Fix For: 3.5.1, 4.0.0 > > > First, it's my first bug, so I'm hoping I'm doing it right, also, as I'm not > very knowledgeable about spark internals, I hope I diagnosed the problem > correctly > I found the degradation in spark version 3.5.0: > When using multiple windows that share the same partition and ordering (but > with different "frame boundaries", where one window is a ranking function, > "WindowGroupLimit" is added to the plan causing wrong values to be created > from the other windows. > *This behavior didn't exist in versions 3.3 and 3.4.* > Example: > > {code:python} > import pysparkfrom pyspark.sql import functions as F, Window > df = spark.createDataFrame([ > {'row_id': 1, 'name': 'Dave', 'score': 1, 'year': 2020}, > {'row_id': 1, 'name': 'Dave', 'score': 2, 'year': 2022}, > {'row_id': 1, 'name': 'Dave', 'score': 3, 'year': 2023}, > {'row_id': 2, 'name': 'Amy', 'score': 6, 'year': 2021}, > ]) > # Create first window for row number > window_spec = Window.partitionBy('row_id', 'name').orderBy(F.desc('year')) > # Create additional window from the first window with unbounded frame > unbound_spec = window_spec.rowsBetween(Window.unboundedPreceding, > Window.unboundedFollowing) > # Try to keep the first row by year, and also collect all scores into a list > df2 = df.withColumn( > 'rn', > F.row_number().over(window_spec) > ).withColumn( > 'all_scores', > F.collect_list('score').over(unbound_spec) > ){code} > So far everything works, and if we display df2: > > {noformat} > ++--+-++---+--+ > |name|row_id|score|year|rn |all_scores| > ++--+-++---+--+ > |Dave|1 |3|2023|1 |[3, 2, 1] | > |Dave|1 |2|2022|2 |[3, 2, 1] | > |Dave|1 |1|2020|3 |[3, 2, 1] | > |Amy |2 |6|2021|1 |[6] | > ++--+-++---+--+{noformat} > > However, once we filter to keep only the first row number: > > {noformat} > df2.filter("rn=1").show(truncate=False) > ++--+-++---+--+ > |name|row_id|score|year|rn |all_scores| > ++--+-++---+--+ > |Dave|1 |3|2023|1 |[3] | > |Amy |2 |6|2021|1 |[6] | > ++--+-++---+--+{noformat} > As you can see just filtering changed the "all_scores" array for Dave. 
> (This example uses `collect_list`, however, the same result happens with > other functions, such as max, min, count, etc) > > Now, if instead of using the two windows we used, I will use the first window > and a window with different ordering, or create a completely new window with > same partition but no ordering, it will work fine: > {code:python} > new_window = Window.partitionBy('row_id', > 'name').rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing) > df3 = df.withColumn( > 'rn', > F.row_number().over(window_spec) > ).withColumn( > 'all_scores', > F.collect_list('score').over(new_window) > ) > df3.filter("rn=1").show(truncate=False){code} > {noformat} > ++--+-++---+--+ > |name|row_id|score|year|rn |all_scores| > ++--+-++---+--+ > |Dave|1 |3|2023|1 |[3, 2, 1] | > |Amy |2 |6|2021|1 |[6] | > ++--+-++---+--+ > {noformat} > In addition, if we use all 3 windows to create 3 different columns, it will > also work ok. So it seems the issue happens only when all the windows used > share the same partition and ordering. > Here is the final plan for the faulty dataframe: > {noformat} > df2.filter("rn=1").explain() > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- Filter (rn#9 = 1) > +- Window [row_number() windowspecdefinition(row_id#1L, name#0, year#3L > DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), > currentrow$())) AS rn#9, collect_list(score#2L, 0, 0) > windowspecdefinition(row_id#1L, name#0, year#3L DESC NULLS
[jira] [Commented] (SPARK-45598) Delta table 3.0-rc2 not working with Spark Connect 3.5.0
[ https://issues.apache.org/jira/browse/SPARK-45598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1172#comment-1172 ] Sebastian Daberdaku commented on SPARK-45598: - Hello [~haldefaiz], you need to use the latest delta-spark version 3.0.0 which came out just yesterday. It now supports delta with Spark 3.5.0. [https://github.com/delta-io/delta/releases/tag/v3.0.0] > Delta table 3.0-rc2 not working with Spark Connect 3.5.0 > > > Key: SPARK-45598 > URL: https://issues.apache.org/jira/browse/SPARK-45598 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Faiz Halde >Priority: Major > > Spark version 3.5.0 > Spark Connect version 3.5.0 > Delta table 3.0-rc2 > Spark connect server was started using > *{{./sbin/start-connect-server.sh --master spark://localhost:7077 --packages > org.apache.spark:spark-connect_2.12:3.5.0,io.delta:delta-spark_2.12:3.0.0rc2 > --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf > "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" > --conf > 'spark.jars.repositories=[https://oss.sonatype.org/content/repositories/iodelta-1120']}}* > {{Connect client depends on}} > *libraryDependencies += "io.delta" %% "delta-spark" % "3.0.0rc2"* > *and the connect libraries* > > When trying to run a simple job that writes to a delta table > {{val spark = SparkSession.builder().remote("sc://localhost").getOrCreate()}} > {{val data = spark.read.json("profiles.json")}} > {{data.write.format("delta").save("/tmp/delta")}} > > {{Error log in connect client}} > {{Exception in thread "main" org.apache.spark.SparkException: > io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to stage failure: > Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in > stage 1.0 (TID 4) (172.23.128.15 executor 0): java.lang.ClassCastException: > cannot assign instance of java.lang.invoke.SerializedLambda to field > org.apache.spark.sql.catalyst.expressions.ScalaUDF.f of type scala.Function1 > in instance of org.apache.spark.sql.catalyst.expressions.ScalaUDF}} > {{ at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)}} > {{ at > java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{ at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{ at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)}} > {{ at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{...}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toThrowable(GrpcExceptionConverter.scala:110)}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:41)}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:49)}} > {{ at scala.collection.Iterator.foreach(Iterator.scala:943)}} > {{ at scala.collection.Iterator.foreach$(Iterator.scala:943)}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.foreach(GrpcExceptionConverter.scala:46)}} > {{ at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)}} > {{ at
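If that suggestion is followed, the client build simply moves from the release-candidate artifact to the GA release; a sketch of the change, assuming the 3.0.0 artifacts are published to Maven Central so the staging repository setting is no longer needed:
{code:scala}
// build.sbt: replace the release-candidate dependency with the GA artifact
libraryDependencies += "io.delta" %% "delta-spark" % "3.0.0"

// and on the server side pass
//   --packages org.apache.spark:spark-connect_2.12:3.5.0,io.delta:delta-spark_2.12:3.0.0
// instead of the ...:3.0.0rc2 coordinate, dropping the spark.jars.repositories override.
{code}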
[jira] [Commented] (SPARK-45289) ClassCastException when reading Delta table on AWS S3
[ https://issues.apache.org/jira/browse/SPARK-45289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1170#comment-1170 ] Sebastian Daberdaku commented on SPARK-45289: - Hello [~tanawatpan], you need to use the latest delta-spark version 3.0.0 which came out just yesterday. It now supports delta with Spark 3.5.0. https://github.com/delta-io/delta/releases/tag/v3.0.0 > ClassCastException when reading Delta table on AWS S3 > - > > Key: SPARK-45289 > URL: https://issues.apache.org/jira/browse/SPARK-45289 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.5.0 > Environment: Spark version: 3.5.0 > Deployment mode: spark-shell > OS: Ubuntu (Docker image) > Java/JVM version: OpenJDK 11 > Packages: hadoop-aws:3.3.4, delta-core_2.12:2.4.0 >Reporter: Tanawat Panmongkol >Priority: Major > > When attempting to read a Delta table from S3 using version 3.5.0, a > _*{{ClassCastException}}*_ occurs involving > {{_*org.apache.hadoop.fs.s3a.S3AFileStatus*_}} and > {_}*{{org.apache.spark.sql.execution.datasources.FileStatusWithMetadata}}*{_}. > The error appears to be related to the new feature SPARK-43039. > _*Steps to Reproduce:*_ > {code:java} > export AWS_ACCESS_KEY_ID='' > export AWS_SECRET_ACCESS_KEY='' > export AWS_REGION='' > docker run --rm -it apache/spark:3.5.0-scala2.12-java11-ubuntu > /opt/spark/bin/spark-shell \ > --packages > 'org.apache.hadoop:hadoop-aws:3.3.4,io.delta:delta-core_2.12:2.4.0' \ > --conf > "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" > \ > --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \ > --conf "spark.hadoop.aws.region=$AWS_REGION" \ > --conf "spark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID" \ > --conf "spark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY" \ > --conf "spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem" \ > --conf "spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem" \ > --conf "spark.hadoop.fs.s3a.path.style.access=true" \ > --conf "spark.hadoop.fs.s3a.connection.ssl.enabled=true" \ > --conf "spark.jars.ivy=/tmp/ivy/cache"{code} > {code:java} > scala> > spark.read.format("delta").load("s3:").show() > {code} > *Logs:* > {code:java} > java.lang.ClassCastException: class org.apache.hadoop.fs.s3a.S3AFileStatus > cannot be cast to class > org.apache.spark.sql.execution.datasources.FileStatusWithMetadata > (org.apache.hadoop.fs.s3a.S3AFileStatus is in unnamed module of loader > scala.reflect.internal.util.ScalaClassLoader$URLClassLoader @4552f905; > org.apache.spark.sql.execution.datasources.FileStatusWithMetadata is in > unnamed module of loader 'app') > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.TraversableLike.map(TraversableLike.scala:286) > at scala.collection.TraversableLike.map$(TraversableLike.scala:279) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.execution.FileSourceScanLike.$anonfun$setFilesNumAndSizeMetric$2(DataSourceScanExec.scala:466) > at > org.apache.spark.sql.execution.FileSourceScanLike.$anonfun$setFilesNumAndSizeMetric$2$adapted(DataSourceScanExec.scala:466) > at scala.collection.immutable.List.map(List.scala:293) > at > 
org.apache.spark.sql.execution.FileSourceScanLike.setFilesNumAndSizeMetric(DataSourceScanExec.scala:466) > at > org.apache.spark.sql.execution.FileSourceScanLike.selectedPartitions(DataSourceScanExec.scala:257) > at > org.apache.spark.sql.execution.FileSourceScanLike.selectedPartitions$(DataSourceScanExec.scala:251) > at > org.apache.spark.sql.execution.FileSourceScanExec.selectedPartitions$lzycompute(DataSourceScanExec.scala:506) > at > org.apache.spark.sql.execution.FileSourceScanExec.selectedPartitions(DataSourceScanExec.scala:506) > at > org.apache.spark.sql.execution.FileSourceScanLike.dynamicallySelectedPartitions(DataSourceScanExec.scala:286) > at > org.apache.spark.sql.execution.FileSourceScanLike.dynamicallySelectedPartitions$(DataSourceScanExec.scala:267) > at > org.apache.spark.sql.execution.FileSourceScanExec.dynamicallySelectedPartitions$lzycompute(DataSourceScanExec.scala:506) > at > org.apache.spark.sql.execution.FileSourceScanExec.dynamicallySelectedPartitions(DataSourceScanExec.scala:506) > at > org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:553) > at >
[jira] [Assigned] (SPARK-45428) Add Matomo analytics to all released docs pages
[ https://issues.apache.org/jira/browse/SPARK-45428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45428: -- Assignee: BingKun Pan (was: Apache Spark) > Add Matomo analytics to all released docs pages > --- > > Key: SPARK-45428 > URL: https://issues.apache.org/jira/browse/SPARK-45428 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Allison Wang >Assignee: BingKun Pan >Priority: Major > Labels: pull-request-available > > Matomo analytics has been added to some pages of the Spark website. Here is > Sean's initial PR: > [https://github.com/apache/spark-website/pull/479.|https://www.google.com/url?q=https://github.com/apache/spark-website/pull/479=D=docs=1696544881650480=AOvVaw11SNfWcd4UJzlO8EJvzdoe] > You can find analytics for Spark website here: https://analytics.apache.org > We need to add this to all API pages. This is very important for us to > prioritize documentation improvements and search engine optimization. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45428) Add Matomo analytics to all released docs pages
[ https://issues.apache.org/jira/browse/SPARK-45428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45428: -- Assignee: Apache Spark (was: BingKun Pan) > Add Matomo analytics to all released docs pages > --- > > Key: SPARK-45428 > URL: https://issues.apache.org/jira/browse/SPARK-45428 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Allison Wang >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Matomo analytics has been added to some pages of the Spark website. Here is > Sean's initial PR: > [https://github.com/apache/spark-website/pull/479.|https://www.google.com/url?q=https://github.com/apache/spark-website/pull/479=D=docs=1696544881650480=AOvVaw11SNfWcd4UJzlO8EJvzdoe] > You can find analytics for Spark website here: https://analytics.apache.org > We need to add this to all API pages. This is very important for us to > prioritize documentation improvements and search engine optimization. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45605) Replace `s.c.MapOps.mapValues` with `s.c.MapOps.view.mapValues`
[ https://issues.apache.org/jira/browse/SPARK-45605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45605: -- Assignee: (was: Apache Spark) >Replace `s.c.MapOps.mapValues` with `s.c.MapOps.view.mapValues` > -- > > Key: SPARK-45605 > URL: https://issues.apache.org/jira/browse/SPARK-45605 > Project: Spark > Issue Type: Sub-task > Components: Connect, DStreams, Examples, MLlib, Spark Core, SS >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > {code:java} > @deprecated("Use .view.mapValues(f). A future version will include a strict > version of this method (for now, .view.mapValues(f).toMap).", "2.13.0") > def mapValues[W](f: V => W): MapView[K, W] = new MapView.MapValues(this, f) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
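To make the migration in SPARK-45605 concrete, here is a minimal sketch in plain Scala 2.13 (the map contents and variable names are illustrative only): `mapValues` is replaced by the lazy `view.mapValues`, with an explicit `.toMap` wherever a strict `Map` is still required.
{code:java}
// Plain Scala 2.13 sketch; values are illustrative.
val m = Map("a" -> 1, "b" -> 2)

// Deprecated since Scala 2.13:
// val doubled = m.mapValues(_ * 2)

// Replacement: a lazy MapView, materialized explicitly where a strict Map is needed.
val doubledView = m.view.mapValues(_ * 2) // scala.collection.MapView[String, Int]
val doubled = doubledView.toMap           // Map("a" -> 2, "b" -> 4)
{code}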
[jira] [Updated] (SPARK-45428) Add Matomo analytics to all released docs pages
[ https://issues.apache.org/jira/browse/SPARK-45428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45428: --- Labels: pull-request-available (was: ) > Add Matomo analytics to all released docs pages > --- > > Key: SPARK-45428 > URL: https://issues.apache.org/jira/browse/SPARK-45428 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Allison Wang >Assignee: BingKun Pan >Priority: Major > Labels: pull-request-available > > Matomo analytics has been added to some pages of the Spark website. Here is > Sean's initial PR: > [https://github.com/apache/spark-website/pull/479.|https://www.google.com/url?q=https://github.com/apache/spark-website/pull/479=D=docs=1696544881650480=AOvVaw11SNfWcd4UJzlO8EJvzdoe] > You can find analytics for Spark website here: https://analytics.apache.org > We need to add this to all API pages. This is very important for us to > prioritize documentation improvements and search engine optimization. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45594) Auto repartition before writing data into partitioned or bucket table
[ https://issues.apache.org/jira/browse/SPARK-45594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45594: -- Assignee: (was: Apache Spark) > Auto repartition before writing data into partitioned or bucket table > -- > > Key: SPARK-45594 > URL: https://issues.apache.org/jira/browse/SPARK-45594 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wan Kun >Priority: Major > Labels: pull-request-available > > Currently, when writing data into a partitioned table, there will be at least > *dynamicPartitions * shuffleNum* files; when writing data into a bucketed table, > there will be at least *bucketNums * shuffleNum* files. > We can shuffle by the dynamic partition or bucket columns before writing > data into the table, so that only shuffleNum files are created. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
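To illustrate the idea behind SPARK-45594, here is a hedged sketch of the manual workaround the proposal would automate (the session, table, and column names are hypothetical): repartitioning by the dynamic partition column before a partitioned write leaves roughly one file per partition value instead of up to shuffleNum files per partition.
{code:java}
// Assumes a spark-shell style session named `spark`; table and column names are hypothetical.
import spark.implicits._

val df = spark.table("source_events") // source with a `dt` column used for dynamic partitioning

df.repartition($"dt")                 // shuffle by the dynamic partition column first
  .write
  .mode("overwrite")
  .partitionBy("dt")                  // dynamic-partitioned write
  .saveAsTable("events_partitioned")  // now roughly one file per `dt` value per write
{code}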
[jira] [Assigned] (SPARK-45594) Auto repartition before writing data into partitioned or bucket table
[ https://issues.apache.org/jira/browse/SPARK-45594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45594: -- Assignee: Apache Spark > Auto repartition before writing data into partitioned or bucket table > -- > > Key: SPARK-45594 > URL: https://issues.apache.org/jira/browse/SPARK-45594 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wan Kun >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Currently, when writing data into a partitioned table, there will be at least > *dynamicPartitions * shuffleNum* files; when writing data into a bucketed table, > there will be at least *bucketNums * shuffleNum* files. > We can shuffle by the dynamic partition or bucket columns before writing > data into the table, so that only shuffleNum files are created. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45607) Collapse repartition operators with a project
[ https://issues.apache.org/jira/browse/SPARK-45607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45607: -- Assignee: (was: Apache Spark) > Collapse repartition operators with a project > - > > Key: SPARK-45607 > URL: https://issues.apache.org/jira/browse/SPARK-45607 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wan Kun >Priority: Major > Labels: pull-request-available > > We can collapse two repartition operators with a project between them. > For example: > df.repartition($"a").select($"a", $"b", $"a" + $"b").repartition($"b") > is the same as > df.select($"a", $"b", $"a" + $"b").repartition($"b") -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
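A small sketch of the equivalence claimed in the SPARK-45607 description, assuming a spark-shell style session named `spark` (the sample data is made up): comparing the two physical plans shows whether the leading exchange from the first repartition survives, which is exactly what the proposed rule would collapse.
{code:java}
// Assumes a spark-shell style session named `spark`; the data is made up.
import spark.implicits._

val df = Seq((1, 2), (3, 4)).toDF("a", "b")

val withRedundantRepartition = df
  .repartition($"a")                  // overridden by the later repartition($"b")
  .select($"a", $"b", $"a" + $"b")
  .repartition($"b")

val withoutRedundantRepartition = df
  .select($"a", $"b", $"a" + $"b")
  .repartition($"b")

// Without the proposed rule the first plan typically keeps an extra Exchange for
// repartition($"a"); with it, both plans should be effectively identical.
withRedundantRepartition.explain()
withoutRedundantRepartition.explain()
{code}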
[jira] [Assigned] (SPARK-45607) Collapse repartition operators with a project
[ https://issues.apache.org/jira/browse/SPARK-45607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45607: -- Assignee: Apache Spark > Collapse repartition operators with a project > - > > Key: SPARK-45607 > URL: https://issues.apache.org/jira/browse/SPARK-45607 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wan Kun >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > We can collapse two repartition operators with a project between them. > For example: > df.repartition($"a").select($"a", $"b", $"a" + $"b").repartition($"b") > is the same as > df.select($"a", $"b", $"a" + $"b").repartition($"b") -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45607) Collapse repartition operators with a project
[ https://issues.apache.org/jira/browse/SPARK-45607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45607: --- Labels: pull-request-available (was: ) > Collapse repartition operators with a project > - > > Key: SPARK-45607 > URL: https://issues.apache.org/jira/browse/SPARK-45607 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wan Kun >Priority: Major > Labels: pull-request-available > > We can collapse two repartition operators with a project between them. > For example: > df.repartition($"a").select($"a", $"b", $"a" + $"b").repartition($"b") > is the same as > df.select($"a", $"b", $"a" + $"b").repartition($"b") -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45607) Collapse repartition operators with a project
[ https://issues.apache.org/jira/browse/SPARK-45607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wan Kun updated SPARK-45607: Summary: Collapse repartition operators with a project (was: Collapse repartition operators with project) > Collapse repartition operators with a project > - > > Key: SPARK-45607 > URL: https://issues.apache.org/jira/browse/SPARK-45607 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wan Kun >Priority: Major > > We can collapse two repartition operators with a project between them. > For example: > df.repartition($"a").select($"a", $"b", $"a" + $"b").repartition($"b") > is the same as > df.select($"a", $"b", $"a" + $"b").repartition($"b") -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45607) Collapse repartition operators with project
[ https://issues.apache.org/jira/browse/SPARK-45607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wan Kun updated SPARK-45607: Summary: Collapse repartition operators with project (was: Collapse repartitions with project) > Collapse repartition operators with project > --- > > Key: SPARK-45607 > URL: https://issues.apache.org/jira/browse/SPARK-45607 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wan Kun >Priority: Major > > We can collapse two repartition operators with a project between them. > For example: > df.repartition($"a").select($"a", $"b", $"a" + $"b").repartition($"b") > is the same as > df.select($"a", $"b", $"a" + $"b").repartition($"b") -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45607) Collapse repartitions with project
[ https://issues.apache.org/jira/browse/SPARK-45607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wan Kun updated SPARK-45607: Summary: Collapse repartitions with project (was: Collapse repartition with project) > Collapse repartitions with project > -- > > Key: SPARK-45607 > URL: https://issues.apache.org/jira/browse/SPARK-45607 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wan Kun >Priority: Major > > We can collapse two repartition operators with a project between them. > For example: > df.repartition($"a").select($"a", $"b", $"a" + $"b").repartition($"b") > is the same as > df.select($"a", $"b", $"a" + $"b").repartition($"b") -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45607) Collapse repartition with project
Wan Kun created SPARK-45607: --- Summary: Collapse repartition with project Key: SPARK-45607 URL: https://issues.apache.org/jira/browse/SPARK-45607 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Wan Kun We can collapse two repartition operators with a project between them. For example: df.repartition($"a").select($"a", $"b", $"a" + $"b").repartition($"b") is the same as df.select($"a", $"b", $"a" + $"b").repartition($"b") -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45604) Converting timestamp_ntz to array can cause NPE or SEGFAULT on parquet vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-45604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45604: --- Labels: pull-request-available (was: ) > Converting timestamp_ntz to array can cause NPE or SEGFAULT on > parquet vectorized reader > --- > > Key: SPARK-45604 > URL: https://issues.apache.org/jira/browse/SPARK-45604 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Zamil Majdy >Priority: Major > Labels: pull-request-available > > Repro: > ``` > val path = "/tmp/someparquetfile" > val df = sql("SELECT MAP('key', CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ)) > AS field") > df.write.mode("overwrite").parquet(path) > spark.read.schema("field map array>").parquet(path).collect() > ``` > Depending on which memory mode is used, it will produce an NPE in on-heap mode > and a segfault in off-heap mode. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45554) Introduce flexible parameter to assertSchemaEqual
[ https://issues.apache.org/jira/browse/SPARK-45554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45554: --- Labels: pull-request-available (was: ) > Introduce flexible parameter to assertSchemaEqual > - > > Key: SPARK-45554 > URL: https://issues.apache.org/jira/browse/SPARK-45554 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Add new parameter ignoreColumnNames to the assertSchemaEqual. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45606) Release restrictions on multi-layer runtime filter
[ https://issues.apache.org/jira/browse/SPARK-45606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45606: --- Labels: pull-request-available (was: ) > Release restrictions on multi-layer runtime filter > -- > > Key: SPARK-45606 > URL: https://issues.apache.org/jira/browse/SPARK-45606 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > > Before https://issues.apache.org/jira/browse/SPARK-41674, Spark could only > insert a runtime filter for the application side of a shuffle join at a single layer. > Because it was considered not worthwhile to insert another runtime filter when one side of the > shuffle join already had one, Spark restricted it. > After https://issues.apache.org/jira/browse/SPARK-41674, Spark supports > inserting a runtime filter for one side of any shuffle join across multiple layers, but > the restriction on multi-layer runtime filters now looks outdated. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
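For context on SPARK-45606, a hedged sketch of the query shape it targets (all table and column names are hypothetical, and whether a runtime filter is actually injected still depends on the usual size and selectivity heuristics): once the restriction is relaxed, more than one layer of a multi-join plan becomes a candidate for runtime filter injection.
{code:java}
// Assumes a spark-shell style session named `spark`; tables and columns are made up.
import spark.implicits._

val fact = spark.table("fact_sales")
val dim1 = spark.table("dim_store").filter($"region" === "EU")        // selective build side
val dim2 = spark.table("dim_product").filter($"category" === "toys")  // selective build side

// Layer 1: fact_sales joins dim_store; layer 2: that result joins dim_product.
// Each shuffle-join layer is a potential injection point for a runtime filter
// (for example a bloom filter on store_id, and another on product_id).
val result = fact
  .join(dim1, Seq("store_id"))
  .join(dim2, Seq("product_id"))

result.explain()
{code}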
[jira] [Updated] (SPARK-45605) Replace `s.c.MapOps.mapValues` with `s.c.MapOps.view.mapValues`
[ https://issues.apache.org/jira/browse/SPARK-45605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45605: --- Labels: pull-request-available (was: ) >Replace `s.c.MapOps.mapValues` with `s.c.MapOps.view.mapValues` > -- > > Key: SPARK-45605 > URL: https://issues.apache.org/jira/browse/SPARK-45605 > Project: Spark > Issue Type: Sub-task > Components: Connect, DStreams, Examples, MLlib, Spark Core, SS >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > {code:java} > @deprecated("Use .view.mapValues(f). A future version will include a strict > version of this method (for now, .view.mapValues(f).toMap).", "2.13.0") > def mapValues[W](f: V => W): MapView[K, W] = new MapView.MapValues(this, f) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45605) Replace `s.c.MapOps.mapValues` with `s.c.MapOps.view.mapValues`
[ https://issues.apache.org/jira/browse/SPARK-45605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45605: - Description: {code:java} @deprecated("Use .view.mapValues(f). A future version will include a strict version of this method (for now, .view.mapValues(f).toMap).", "2.13.0") def mapValues[W](f: V => W): MapView[K, W] = new MapView.MapValues(this, f) {code} was: {code:java} // code placeholder {code} >Replace `s.c.MapOps.mapValues` with `s.c.MapOps.view.mapValues` > -- > > Key: SPARK-45605 > URL: https://issues.apache.org/jira/browse/SPARK-45605 > Project: Spark > Issue Type: Sub-task > Components: Connect, DStreams, Examples, MLlib, Spark Core, SS >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > @deprecated("Use .view.mapValues(f). A future version will include a strict > version of this method (for now, .view.mapValues(f).toMap).", "2.13.0") > def mapValues[W](f: V => W): MapView[K, W] = new MapView.MapValues(this, f) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45606) Release restrictions on multi-layer runtime filter
Jiaan Geng created SPARK-45606: -- Summary: Release restrictions on multi-layer runtime filter Key: SPARK-45606 URL: https://issues.apache.org/jira/browse/SPARK-45606 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 4.0.0 Reporter: Jiaan Geng Assignee: Jiaan Geng Before https://issues.apache.org/jira/browse/SPARK-41674, Spark could only insert a runtime filter for the application side of a shuffle join at a single layer. Because it was considered not worthwhile to insert another runtime filter when one side of the shuffle join already had one, Spark restricted it. After https://issues.apache.org/jira/browse/SPARK-41674, Spark supports inserting a runtime filter for one side of any shuffle join across multiple layers, but the restriction on multi-layer runtime filters now looks outdated. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45605) Replace `s.c.MapOps.mapValues` with `s.c.MapOps.view.mapValues`
[ https://issues.apache.org/jira/browse/SPARK-45605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45605: - Description: {code:java} // code placeholder {code} >Replace `s.c.MapOps.mapValues` with `s.c.MapOps.view.mapValues` > -- > > Key: SPARK-45605 > URL: https://issues.apache.org/jira/browse/SPARK-45605 > Project: Spark > Issue Type: Sub-task > Components: Connect, DStreams, Examples, MLlib, Spark Core, SS >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > // code placeholder > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45605) Replace `s.c.MapOps.mapValues` with `s.c.MapOps.view.mapValues`
Yang Jie created SPARK-45605: Summary:Replace `s.c.MapOps.mapValues` with `s.c.MapOps.view.mapValues` Key: SPARK-45605 URL: https://issues.apache.org/jira/browse/SPARK-45605 Project: Spark Issue Type: Sub-task Components: SS, Connect, DStreams, Examples, MLlib, Spark Core Affects Versions: 4.0.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45604) Converting timestamp_ntz to array can cause NPE or SEGFAULT on parquet vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-45604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zamil Majdy updated SPARK-45604: Description: Repro: ``` val path = "/tmp/someparquetfile" val df = sql("SELECT MAP('key', CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ)) AS field") df.write.mode("overwrite").parquet(path) spark.read.schema("field map>").parquet(path).collect() ``` Depending on the memory mode is used, it will produced NPE on on-heap mode, and segfault on off-heap was: Repro: ``` val path = "/tmp/zamil/timestamp" val df = sql("SELECT MAP('key', CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ)) AS field") df.write.mode("overwrite").parquet(path) spark.read.schema("field map>").parquet(path).collect() ``` Depending on the memory mode is used, it will produced NPE on on-heap mode, and segfault on off-heap > Converting timestamp_ntz to array can cause NPE or SEGFAULT on > parquet vectorized reader > --- > > Key: SPARK-45604 > URL: https://issues.apache.org/jira/browse/SPARK-45604 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Zamil Majdy >Priority: Major > > Repro: > ``` > val path = "/tmp/someparquetfile" > val df = sql("SELECT MAP('key', CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ)) > AS field") > df.write.mode("overwrite").parquet(path) > spark.read.schema("field map array>").parquet(path).collect() > ``` > Depending on the memory mode is used, it will produced NPE on on-heap mode, > and segfault on off-heap -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45604) Converting timestamp_ntz to array can cause NPE or SEGFAULT on parquet vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-45604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zamil Majdy updated SPARK-45604: Description: Repro: ``` spark.conf.set("spark.databricks.photon.enabled", "false") val path = "/tmp/zamil/timestamp" val df = sql("SELECT MAP('key', CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ)) AS field") df.write.mode("overwrite").parquet(path) spark.read.schema("field map>").parquet(path).collect() ``` Depending on the memory mode is used, it will produced NPE on on-heap mode, and segfault on off-heap was: Repro: ``` spark.conf.set("spark.databricks.photon.enabled", "false") val path = "/tmp/zamil/timestamp" val df = sql("SELECT MAP('key', CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ)) AS field") df.write.mode("overwrite").parquet(path) spark.read.schema("field map>").parquet(path).collect() ``` Depending on the memory mode is used, it will produced NPE on on-heap mode, and segfault on off-heap > Converting timestamp_ntz to array can cause NPE or SEGFAULT on > parquet vectorized reader > --- > > Key: SPARK-45604 > URL: https://issues.apache.org/jira/browse/SPARK-45604 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Zamil Majdy >Priority: Major > > Repro: > ``` > spark.conf.set("spark.databricks.photon.enabled", "false") > val path = "/tmp/zamil/timestamp" > val df = sql("SELECT MAP('key', CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ)) > AS field") > df.write.mode("overwrite").parquet(path) > spark.read.schema("field map array>").parquet(path).collect() > ``` > Depending on the memory mode is used, it will produced NPE on on-heap mode, > and segfault on off-heap -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45604) Converting timestamp_ntz to array can cause NPE or SEGFAULT on parquet vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-45604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zamil Majdy updated SPARK-45604: Description: Repro: ``` val path = "/tmp/zamil/timestamp" val df = sql("SELECT MAP('key', CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ)) AS field") df.write.mode("overwrite").parquet(path) spark.read.schema("field map>").parquet(path).collect() ``` Depending on the memory mode is used, it will produced NPE on on-heap mode, and segfault on off-heap was: Repro: ``` spark.conf.set("spark.databricks.photon.enabled", "false") val path = "/tmp/zamil/timestamp" val df = sql("SELECT MAP('key', CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ)) AS field") df.write.mode("overwrite").parquet(path) spark.read.schema("field map>").parquet(path).collect() ``` Depending on the memory mode is used, it will produced NPE on on-heap mode, and segfault on off-heap > Converting timestamp_ntz to array can cause NPE or SEGFAULT on > parquet vectorized reader > --- > > Key: SPARK-45604 > URL: https://issues.apache.org/jira/browse/SPARK-45604 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Zamil Majdy >Priority: Major > > Repro: > ``` > val path = "/tmp/zamil/timestamp" > val df = sql("SELECT MAP('key', CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ)) > AS field") > df.write.mode("overwrite").parquet(path) > spark.read.schema("field map array>").parquet(path).collect() > ``` > Depending on the memory mode is used, it will produced NPE on on-heap mode, > and segfault on off-heap -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45604) Converting timestamp_ntz to array can cause NPE or SEGFAULT on parquet vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-45604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zamil Majdy updated SPARK-45604: Description: Repro: ``` spark.conf.set("spark.databricks.photon.enabled", "false") val path = "/tmp/zamil/timestamp" val df = sql("SELECT MAP('key', CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ)) AS field") df.write.mode("overwrite").parquet(path) spark.read.schema("field map>").parquet(path).collect() ``` Depending on the memory mode is used, it will produced NPE on on-heap mode, and segfault on off-heap was: Repro: {{```}} spark.conf.set("spark.databricks.photon.enabled", "false") val path = "/tmp/zamil/timestamp" val df = sql("SELECT MAP('key', CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ)) AS field") df.write.mode("overwrite").parquet(path) spark.read.schema("field map>").parquet(path).collect() {{{}{}}}``` Depending on the memory mode is used, it will produced NPE on on-heap mode, and segfault on off-heap > Converting timestamp_ntz to array can cause NPE or SEGFAULT on > parquet vectorized reader > --- > > Key: SPARK-45604 > URL: https://issues.apache.org/jira/browse/SPARK-45604 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Zamil Majdy >Priority: Major > > Repro: > > ``` > spark.conf.set("spark.databricks.photon.enabled", "false") > val path = "/tmp/zamil/timestamp" > val df = sql("SELECT MAP('key', CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ)) > AS field") > df.write.mode("overwrite").parquet(path) > spark.read.schema("field map array>").parquet(path).collect() > ``` > Depending on the memory mode is used, it will produced NPE on on-heap mode, > and segfault on off-heap -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45604) Converting timestamp_ntz to array can cause NPE or SEGFAULT on parquet vectorized reader
Zamil Majdy created SPARK-45604: --- Summary: Converting timestamp_ntz to array can cause NPE or SEGFAULT on parquet vectorized reader Key: SPARK-45604 URL: https://issues.apache.org/jira/browse/SPARK-45604 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.5.0 Reporter: Zamil Majdy Repro: {{{}```{}}}{{{}{}}} spark.conf.set("spark.databricks.photon.enabled", "false") {{}} val path = "/tmp/somepath" val df = sql("SELECT MAP('key', CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ)) AS field") {{}} df.write.mode("overwrite").parquet(path) spark.read.schema("field map>").parquet(path).collect() {{{}{}}}{{{}```{}}} Depending on the memory mode is used, it will produced NPE on on-heap mode, and segfault on off-heap -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45604) Converting timestamp_ntz to array can cause NPE or SEGFAULT on parquet vectorized reader
[ https://issues.apache.org/jira/browse/SPARK-45604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zamil Majdy updated SPARK-45604: Description: Repro: {{```}} spark.conf.set("spark.databricks.photon.enabled", "false") val path = "/tmp/zamil/timestamp" val df = sql("SELECT MAP('key', CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ)) AS field") df.write.mode("overwrite").parquet(path) spark.read.schema("field map>").parquet(path).collect() {{{}{}}}``` Depending on the memory mode is used, it will produced NPE on on-heap mode, and segfault on off-heap was: Repro: {{{}```{}}}{{{}{}}} spark.conf.set("spark.databricks.photon.enabled", "false") {{}} val path = "/tmp/somepath" val df = sql("SELECT MAP('key', CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ)) AS field") {{}} df.write.mode("overwrite").parquet(path) spark.read.schema("field map>").parquet(path).collect() {{{}{}}}{{{}```{}}} Depending on the memory mode is used, it will produced NPE on on-heap mode, and segfault on off-heap > Converting timestamp_ntz to array can cause NPE or SEGFAULT on > parquet vectorized reader > --- > > Key: SPARK-45604 > URL: https://issues.apache.org/jira/browse/SPARK-45604 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Zamil Majdy >Priority: Major > > Repro: > > {{```}} > spark.conf.set("spark.databricks.photon.enabled", "false") > val path = "/tmp/zamil/timestamp" > val df = sql("SELECT MAP('key', CAST('2019-01-01 00:00:00' AS TIMESTAMP_NTZ)) > AS field") > df.write.mode("overwrite").parquet(path) > spark.read.schema("field map array>").parquet(path).collect() > {{{}{}}}``` > Depending on the memory mode is used, it will produced NPE on on-heap mode, > and segfault on off-heap -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44734) Add documentation for type casting rules in Python UDFs/UDTFs
[ https://issues.apache.org/jira/browse/SPARK-44734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1064#comment-1064 ] BingKun Pan commented on SPARK-44734: - [~phildakin] Sorry, I didn't see that the previous PR is actually strongly related to this PR. For completeness, you can continue with this PR and I will stop this work. > Add documentation for type casting rules in Python UDFs/UDTFs > - > > Key: SPARK-44734 > URL: https://issues.apache.org/jira/browse/SPARK-44734 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > > In addition to type mappings between Spark data types and Python data types > (SPARK-44733), we should add the type casting rules for regular and > arrow-optimized Python UDFs/UDTFs. > We currently have this table in code: > * Arrow: > [https://github.com/apache/spark/blob/master/python/pyspark/sql/pandas/functions.py#L311-L329] > * Python UDF: > [https://github.com/apache/spark/blob/master/python/pyspark/sql/udf.py#L101-L116] > We should add a proper documentation page for the type casting rules. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45569) Assign name to _LEGACY_ERROR_TEMP_2152
[ https://issues.apache.org/jira/browse/SPARK-45569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-45569. -- Resolution: Fixed Issue resolved by pull request 43414 [https://github.com/apache/spark/pull/43414] > Assign name to _LEGACY_ERROR_TEMP_2152 > -- > > Key: SPARK-45569 > URL: https://issues.apache.org/jira/browse/SPARK-45569 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Deng Ziming >Assignee: Deng Ziming >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > In DatasetSuite test("CLASS_UNSUPPORTED_BY_MAP_OBJECTS when creating > dataset"), we are using _LEGACY_ERROR_TEMP_2151. We should use a proper error > class name rather than `_LEGACY_ERROR_TEMP_xxx`. > > *NOTE:* Please reply to this ticket before starting work on it, to avoid > two people working on the same ticket at the same time -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org