[jira] [Assigned] (SPARK-42894) Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42894:


Assignee: Apache Spark

> Implement cache, persist, unpersist, and storageLevel
> -
>
> Key: SPARK-42894
> URL: https://issues.apache.org/jira/browse/SPARK-42894
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
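> A minimal sketch of the API surface in question, shown with the long-standing
> vanilla PySpark Dataset methods for illustration (this ticket tracks mirroring
> them in the Connect client; the Connect internals are not shown here):
> {code:python}
> from pyspark.sql import SparkSession
> from pyspark.storagelevel import StorageLevel
>
> spark = SparkSession.builder.getOrCreate()
> df = spark.range(10)
>
> df.cache()                            # persist with the default storage level
> print(df.storageLevel)                # inspect the storage level of the cached plan
> df.unpersist()                        # drop the cached data
> df.persist(StorageLevel.MEMORY_ONLY)  # cache again with an explicit level
> df.unpersist(blocking=True)           # block until the data is removed
> {code}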




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42894) Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42894:


Assignee: (was: Apache Spark)

> Implement cache, persist, unpersist, and storageLevel
> -
>
> Key: SPARK-42894
> URL: https://issues.apache.org/jira/browse/SPARK-42894
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42894) Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703493#comment-17703493
 ] 

Apache Spark commented on SPARK-42894:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40516

> Implement cache, persist, unpersist, and storageLevel
> -
>
> Key: SPARK-42894
> URL: https://issues.apache.org/jira/browse/SPARK-42894
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42894) Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703492#comment-17703492
 ] 

Apache Spark commented on SPARK-42894:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40516

> Implement cache, persist, unpersist, and storageLevel
> -
>
> Key: SPARK-42894
> URL: https://issues.apache.org/jira/browse/SPARK-42894
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42880) Improve the yarn document for log4j2 configuration

2023-03-21 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-42880.
--
Fix Version/s: 3.5.0
 Assignee: Zhifang Li
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/40504

> Improve the yarn document for log4j2 configuration
> -
>
> Key: SPARK-42880
> URL: https://issues.apache.org/jira/browse/SPARK-42880
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 3.3.2
>Reporter: Zhifang Li
>Assignee: Zhifang Li
>Priority: Minor
> Fix For: 3.5.0
>
>
> Since Spark 3.3 changed from log4j1 to log4j2, some documents should also be 
> updated. 
> For example, docs/running-on-yarn.md still uses log4j1 syntax as follows:
> `log4j.appender.file_appender.File=${spark.yarn.app.container.log.dir}/spark.log`.
>  
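> As a hedged illustration (the appender name is illustrative, not taken from
> the ticket), the equivalent log4j2 properties syntax would look roughly like:
> {code}
> appender.file_appender.type = File
> appender.file_appender.name = file_appender
> appender.file_appender.fileName = ${sys:spark.yarn.app.container.log.dir}/spark.log
> {code}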



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42880) Improve the yarn document for log4j2 configuration

2023-03-21 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-42880:
-
Component/s: Documentation

> Improve the yarn document for log4j2 configuration
> -
>
> Key: SPARK-42880
> URL: https://issues.apache.org/jira/browse/SPARK-42880
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, Spark Core, YARN
>Affects Versions: 3.3.2
>Reporter: Zhifang Li
>Assignee: Zhifang Li
>Priority: Trivial
> Fix For: 3.5.0
>
>
> Since Spark 3.3 changed from log4j1 to log4j2, some documents should also be 
> updated. 
> For example, docs/running-on-yarn.md still uses log4j1 syntax as follows:
> `log4j.appender.file_appender.File=${spark.yarn.app.container.log.dir}/spark.log`.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42880) Improve the yarn document for log4j2 configuration

2023-03-21 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-42880:
-
Priority: Trivial  (was: Minor)

> Improve the yarn document for log4j2 configuration
> -
>
> Key: SPARK-42880
> URL: https://issues.apache.org/jira/browse/SPARK-42880
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 3.3.2
>Reporter: Zhifang Li
>Assignee: Zhifang Li
>Priority: Trivial
> Fix For: 3.5.0
>
>
> Since Spark3.3 has changed log4j1 to log4j2, some documents should also be 
> updated. 
> For example, docs/running-on-yarn.md still uses log4j1 syntax as follows.
> `log4j.appender.file_appender.File=${spark.yarn.app.container.log.dir}/spark.log`.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42052) Codegen Support for HiveSimpleUDF

2023-03-21 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-42052:
---

Assignee: BingKun Pan

> Codegen Support for HiveSimpleUDF
> -
>
> Key: SPARK-42052
> URL: https://issues.apache.org/jira/browse/SPARK-42052
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kent Yao
>Assignee: BingKun Pan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42052) Codegen Support for HiveSimpleUDF

2023-03-21 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-42052.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40397
[https://github.com/apache/spark/pull/40397]

> Codegen Support for HiveSimpleUDF
> -
>
> Key: SPARK-42052
> URL: https://issues.apache.org/jira/browse/SPARK-42052
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kent Yao
>Assignee: BingKun Pan
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42894) Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-42894:
-
Affects Version/s: (was: 3.4.0)

> Implement cache, persist, unpersist, and storageLevel
> -
>
> Key: SPARK-42894
> URL: https://issues.apache.org/jira/browse/SPARK-42894
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42894) Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-42894:
-
Summary: Implement cache, persist, unpersist, and storageLevel  (was: 
Implement cache, persist, unpersist)

> Implement cache, persist, unpersist, and storageLevel
> -
>
> Key: SPARK-42894
> URL: https://issues.apache.org/jira/browse/SPARK-42894
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42894) Implement cache, persist, unpersist

2023-03-21 Thread Yang Jie (Jira)
Yang Jie created SPARK-42894:


 Summary: Implement cache, persist, unpersist
 Key: SPARK-42894
 URL: https://issues.apache.org/jira/browse/SPARK-42894
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0, 3.5.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42786) Impl typed select in Dataset

2023-03-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-42786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-42786.
---
Fix Version/s: 3.5.0
 Assignee: Zhen Li
   Resolution: Fixed

> Impl typed select in Dataset
> 
>
> Key: SPARK-42786
> URL: https://issues.apache.org/jira/browse/SPARK-42786
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Assignee: Zhen Li
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42884) Add Ammonite REPL support

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703459#comment-17703459
 ] 

Apache Spark commented on SPARK-42884:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/40515

> Add Ammonite REPL support
> -
>
> Key: SPARK-42884
> URL: https://issues.apache.org/jira/browse/SPARK-42884
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42884) Add Ammonite REPL support

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42884:


Assignee: Apache Spark  (was: Herman van Hövell)

> Add Ammonite REPL support
> -
>
> Key: SPARK-42884
> URL: https://issues.apache.org/jira/browse/SPARK-42884
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42884) Add Ammonite REPL support

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42884:


Assignee: Herman van Hövell  (was: Apache Spark)

> Add Ammonite REPL support
> -
>
> Key: SPARK-42884
> URL: https://issues.apache.org/jira/browse/SPARK-42884
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42884) Add Ammonite REPL support

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703458#comment-17703458
 ] 

Apache Spark commented on SPARK-42884:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/40515

> Add Ammonite REPL support
> -
>
> Key: SPARK-42884
> URL: https://issues.apache.org/jira/browse/SPARK-42884
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41233) High-order function: array_prepend

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703454#comment-17703454
 ] 

Apache Spark commented on SPARK-41233:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40514

> High-order function: array_prepend
> --
>
> Key: SPARK-41233
> URL: https://issues.apache.org/jira/browse/SPARK-41233
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
> Fix For: 3.5.0
>
>
> Refer to 
> https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/api/snowflake.snowpark.functions.array_prepend.html
> 1. About the data type validation:
> In Snowflake’s array_append, array_prepend and array_insert functions, the 
> element data type does not need to match the data type of the existing 
> elements in the array.
> In Spark, however, we want to apply the same data type validation as 
> array_remove.
> 2. About the NULL handling:
> Currently, SparkSQL, SnowSQL and PostgreSQL deal with NULL values in 
> different ways.
> The existing functions array_contains, array_position and array_remove in 
> SparkSQL handle NULL as follows: if the input array and/or the element is 
> NULL, return NULL. However, array_prepend should break from this behavior.
> We should implement the NULL handling in array_prepend as follows:
> 2.1. if the array is NULL, return NULL;
> 2.2. if the array is not NULL and the element is NULL, prepend the NULL value 
> to the array.
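> As illustration, the intended semantics in SQL (results are the expected
> behavior once the function is implemented; output shown in comments):
> {code:python}
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.getOrCreate()
>
> # 2.1: a NULL array yields NULL.
> spark.sql("SELECT array_prepend(CAST(NULL AS ARRAY<INT>), 1)").show()
> # NULL
>
> # 2.2: a NULL element is prepended to a non-NULL array.
> spark.sql("SELECT array_prepend(array(1, 2), CAST(NULL AS INT))").show()
> # [NULL, 1, 2]
> {code}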



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41233) High-order function: array_prepend

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703455#comment-17703455
 ] 

Apache Spark commented on SPARK-41233:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40514

> High-order function: array_prepend
> --
>
> Key: SPARK-41233
> URL: https://issues.apache.org/jira/browse/SPARK-41233
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
> Fix For: 3.5.0
>
>
> Refer to 
> https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/api/snowflake.snowpark.functions.array_prepend.html
> 1. About the data type validation:
> In Snowflake’s array_append, array_prepend and array_insert functions, the 
> element data type does not need to match the data type of the existing 
> elements in the array.
> In Spark, however, we want to apply the same data type validation as 
> array_remove.
> 2. About the NULL handling:
> Currently, SparkSQL, SnowSQL and PostgreSQL deal with NULL values in 
> different ways.
> The existing functions array_contains, array_position and array_remove in 
> SparkSQL handle NULL as follows: if the input array and/or the element is 
> NULL, return NULL. However, array_prepend should break from this behavior.
> We should implement the NULL handling in array_prepend as follows:
> 2.1. if the array is NULL, return NULL;
> 2.2. if the array is not NULL and the element is NULL, prepend the NULL value 
> to the array.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42893) Block Arrow-optimized Python UDFs

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42893:


Assignee: Apache Spark

> Block Arrow-optimized Python UDFs
> -
>
> Key: SPARK-42893
> URL: https://issues.apache.org/jira/browse/SPARK-42893
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Considering the upcoming improvements addressing the result inconsistencies 
> between traditional pickled Python UDFs and Arrow-optimized Python UDFs, we 
> should block the feature for now; otherwise, users who try out the feature 
> will experience behavior changes in the next release.
> In addition, since the Spark Connect Python Client (SCPC) has been introduced 
> in Spark 3.4, we should ensure the feature is ready in both vanilla PySpark 
> and SCPC at the same time for compatibility.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42893) Block Arrow-optimized Python UDFs

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703446#comment-17703446
 ] 

Apache Spark commented on SPARK-42893:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40513

> Block Arrow-optimized Python UDFs
> -
>
> Key: SPARK-42893
> URL: https://issues.apache.org/jira/browse/SPARK-42893
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Considering the upcoming improvements addressing the result inconsistencies 
> between traditional pickled Python UDFs and Arrow-optimized Python UDFs, we 
> should block the feature for now; otherwise, users who try out the feature 
> will experience behavior changes in the next release.
> In addition, since the Spark Connect Python Client (SCPC) has been introduced 
> in Spark 3.4, we should ensure the feature is ready in both vanilla PySpark 
> and SCPC at the same time for compatibility.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42893) Block Arrow-optimized Python UDFs

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703447#comment-17703447
 ] 

Apache Spark commented on SPARK-42893:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40513

> Block Arrow-optimized Python UDFs
> -
>
> Key: SPARK-42893
> URL: https://issues.apache.org/jira/browse/SPARK-42893
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Considering the upcoming improvements addressing the result inconsistencies 
> between traditional pickled Python UDFs and Arrow-optimized Python UDFs, we 
> should block the feature for now; otherwise, users who try out the feature 
> will experience behavior changes in the next release.
> In addition, since the Spark Connect Python Client (SCPC) has been introduced 
> in Spark 3.4, we should ensure the feature is ready in both vanilla PySpark 
> and SCPC at the same time for compatibility.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42893) Block Arrow-optimized Python UDFs

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42893:


Assignee: (was: Apache Spark)

> Block Arrow-optimized Python UDFs
> -
>
> Key: SPARK-42893
> URL: https://issues.apache.org/jira/browse/SPARK-42893
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Considering the upcoming improvements addressing the result inconsistencies 
> between traditional pickled Python UDFs and Arrow-optimized Python UDFs, we 
> should block the feature for now; otherwise, users who try out the feature 
> will experience behavior changes in the next release.
> In addition, since the Spark Connect Python Client (SCPC) has been introduced 
> in Spark 3.4, we should ensure the feature is ready in both vanilla PySpark 
> and SCPC at the same time for compatibility.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-40307) Introduce Arrow-optimized Python UDFs

2023-03-21 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reopened SPARK-40307:
--

> Introduce Arrow-optimized Python UDFs
> -
>
> Key: SPARK-40307
> URL: https://issues.apache.org/jira/browse/SPARK-40307
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> Python user-defined functions (UDFs) enable users to run arbitrary code 
> against PySpark columns. They use Pickle for (de)serialization and execute 
> row by row.
> One major performance bottleneck of Python UDFs is (de)serialization, that 
> is, the data interchange between the worker JVM and the spawned Python 
> subprocess that actually executes the UDF. We should seek an alternative to 
> handle the (de)serialization: Arrow, which is already used for the 
> (de)serialization of Pandas UDFs.
> There should be two ways to enable/disable the Arrow optimization for Python 
> UDFs:
> - the Spark configuration `spark.sql.execution.pythonUDF.arrow.enabled`, 
> disabled by default.
> - the `useArrow` parameter of the `udf` function, None by default.
> The Spark configuration takes effect only when `useArrow` is None. Otherwise, 
> `useArrow` decides whether a specific user-defined function is optimized by 
> Arrow or not.
> The reason we introduce these two ways is to provide both a convenient, 
> per-Spark-session control and a finer-grained, per-UDF control of the Arrow 
> optimization for Python UDFs.
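> A short sketch of the two control points described above (both names are 
> taken from this description):
> {code:python}
> from pyspark.sql import SparkSession
> from pyspark.sql.functions import udf
>
> spark = SparkSession.builder.getOrCreate()
>
> # Session-wide switch; consulted only when useArrow is left as None.
> spark.conf.set("spark.sql.execution.pythonUDF.arrow.enabled", "true")
>
> # Per-UDF switch; overrides the session configuration.
> @udf(returnType="int", useArrow=True)
> def add_one(x):
>     return x + 1
>
> spark.range(3).select(add_one("id")).show()
> {code}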



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40307) Introduce Arrow-optimized Python UDFs

2023-03-21 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-40307:
-
Affects Version/s: 3.5.0

> Introduce Arrow-optimized Python UDFs
> -
>
> Key: SPARK-40307
> URL: https://issues.apache.org/jira/browse/SPARK-40307
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> Python user-defined functions (UDFs) enable users to run arbitrary code 
> against PySpark columns. They use Pickle for (de)serialization and execute 
> row by row.
> One major performance bottleneck of Python UDFs is (de)serialization, that 
> is, the data interchange between the worker JVM and the spawned Python 
> subprocess that actually executes the UDF. We should seek an alternative to 
> handle the (de)serialization: Arrow, which is already used for the 
> (de)serialization of Pandas UDFs.
> There should be two ways to enable/disable the Arrow optimization for Python 
> UDFs:
> - the Spark configuration `spark.sql.execution.pythonUDF.arrow.enabled`, 
> disabled by default.
> - the `useArrow` parameter of the `udf` function, None by default.
> The Spark configuration takes effect only when `useArrow` is None. Otherwise, 
> `useArrow` decides whether a specific user-defined function is optimized by 
> Arrow or not.
> The reason we introduce these two ways is to provide both a convenient, 
> per-Spark-session control and a finer-grained, per-UDF control of the Arrow 
> optimization for Python UDFs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42893) Block the usage of Arrow-optimized Python UDFs

2023-03-21 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42893:
-
Description: 
Considering the upcoming improvements addressing the result inconsistencies 
between traditional pickled Python UDFs and Arrow-optimized Python UDFs, we 
should block the feature for now; otherwise, users who try out the feature will 
experience behavior changes in the next release.

In addition, since the Spark Connect Python Client (SCPC) has been introduced in 
Spark 3.4, we should ensure the feature is ready in both vanilla PySpark and 
SCPC at the same time for compatibility.

  was:Considering the upcoming improvements addressing the result inconsistencies 
between traditional pickled Python UDFs and Arrow-optimized Python UDFs, we 
should block the feature for now; otherwise, users who try out the feature will 
experience behavior changes in the next release.


> Block the usage of Arrow-optimized Python UDFs
> --
>
> Key: SPARK-42893
> URL: https://issues.apache.org/jira/browse/SPARK-42893
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Considering the upcoming improvements addressing the result inconsistencies 
> between traditional pickled Python UDFs and Arrow-optimized Python UDFs, we 
> should block the feature for now; otherwise, users who try out the feature 
> will experience behavior changes in the next release.
> In addition, since the Spark Connect Python Client (SCPC) has been introduced 
> in Spark 3.4, we should ensure the feature is ready in both vanilla PySpark 
> and SCPC at the same time for compatibility.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42893) Block Arrow-optimized Python UDFs

2023-03-21 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42893:
-
Summary: Block Arrow-optimized Python UDFs  (was: Block the usage of 
Arrow-optimized Python UDFs)

> Block Arrow-optimized Python UDFs
> -
>
> Key: SPARK-42893
> URL: https://issues.apache.org/jira/browse/SPARK-42893
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Considering the upcoming improvements addressing the result inconsistencies 
> between traditional pickled Python UDFs and Arrow-optimized Python UDFs, we 
> should block the feature for now; otherwise, users who try out the feature 
> will experience behavior changes in the next release.
> In addition, since the Spark Connect Python Client (SCPC) has been introduced 
> in Spark 3.4, we should ensure the feature is ready in both vanilla PySpark 
> and SCPC at the same time for compatibility.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42893) Block the usage of Arrow-optimized Python UDFs

2023-03-21 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-42893:


 Summary: Block the usage of Arrow-optimized Python UDFs
 Key: SPARK-42893
 URL: https://issues.apache.org/jira/browse/SPARK-42893
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Xinrong Meng


Considering the upcoming improvements addressing the result inconsistencies 
between traditional pickled Python UDFs and Arrow-optimized Python UDFs, we 
should block the feature for now; otherwise, users who try out the feature will 
experience behavior changes in the next release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42892) Move sameType and relevant methods out of DataType

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42892:


Assignee: Rui Wang  (was: Apache Spark)

> Move sameType and relevant methods out of DataType
> --
>
> Key: SPARK-42892
> URL: https://issues.apache.org/jira/browse/SPARK-42892
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42892) Move sameType and relevant methods out of DataType

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42892:


Assignee: Apache Spark  (was: Rui Wang)

> Move sameType and relevant methods out of DataType
> --
>
> Key: SPARK-42892
> URL: https://issues.apache.org/jira/browse/SPARK-42892
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Rui Wang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42892) Move sameType and relevant methods out of DataType

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703436#comment-17703436
 ] 

Apache Spark commented on SPARK-42892:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40512

> Move sameType and relevant methods out of DataType
> --
>
> Key: SPARK-42892
> URL: https://issues.apache.org/jira/browse/SPARK-42892
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42892) Move sameType and relevant methods out of DataType

2023-03-21 Thread Rui Wang (Jira)
Rui Wang created SPARK-42892:


 Summary: Move sameType and relevant methods out of DataType
 Key: SPARK-42892
 URL: https://issues.apache.org/jira/browse/SPARK-42892
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Rui Wang
Assignee: Rui Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42889) Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42889:


Assignee: Takuya Ueshin

> Implement cache, persist, unpersist, and storageLevel
> -
>
> Key: SPARK-42889
> URL: https://issues.apache.org/jira/browse/SPARK-42889
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42889) Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42889.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40510
[https://github.com/apache/spark/pull/40510]

> Implement cache, persist, unpersist, and storageLevel
> -
>
> Key: SPARK-42889
> URL: https://issues.apache.org/jira/browse/SPARK-42889
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42816) Increase max message size to 128MB

2023-03-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42816:
-

Assignee: Martin Grund

> Increase max message size to 128MB
> --
>
> Key: SPARK-42816
> URL: https://issues.apache.org/jira/browse/SPARK-42816
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
>
> Support messages up to 128MB
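> As a hedged sketch only (the configuration key below is an assumption about 
> Spark Connect's gRPC settings, not stated in this ticket):
> {code:python}
> from pyspark.sql import SparkSession
>
> # Hypothetical: raise the gRPC inbound message limit to 128 MiB (value in bytes).
> spark = (
>     SparkSession.builder
>     .config("spark.connect.grpc.maxInboundMessageSize", str(128 * 1024 * 1024))
>     .getOrCreate()
> )
> {code}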



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42816) Increase max message size to 128MB

2023-03-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42816.
---
Fix Version/s: 3.4.1
   Resolution: Fixed

Issue resolved by pull request 40447
[https://github.com/apache/spark/pull/40447]

> Increase max message size to 128MB
> --
>
> Key: SPARK-42816
> URL: https://issues.apache.org/jira/browse/SPARK-42816
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
> Fix For: 3.4.1
>
>
> Support messages up to 128MB



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42891) Implement CoGrouped Map API

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42891:


Assignee: (was: Apache Spark)

> Implement CoGrouped Map API
> ---
>
> Key: SPARK-42891
> URL: https://issues.apache.org/jira/browse/SPARK-42891
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement CoGrouped Map API.
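> For context, a minimal sketch of the cogrouped map API as it already exists 
> in vanilla PySpark (this ticket tracks supporting it in Spark Connect):
> {code:python}
> import pandas as pd
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.getOrCreate()
> df1 = spark.createDataFrame([(1, 1.0), (2, 2.0)], ("id", "v1"))
> df2 = spark.createDataFrame([(1, "x"), (2, "y")], ("id", "v2"))
>
> def merge(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
>     # Each cogroup arrives as two pandas DataFrames sharing the same key.
>     return pd.merge(left, right, on="id")
>
> out = (
>     df1.groupBy("id")
>     .cogroup(df2.groupBy("id"))
>     .applyInPandas(merge, schema="id long, v1 double, v2 string")
> )
> out.show()
> {code}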



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42891) Implement CoGrouped Map API

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703429#comment-17703429
 ] 

Apache Spark commented on SPARK-42891:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40487

> Implement CoGrouped Map API
> ---
>
> Key: SPARK-42891
> URL: https://issues.apache.org/jira/browse/SPARK-42891
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement CoGrouped Map API.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42891) Implement CoGrouped Map API

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703428#comment-17703428
 ] 

Apache Spark commented on SPARK-42891:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40487

> Implement CoGrouped Map API
> ---
>
> Key: SPARK-42891
> URL: https://issues.apache.org/jira/browse/SPARK-42891
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement CoGrouped Map API.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42891) Implement CoGrouped Map API

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42891:


Assignee: Apache Spark

> Implement CoGrouped Map API
> ---
>
> Key: SPARK-42891
> URL: https://issues.apache.org/jira/browse/SPARK-42891
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Implement CoGrouped Map API.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42340) Implement Grouped Map API

2023-03-21 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42340:
-
Summary: Implement Grouped Map API  (was: Implement 
GroupedData.applyInPandas)

> Implement Grouped Map API
> -
>
> Key: SPARK-42340
> URL: https://issues.apache.org/jira/browse/SPARK-42340
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.4.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42891) Implement CoGrouped Map API

2023-03-21 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-42891:


 Summary: Implement CoGrouped Map API
 Key: SPARK-42891
 URL: https://issues.apache.org/jira/browse/SPARK-42891
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Xinrong Meng


Implement CoGrouped Map API.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42888) Upgrade GCS connector to 2.2.11.

2023-03-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42888:
--
Affects Version/s: 3.4.0
   (was: 3.4.1)

> Upgrade GCS connector to 2.2.11.
> 
>
> Key: SPARK-42888
> URL: https://issues.apache.org/jira/browse/SPARK-42888
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
> Fix For: 3.4.0
>
>
> Upgrade the [GCS 
> Connector|https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs]
>  bundled in the Spark distro from version 2.2.7 to 2.2.11. The new release 
> contains multiple bug fixes and enhancements discussed in the [Release 
> Notes|https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md].
>  Notable changes include:
> * Improved socket timeout handling.
> * Trace logging capabilities.
> * A fix for a bug that prevented usage of GCS as a [Hadoop Credential 
> Provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html].
> * Dependency upgrades.
> * Support for OAuth2-based client authentication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42888) Upgrade GCS connector to 2.2.11.

2023-03-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42888:
--
Fix Version/s: 3.4.1
   (was: 3.4.0)

> Upgrade GCS connector to 2.2.11.
> 
>
> Key: SPARK-42888
> URL: https://issues.apache.org/jira/browse/SPARK-42888
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
> Fix For: 3.4.1
>
>
> Upgrade the [GCS 
> Connector|https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs]
>  bundled in the Spark distro from version 2.2.7 to 2.2.11. The new release 
> contains multiple bug fixes and enhancements discussed in the [Release 
> Notes|https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md].
>  Notable changes include:
> * Improved socket timeout handling.
> * Trace logging capabilities.
> * A fix for a bug that prevented usage of GCS as a [Hadoop Credential 
> Provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html].
> * Dependency upgrades.
> * Support for OAuth2-based client authentication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42888) Upgrade GCS connector to 2.2.11.

2023-03-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42888:
-

Assignee: Chris Nauroth

> Upgrade GCS connector to 2.2.11.
> 
>
> Key: SPARK-42888
> URL: https://issues.apache.org/jira/browse/SPARK-42888
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.1
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Minor
> Fix For: 3.4.0
>
>
> Upgrade the [GCS 
> Connector|https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs]
>  bundled in the Spark distro from version 2.2.7 to 2.2.11. The new release 
> contains multiple bug fixes and enhancements discussed in the [Release 
> Notes|https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md].
>  Notable changes include:
> * Improved socket timeout handling.
> * Trace logging capabilities.
> * A fix for a bug that prevented usage of GCS as a [Hadoop Credential 
> Provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html].
> * Dependency upgrades.
> * Support for OAuth2-based client authentication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42888) Upgrade GCS connector to 2.2.11.

2023-03-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42888.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40511
[https://github.com/apache/spark/pull/40511]

> Upgrade GCS connector to 2.2.11.
> 
>
> Key: SPARK-42888
> URL: https://issues.apache.org/jira/browse/SPARK-42888
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.1
>Reporter: Chris Nauroth
>Priority: Minor
> Fix For: 3.4.0
>
>
> Upgrade the [GCS 
> Connector|https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs]
>  bundled in the Spark distro from version 2.2.7 to 2.2.11. The new release 
> contains multiple bug fixes and enhancements discussed in the [Release 
> Notes|https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md].
>  Notable changes include:
> * Improved socket timeout handling.
> * Trace logging capabilities.
> * A fix for a bug that prevented usage of GCS as a [Hadoop Credential 
> Provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html].
> * Dependency upgrades.
> * Support for OAuth2-based client authentication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42890) Add Identifier to the InMemoryTableScan node on the SQL page

2023-03-21 Thread Yian Liou (Jira)
Yian Liou created SPARK-42890:
-

 Summary: Add Identifier to the InMemoryTableScan node on the SQL 
page
 Key: SPARK-42890
 URL: https://issues.apache.org/jira/browse/SPARK-42890
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 3.3.2
Reporter: Yian Liou


On the SQL page in the Web UI, there is no way to tell which 
InMemoryTableScan node is being used at a specific point in the DAG. This Jira 
aims to add a repeat identifier to distinguish which InMemoryTableScan is being 
used at a certain location.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42829) Add Identifier to the cached RDD node on the Stages page

2023-03-21 Thread Yian Liou (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yian Liou updated SPARK-42829:
--
Summary: Add Identifier to the cached RDD node on the Stages page   (was: 
Added Identifier to the cached RDD operator on the Stages page )

> Add Identifier to the cached RDD node on the Stages page 
> -
>
> Key: SPARK-42829
> URL: https://issues.apache.org/jira/browse/SPARK-42829
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.3.2
>Reporter: Yian Liou
>Priority: Major
> Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png
>
>
> On the stages page in the Web UI, there is no way to tell which cached 
> RDD is being executed in a particular stage. This Jira aims to add a repeat 
> identifier to distinguish which cached RDD is being executed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42194) Allow `columns` parameter when creating DataFrame with Series.

2023-03-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-42194:
-
Fix Version/s: 3.4.1
   (was: 3.4.0)

> Allow `columns` parameter when creating DataFrame with Series.
> --
>
> Key: SPARK-42194
> URL: https://issues.apache.org/jira/browse/SPARK-42194
> Project: Spark
>  Issue Type: Sub-task
>  Components: ps
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.4.1
>
>
> pandas API on Spark doesn't allow creating a DataFrame from a Series while 
> specifying the `columns` parameter, as below:
> {code:python}
> >>> ps.DataFrame(psser, columns=["labels"])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File ".../spark/python/pyspark/pandas/frame.py", line 539, in __init__
>     assert columns is None
> AssertionError {code}
> We should make it available.
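> For context, a hedged sketch of the behavior this would enable, mirroring 
> pandas (output shown as expected once fixed):
> {code:python}
> import pyspark.pandas as ps
>
> psser = ps.Series([1, 2, 3], name="labels")
> # Expected after the fix: builds a one-column DataFrame instead of raising
> # AssertionError, as pandas does for a Series whose name matches `columns`.
> psdf = ps.DataFrame(psser, columns=["labels"])
> print(psdf.columns.tolist())  # ['labels']
> {code}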



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42194) Allow `columns` parameter when creating DataFrame with Series.

2023-03-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-42194:
-
Component/s: Pandas API on Spark
 (was: ps)

> Allow `columns` parameter when creating DataFrame with Series.
> --
>
> Key: SPARK-42194
> URL: https://issues.apache.org/jira/browse/SPARK-42194
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.4.1
>
>
> pandas API on Spark doesn't allow creating a DataFrame from a Series while 
> specifying the `columns` parameter, as below:
> {code:python}
> >>> ps.DataFrame(psser, columns=["labels"])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File ".../spark/python/pyspark/pandas/frame.py", line 539, in __init__
>     assert columns is None
> AssertionError {code}
> We should make it available.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42871) Upgrade slf4j to 2.0.7

2023-03-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42871:


Assignee: Yang Jie

> Upgrade slf4j to 2.0.7
> --
>
> Key: SPARK-42871
> URL: https://issues.apache.org/jira/browse/SPARK-42871
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>
> https://www.slf4j.org/news.html#2.0.7



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42871) Upgrade slf4j to 2.0.7

2023-03-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42871.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40489
[https://github.com/apache/spark/pull/40489]

> Upgrade slf4j to 2.0.7
> --
>
> Key: SPARK-42871
> URL: https://issues.apache.org/jira/browse/SPARK-42871
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.5.0
>
>
> https://www.slf4j.org/news.html#2.0.7



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42662) Add proto message for pandas API on Spark default index

2023-03-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42662.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40507
[https://github.com/apache/spark/pull/40507]

> Add proto message for pandas API on Spark default index
> ---
>
> Key: SPARK-42662
> URL: https://issues.apache.org/jira/browse/SPARK-42662
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark, PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.5.0
>
>
> Add `DistributedSequenceID` into proto message to support the 
> distributed-sequence index of the pandas API on Spark in Spark Connect as 
> well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42662) Add proto message for pandas API on Spark default index

2023-03-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42662:


Assignee: Haejoon Lee

> Add proto message for pandas API on Spark default index
> ---
>
> Key: SPARK-42662
> URL: https://issues.apache.org/jira/browse/SPARK-42662
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark, PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> Add `DistributedSequenceID` into proto message to support the 
> distributed-sequence index of the pandas API on Spark in Spark Connect as 
> well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42888) Upgrade GCS connector to 2.2.11.

2023-03-21 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated SPARK-42888:
--
Summary: Upgrade GCS connector to 2.2.11.  (was: Upgrade GCS connector from 
2.2.7 to 2.2.11.)

> Upgrade GCS connector to 2.2.11.
> 
>
> Key: SPARK-42888
> URL: https://issues.apache.org/jira/browse/SPARK-42888
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.1
>Reporter: Chris Nauroth
>Priority: Minor
>
> Upgrade the [GCS 
> Connector|https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs]
>  bundled in the Spark distro from version 2.2.7 to 2.2.11. The new release 
> contains multiple bug fixes and enhancements discussed in the [Release 
> Notes|https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md].
>  Notable changes include:
> * Improved socket timeout handling.
> * Trace logging capabilities.
> * Fix bug that prevented usage of GCS as a [Hadoop Credential 
> Provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html].
> * Dependency upgrades.
> * Support OAuth2 based client authentication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42829) Added Identifier to the cached RDD operator on the Stages page

2023-03-21 Thread Yian Liou (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703403#comment-17703403
 ] 

Yian Liou commented on SPARK-42829:
---

Will file another Jira, related to this one, for adding a repeat identifier to 
the InMemoryTableScan operator on the SQL page.

> Added Identifier to the cached RDD operator on the Stages page 
> ---
>
> Key: SPARK-42829
> URL: https://issues.apache.org/jira/browse/SPARK-42829
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.3.2
>Reporter: Yian Liou
>Priority: Major
> Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png
>
>
> On the Stages page in the Web UI, there is no indication of which cached 
> RDD is being executed in a particular stage. This Jira aims to add a repeat 
> identifier to distinguish which cached RDD is being executed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42888) Upgrade GCS connector from 2.2.7 to 2.2.11.

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703401#comment-17703401
 ] 

Apache Spark commented on SPARK-42888:
--

User 'cnauroth' has created a pull request for this issue:
https://github.com/apache/spark/pull/40511

> Upgrade GCS connector from 2.2.7 to 2.2.11.
> ---
>
> Key: SPARK-42888
> URL: https://issues.apache.org/jira/browse/SPARK-42888
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.1
>Reporter: Chris Nauroth
>Priority: Minor
>
> Upgrade the [GCS 
> Connector|https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs]
>  bundled in the Spark distro from version 2.2.7 to 2.2.11. The new release 
> contains multiple bug fixes and enhancements discussed in the [Release 
> Notes|https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md].
>  Notable changes include:
> * Improved socket timeout handling.
> * Trace logging capabilities.
> * Fix bug that prevented usage of GCS as a [Hadoop Credential 
> Provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html].
> * Dependency upgrades.
> * Support OAuth2 based client authentication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42888) Upgrade GCS connector from 2.2.7 to 2.2.11.

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42888:


Assignee: (was: Apache Spark)

> Upgrade GCS connector from 2.2.7 to 2.2.11.
> ---
>
> Key: SPARK-42888
> URL: https://issues.apache.org/jira/browse/SPARK-42888
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.1
>Reporter: Chris Nauroth
>Priority: Minor
>
> Upgrade the [GCS 
> Connector|https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs]
>  bundled in the Spark distro from version 2.2.7 to 2.2.11. The new release 
> contains multiple bug fixes and enhancements discussed in the [Release 
> Notes|https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md].
>  Notable changes include:
> * Improved socket timeout handling.
> * Trace logging capabilities.
> * Fix bug that prevented usage of GCS as a [Hadoop Credential 
> Provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html].
> * Dependency upgrades.
> * Support OAuth2 based client authentication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42888) Upgrade GCS connector from 2.2.7 to 2.2.11.

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42888:


Assignee: Apache Spark

> Upgrade GCS connector from 2.2.7 to 2.2.11.
> ---
>
> Key: SPARK-42888
> URL: https://issues.apache.org/jira/browse/SPARK-42888
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.1
>Reporter: Chris Nauroth
>Assignee: Apache Spark
>Priority: Minor
>
> Upgrade the [GCS 
> Connector|https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs]
>  bundled in the Spark distro from version 2.2.7 to 2.2.11. The new release 
> contains multiple bug fixes and enhancements discussed in the [Release 
> Notes|https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md].
>  Notable changes include:
> * Improved socket timeout handling.
> * Trace logging capabilities.
> * Fix bug that prevented usage of GCS as a [Hadoop Credential 
> Provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html].
> * Dependency upgrades.
> * Support OAuth2 based client authentication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42888) Upgrade GCS connector from 2.2.7 to 2.2.11.

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703400#comment-17703400
 ] 

Apache Spark commented on SPARK-42888:
--

User 'cnauroth' has created a pull request for this issue:
https://github.com/apache/spark/pull/40511

> Upgrade GCS connector from 2.2.7 to 2.2.11.
> ---
>
> Key: SPARK-42888
> URL: https://issues.apache.org/jira/browse/SPARK-42888
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.1
>Reporter: Chris Nauroth
>Priority: Minor
>
> Upgrade the [GCS 
> Connector|https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs]
>  bundled in the Spark distro from version 2.2.7 to 2.2.11. The new release 
> contains multiple bug fixes and enhancements discussed in the [Release 
> Notes|https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md].
>  Notable changes include:
> * Improved socket timeout handling.
> * Trace logging capabilities.
> * Fix bug that prevented usage of GCS as a [Hadoop Credential 
> Provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html].
> * Dependency upgrades.
> * Support OAuth2 based client authentication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42889) Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703392#comment-17703392
 ] 

Apache Spark commented on SPARK-42889:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40510

> Implement cache, persist, unpersist, and storageLevel
> -
>
> Key: SPARK-42889
> URL: https://issues.apache.org/jira/browse/SPARK-42889
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42889) Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42889:


Assignee: (was: Apache Spark)

> Implement cache, persist, unpersist, and storageLevel
> -
>
> Key: SPARK-42889
> URL: https://issues.apache.org/jira/browse/SPARK-42889
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42889) Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42889:


Assignee: Apache Spark

> Implement cache, persist, unpersist, and storageLevel
> -
>
> Key: SPARK-42889
> URL: https://issues.apache.org/jira/browse/SPARK-42889
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42889) Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42889:
-

 Summary: Implement cache, persist, unpersist, and storageLevel
 Key: SPARK-42889
 URL: https://issues.apache.org/jira/browse/SPARK-42889
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin
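
The ticket carries no description, but the target API surface on the existing 
Dataset is well known. A minimal sketch of the four methods being brought to 
Spark Connect, shown here against a plain local session for illustration only 
(the Connect implementation itself is not shown):

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Sketch of the existing Dataset caching API this ticket implements for
// Spark Connect: cache, persist, unpersist, and storageLevel.
object CachingApiSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    val df = spark.range(10).toDF("id")

    df.persist(StorageLevel.MEMORY_ONLY)  // pin an explicit storage level
    println(df.storageLevel)              // inspect the current level
    df.unpersist(blocking = true)         // drop the cached data
    df.cache()                            // persist() with the default level
    df.unpersist()

    spark.stop()
  }
}
{code}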






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42885) Upgrade `kubernetes-client` to 6.5.1

2023-03-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42885.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40509
[https://github.com/apache/spark/pull/40509]

> Upgrade `kubernetes-client` to 6.5.1
> 
>
> Key: SPARK-42885
> URL: https://issues.apache.org/jira/browse/SPARK-42885
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42885) Upgrade `kubernetes-client` to 6.5.1

2023-03-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42885:
-

Assignee: Dongjoon Hyun

> Upgrade `kubernetes-client` to 6.5.1
> 
>
> Key: SPARK-42885
> URL: https://issues.apache.org/jira/browse/SPARK-42885
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42888) Upgrade GCS connector from 2.2.7 to 2.2.11.

2023-03-21 Thread Chris Nauroth (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703357#comment-17703357
 ] 

Chris Nauroth commented on SPARK-42888:
---

I have a patch in progress and will send a pull request.

> Upgrade GCS connector from 2.2.7 to 2.2.11.
> ---
>
> Key: SPARK-42888
> URL: https://issues.apache.org/jira/browse/SPARK-42888
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.1
>Reporter: Chris Nauroth
>Priority: Minor
>
> Upgrade the [GCS 
> Connector|https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs]
>  bundled in the Spark distro from version 2.2.7 to 2.2.11. The new release 
> contains multiple bug fixes and enhancements discussed in the [Release 
> Notes|https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md].
>  Notable changes include:
> * Improved socket timeout handling.
> * Trace logging capabilities.
> * Fix bug that prevented usage of GCS as a [Hadoop Credential 
> Provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html].
> * Dependency upgrades.
> * Support OAuth2 based client authentication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42888) Upgrade GCS connector from 2.2.7 to 2.2.11.

2023-03-21 Thread Chris Nauroth (Jira)
Chris Nauroth created SPARK-42888:
-

 Summary: Upgrade GCS connector from 2.2.7 to 2.2.11.
 Key: SPARK-42888
 URL: https://issues.apache.org/jira/browse/SPARK-42888
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.4.1
Reporter: Chris Nauroth


Upgrade the [GCS 
Connector|https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs]
 bundled in the Spark distro from version 2.2.7 to 2.2.11. The new release 
contains multiple bug fixes and enhancements discussed in the [Release 
Notes|https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md].
 Notable changes include:
* Improved socket timeout handling.
* Trace logging capabilities.
* Fix bug that prevented usage of GCS as a [Hadoop Credential 
Provider|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html].
* Dependency upgrades.
* Support OAuth2 based client authentication.
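
For illustration, the upgrade itself should amount to a one-line version bump 
in Spark's build. The property name below is a guess and the exact location in 
Spark's pom.xml may differ; the Maven coordinates are the connector's published 
ones:

{code:xml}
<!-- Sketch only: the property name is hypothetical; check Spark's pom.xml
     for the real key. -->
<properties>
  <gcs-connector.version>2.2.11</gcs-connector.version>  <!-- was 2.2.7 -->
</properties>

<dependency>
  <groupId>com.google.cloud.bigdataoss</groupId>
  <artifactId>gcs-connector</artifactId>
  <version>${gcs-connector.version}</version>
  <classifier>shaded</classifier>
</dependency>
{code}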



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42887) Simple DataType interface

2023-03-21 Thread Rui Wang (Jira)
Rui Wang created SPARK-42887:


 Summary: Simple DataType interface
 Key: SPARK-42887
 URL: https://issues.apache.org/jira/browse/SPARK-42887
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Rui Wang
Assignee: Rui Wang


This JIRA proposes to move non-public APIs out of the existing DataType class 
so that DataType becomes a simple interface.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42886) ClassNotFoundException: scala.math.Ordering$Reverse

2023-03-21 Thread Steve Chong (Jira)
Steve Chong created SPARK-42886:
---

 Summary: ClassNotFoundException: scala.math.Ordering$Reverse
 Key: SPARK-42886
 URL: https://issues.apache.org/jira/browse/SPARK-42886
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.3.2
 Environment: Development environment

MacBook Pro

Java JDK ibm-1.8-362

 
Reporter: Steve Chong


Hi,

We are using the spark-mllib_2.12 dependency in a Java project.

We are attempting to upgrade from version 3.3.1 to 3.3.2. This results in unit 
tests breaking with exception: ClassNotFoundException: 
scala.math.Ordering$Reverse

A change was made to add the class to the KryoSerializer: 
https://issues.apache.org/jira/browse/SPARK-42071

scala.math.Ordering$Reverse was introduced in Scala 2.12.12. The Maven 
dependency tree (mvn dependency:tree) shows that spark-mllib_2.12 brings in 
scala-library version 2.12.8. Therefore, it doesn't contain 
scala.math.Ordering$Reverse.

If the scala-library transitive dependency is excluded from the POM and an 
explicit dependency declared with version >= 2.12.12, the tests will pass.

Should the scala-library version contained in 3.3.2 be upgraded to >=2.12.12?
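
For reference, the workaround described above would look roughly like this in 
the POM (versions as reported; the pinned scala-library just needs to be 
>= 2.12.12):

{code:xml}
<!-- Workaround sketch: exclude the transitive scala-library and pin a
     version that contains scala.math.Ordering$Reverse. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib_2.12</artifactId>
  <version>3.3.2</version>
  <exclusions>
    <exclusion>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>2.12.12</version>
</dependency>
{code}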

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42838) Assign a name to the error class _LEGACY_ERROR_TEMP_2000

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42838:


Assignee: (was: Apache Spark)

> Assign a name to the error class _LEGACY_ERROR_TEMP_2000
> 
>
> Key: SPARK-42838
> URL: https://issues.apache.org/jira/browse/SPARK-42838
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't 
> exist yet. Check the exception fields by using {*}checkError(){*}. The latter 
> function checks only the valuable error fields and avoids depending on the 
> error text message. In this way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error to checkError().
> If you cannot reproduce the error from user space (using a SQL query), 
> replace the error with an internal error; see 
> {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> unclear. Propose a solution that tells users how to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]
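
Editorial sketch of the test pattern described above, assuming a 
QueryTest-style suite where {{sql()}} and {{checkError()}} are in scope; the 
exact {{checkError()}} signature can differ between branches, and the error 
class name, query, and parameters below are placeholders, not the real 
_LEGACY_ERROR_TEMP_2000 case:

{code:scala}
// Sketch only: trigger the error from user code, then assert on the error
// class and message parameters rather than on the rendered message text.
import org.apache.spark.SparkException

test("NEW_ERROR_CLASS_NAME: triggered from user code") {
  val e = intercept[SparkException] {
    sql("SELECT query_that_triggers_the_error()").collect()
  }
  checkError(
    exception = e,
    errorClass = "NEW_ERROR_CLASS_NAME",
    parameters = Map("param" -> "value"))
}
{code}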



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42838) Assign a name to the error class _LEGACY_ERROR_TEMP_2000

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703286#comment-17703286
 ] 

Apache Spark commented on SPARK-42838:
--

User 'unical1988' has created a pull request for this issue:
https://github.com/apache/spark/pull/40468

> Assign a name to the error class _LEGACY_ERROR_TEMP_2000
> 
>
> Key: SPARK-42838
> URL: https://issues.apache.org/jira/browse/SPARK-42838
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't 
> exist yet. Check the exception fields by using {*}checkError(){*}. The latter 
> function checks only the valuable error fields and avoids depending on the 
> error text message. In this way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error to checkError().
> If you cannot reproduce the error from user space (using a SQL query), 
> replace the error with an internal error; see 
> {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> unclear. Propose a solution that tells users how to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42838) Assign a name to the error class _LEGACY_ERROR_TEMP_2000

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42838:


Assignee: Apache Spark

> Assign a name to the error class _LEGACY_ERROR_TEMP_2000
> 
>
> Key: SPARK-42838
> URL: https://issues.apache.org/jira/browse/SPARK-42838
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't 
> exist yet. Check the exception fields by using {*}checkError(){*}. The latter 
> function checks only the valuable error fields and avoids depending on the 
> error text message. In this way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error to checkError().
> If you cannot reproduce the error from user space (using a SQL query), 
> replace the error with an internal error; see 
> {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> unclear. Propose a solution that tells users how to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42885) Upgrade `kubernetes-client` to 6.5.1

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42885:


Assignee: (was: Apache Spark)

> Upgrade `kubernetes-client` to 6.5.1
> 
>
> Key: SPARK-42885
> URL: https://issues.apache.org/jira/browse/SPARK-42885
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42885) Upgrade `kubernetes-client` to 6.5.1

2023-03-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42885:


Assignee: Apache Spark

> Upgrade `kubernetes-client` to 6.5.1
> 
>
> Key: SPARK-42885
> URL: https://issues.apache.org/jira/browse/SPARK-42885
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42885) Upgrade `kubernetes-client` to 6.5.1

2023-03-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703272#comment-17703272
 ] 

Apache Spark commented on SPARK-42885:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40509

> Upgrade `kubernetes-client` to 6.5.1
> 
>
> Key: SPARK-42885
> URL: https://issues.apache.org/jira/browse/SPARK-42885
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42885) Upgrade `kubernetes-client` to 6.5.1

2023-03-21 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-42885:
-

 Summary: Upgrade `kubernetes-client` to 6.5.1
 Key: SPARK-42885
 URL: https://issues.apache.org/jira/browse/SPARK-42885
 Project: Spark
  Issue Type: Bug
  Components: Build, Kubernetes
Affects Versions: 3.5.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42813) Print application info when waitAppCompletion is false

2023-03-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42813:
-

Assignee: Cheng Pan

> Print application info when waitAppCompletion is false
> --
>
> Key: SPARK-42813
> URL: https://issues.apache.org/jira/browse/SPARK-42813
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.2
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42813) Print application info when waitAppCompletion is false

2023-03-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42813.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40444
[https://github.com/apache/spark/pull/40444]

> Print application info when waitAppCompletion is false
> --
>
> Key: SPARK-42813
> URL: https://issues.apache.org/jira/browse/SPARK-42813
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.2
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42884) Add Ammonite REPL support

2023-03-21 Thread Jira
Herman van Hövell created SPARK-42884:
-

 Summary: Add Ammonite REPL support
 Key: SPARK-42884
 URL: https://issues.apache.org/jira/browse/SPARK-42884
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.4.0
Reporter: Herman van Hövell
Assignee: Herman van Hövell






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42662) Add proto message for pandas API on Spark default index

2023-03-21 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42662:

Description: Add `DistributedSequenceID` into proto message to support the 
distributed-sequence index of the pandas API on Spark in Spark Connect as well. 
 (was: Turn `withSequenceColumn` into PySpark internal API to support the 
distributed-sequence index of the pandas API on Spark in Spark Connect as well.)

> Add proto message for pandas API on Spark default index
> ---
>
> Key: SPARK-42662
> URL: https://issues.apache.org/jira/browse/SPARK-42662
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark, PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Add `DistributedSequenceID` into proto message to support the 
> distributed-sequence index of the pandas API on Spark in Spark Connect as 
> well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42662) Add proto message for pandas API on Spark default index

2023-03-21 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42662:

Summary: Add proto message for pandas API on Spark default index  (was: Add 
`_distributed_sequence_id` for distributed-sequence index.)

> Add proto message for pandas API on Spark default index
> ---
>
> Key: SPARK-42662
> URL: https://issues.apache.org/jira/browse/SPARK-42662
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark, PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Turn `withSequenceColumn` into PySpark internal API to support the 
> distributed-sequence index of the pandas API on Spark in Spark Connect as 
> well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42808) Avoid getting availableProcessors every time in MapOutputTrackerMaster#getStatistics

2023-03-21 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-42808.
--
Fix Version/s: 3.5.0
 Assignee: dzcxzl
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/40440

> Avoid getting availableProcessors every time in 
> MapOutputTrackerMaster#getStatistics
> 
>
> Key: SPARK-42808
> URL: https://issues.apache.org/jira/browse/SPARK-42808
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.2
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Minor
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42536) Upgrade log4j2 to 2.20.0

2023-03-21 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-42536.
--
Fix Version/s: 3.5.0
 Assignee: Yang Jie
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/40490

> Upgrade log4j2 to 2.20.0
> 
>
> Key: SPARK-42536
> URL: https://issues.apache.org/jira/browse/SPARK-42536
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.5.0
>
>
> [https://logging.apache.org/log4j/2.x/release-notes/2.20.0.html]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42851) EquivalentExpressions methods need to be consistently guarded by supportedExpression

2023-03-21 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-42851.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40473
[https://github.com/apache/spark/pull/40473]

> EquivalentExpressions methods need to be consistently guarded by 
> supportedExpression
> 
>
> Key: SPARK-42851
> URL: https://issues.apache.org/jira/browse/SPARK-42851
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Kris Mok
>Assignee: Kris Mok
>Priority: Major
> Fix For: 3.4.0
>
>
> SPARK-41468 tried to fix a bug but introduced a new regression. Its change to 
> {{EquivalentExpressions}} added a {{supportedExpression()}} guard to the 
> {{addExprTree()}} and {{getExprState()}} methods, but didn't add the same 
> guard to the other "add" entry point -- {{addExpr()}}.
> As such, callers that add single expressions to CSE via {{addExpr()}} may 
> succeed, but upon retrieval via {{getExprState()}} they would inconsistently 
> get {{None}} due to failing the guard.
> We need to make sure the "add" and "get" methods are consistent. It could be 
> done by one of:
> 1. Adding the same {{supportedExpression()}} guard to {{addExpr()}}, or
> 2. Removing the guard from {{getExprState()}}, relying solely on the guard on 
> the "add" path to make sure only intended state is added.
> (or other alternative refactorings to fuse the guard into various methods to 
> make it more efficient)
> There are pros and cons to the two directions above: because {{addExpr()}} 
> used to allow more (potentially incorrect) expressions to get CSE'd, making 
> it more restrictive may cause performance regressions (for the cases that 
> happened to work).
> Example:
> {code:sql}
> select max(transform(array(id), x -> x)), max(transform(array(id), x -> x)) 
> from range(2)
> {code}
> Running this query on Spark 3.2 branch returns the correct value:
> {code}
> scala> spark.sql("select max(transform(array(id), x -> x)), 
> max(transform(array(id), x -> x)) from range(2)").collect
> res0: Array[org.apache.spark.sql.Row] = 
> Array([WrappedArray(1),WrappedArray(1)])
> {code}
> Here, {{transform(array(id), x -> x)}} is an {{AggregateExpression}} that was 
> (potentially unsafely) recognized by {{addExpr()}} as a common subexpression, 
> and {{getExprState()}} doesn't do extra guarding, so during physical 
> planning, in {{PhysicalAggregation}} this expression gets CSE'd in both the 
> aggregation expression list and the result expressions list.
> {code}
> AdaptiveSparkPlan isFinalPlan=false
> +- SortAggregate(key=[], functions=[max(transform(array(id#0L), 
> lambdafunction(lambda x#1L, lambda x#1L, false)))])
>+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=11]
>   +- SortAggregate(key=[], functions=[partial_max(transform(array(id#0L), 
> lambdafunction(lambda x#1L, lambda x#1L, false)))])
>  +- Range (0, 2, step=1, splits=16)
> {code}
> Running the same query on current master triggers an error when binding the 
> result expression to the aggregate expression in the Aggregate operators (for 
> a WSCG-enabled operator like {{HashAggregateExec}}, the same error would show 
> up during codegen):
> {code}
> ERROR TaskSetManager: Task 0 in stage 2.0 failed 1 times; aborting job
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 
> (TID 16) (ip-10-110-16-93.us-west-2.compute.internal executor driver): 
> java.lang.IllegalStateException: Couldn't find max(transform(array(id#0L), 
> lambdafunction(lambda x#2L, lambda x#2L, false)))#4 in 
> [max(transform(array(id#0L), lambdafunction(lambda x#1L, lambda x#1L, 
> false)))#3]
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:517)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1249)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1248)
>   at 
> 
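
Editorial sketch of direction 1 above, with simplified stand-in types rather 
than Spark's actual {{EquivalentExpressions}}; the point is only that the same 
guard sits on every "add" entry point, so addExpr() and getExprState() can 
never disagree:

{code:scala}
import scala.collection.mutable

// Simplified stand-in, not Spark's real class. The guard parameter plays
// the role of supportedExpression().
class GuardedEquivalence[Expr](supported: Expr => Boolean) {
  private val useCount = mutable.Map.empty[Expr, Int]

  /** Returns true iff the expression was already present (i.e. it is common). */
  def addExpr(e: Expr): Boolean = {
    if (!supported(e)) return false  // the guard that addExpr() was missing
    val seen = useCount.contains(e)
    useCount.update(e, useCount.getOrElse(e, 0) + 1)
    seen
  }

  def getExprState(e: Expr): Option[Int] =
    if (supported(e)) useCount.get(e) else None
}
{code}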

[jira] [Assigned] (SPARK-42851) EquivalentExpressions methods need to be consistently guarded by supportedExpression

2023-03-21 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-42851:
---

Assignee: Kris Mok

> EquivalentExpressions methods need to be consistently guarded by 
> supportedExpression
> 
>
> Key: SPARK-42851
> URL: https://issues.apache.org/jira/browse/SPARK-42851
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Kris Mok
>Assignee: Kris Mok
>Priority: Major
>
> SPARK-41468 tried to fix a bug but introduced a new regression. Its change to 
> {{EquivalentExpressions}} added a {{supportedExpression()}} guard to the 
> {{addExprTree()}} and {{getExprState()}} methods, but didn't add the same 
> guard to the other "add" entry point -- {{addExpr()}}.
> As such, callers that add single expressions to CSE via {{addExpr()}} may 
> succeed, but upon retrieval via {{getExprState()}} they would inconsistently 
> get {{None}} due to failing the guard.
> We need to make sure the "add" and "get" methods are consistent. It could be 
> done by one of:
> 1. Adding the same {{supportedExpression()}} guard to {{addExpr()}}, or
> 2. Removing the guard from {{getExprState()}}, relying solely on the guard on 
> the "add" path to make sure only intended state is added.
> (or other alternative refactorings to fuse the guard into various methods to 
> make it more efficient)
> There are pros and cons to the two directions above: because {{addExpr()}} 
> used to allow more (potentially incorrect) expressions to get CSE'd, making 
> it more restrictive may cause performance regressions (for the cases that 
> happened to work).
> Example:
> {code:sql}
> select max(transform(array(id), x -> x)), max(transform(array(id), x -> x)) 
> from range(2)
> {code}
> Running this query on Spark 3.2 branch returns the correct value:
> {code}
> scala> spark.sql("select max(transform(array(id), x -> x)), 
> max(transform(array(id), x -> x)) from range(2)").collect
> res0: Array[org.apache.spark.sql.Row] = 
> Array([WrappedArray(1),WrappedArray(1)])
> {code}
> Here, {{transform(array(id), x -> x)}} is an {{AggregateExpression}} that was 
> (potentially unsafely) recognized by {{addExpr()}} as a common subexpression, 
> and {{getExprState()}} doesn't do extra guarding, so during physical 
> planning, in {{PhysicalAggregation}} this expression gets CSE'd in both the 
> aggregation expression list and the result expressions list.
> {code}
> AdaptiveSparkPlan isFinalPlan=false
> +- SortAggregate(key=[], functions=[max(transform(array(id#0L), 
> lambdafunction(lambda x#1L, lambda x#1L, false)))])
>+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=11]
>   +- SortAggregate(key=[], functions=[partial_max(transform(array(id#0L), 
> lambdafunction(lambda x#1L, lambda x#1L, false)))])
>  +- Range (0, 2, step=1, splits=16)
> {code}
> Running the same query on current master triggers an error when binding the 
> result expression to the aggregate expression in the Aggregate operators (for 
> a WSCG-enabled operator like {{HashAggregateExec}}, the same error would show 
> up during codegen):
> {code}
> ERROR TaskSetManager: Task 0 in stage 2.0 failed 1 times; aborting job
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 
> (TID 16) (ip-10-110-16-93.us-west-2.compute.internal executor driver): 
> java.lang.IllegalStateException: Couldn't find max(transform(array(id#0L), 
> lambdafunction(lambda x#2L, lambda x#2L, false)))#4 in 
> [max(transform(array(id#0L), lambdafunction(lambda x#1L, lambda x#1L, 
> false)))#3]
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:517)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1249)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1248)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:532)
>   at 
> 

[jira] [Commented] (SPARK-32893) Structured Streaming and Dynamic Allocation on StandaloneCluster

2023-03-21 Thread Ranga Reddy (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703182#comment-17703182
 ] 

Ranga Reddy commented on SPARK-32893:
-

A similar issue has been created for 
[Kubernetes|https://issues.apache.org/jira/issues/?jql=project+%3D+SPARK+AND+component+%3D+Kubernetes]:
 https://issues.apache.org/jira/browse/SPARK-35625

> Structured Streaming and Dynamic Allocation on StandaloneCluster
> 
>
> Key: SPARK-32893
> URL: https://issues.apache.org/jira/browse/SPARK-32893
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Duarte Ferreira
>Priority: Major
>
> We are currently using Spark 3.0.1 Standalone cluster to run our Structured 
> streaming applications.
> We set the following configurations when running the application in cluster 
> mode:
>  * spark.dynamicAllocation.enabled = true
>  * spark.shuffle.service.enabled = true
>  * spark.cores.max =5
>  * spark.executor.memory = 1G
>  * spark.executor.cores = 1
> We also have the configurations set to enable spark.shuffle.service.enabled 
> on each worker and have a cluster composed of 1 master and 2 slaves.
> The application reads data from a kafka Topic (readTopic) using [This 
> documentation, 
> |https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html]applies
>  some transformations on the DataSet using spark SQL and writes data to 
> another Kafka Topic (writeTopic).
> When we start the application it behaves correctly: it starts with 0 
> executors and, as we start feeding data to the readTopic, it increases the 
> number of executors until it reaches the 5-executor limit, and all messages 
> are transformed and written to the writeTopic in Kafka.
> If we stop feeding messages to the readTopic, the application works as 
> expected and starts killing executors that are no longer needed until we 
> stop sending data completely and it reaches 0 running executors.
> If we start sending data again right away, it behaves just as expected and 
> starts increasing the number of executors again. But if we leave the 
> application idle at 0 executors for around 10 minutes, we start getting 
> errors like this:
> {noformat}
> 20/09/15 10:41:22 ERROR TransportClient: Failed to send RPC RPC 
> 7570256331800450365 to sparkmaster/10.0.12.231:7077: 
> java.nio.channels.ClosedChannelException
> java.nio.channels.ClosedChannelException
>   at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957)
>   at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865)
>   at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
>   at 
> io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1104)
>   at 
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   at 
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>   at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>   at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>   at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>   at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:468)
>   at 
> org.apache.spark.network.protocol.MessageWithHeader.copyByteBuf(MessageWithHeader.java:148)
>   at 
> org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:123)
>   at 
> io.netty.channel.socket.nio.NioSocketChannel.doWriteFileRegion(NioSocketChannel.java:362)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel.doWriteInternal(AbstractNioByteChannel.java:235)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel.doWrite0(AbstractNioByteChannel.java:209)
>   at 
> io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:400)
>   at 
> 
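
For reference, a standalone-cluster submission carrying the configuration 
quoted above might look like the following; the master URL, class, and jar 
are placeholders:

{noformat}
spark-submit \
  --master spark://sparkmaster:7077 \
  --deploy-mode cluster \
  --class com.example.StreamingApp \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.cores.max=5 \
  --conf spark.executor.memory=1g \
  --conf spark.executor.cores=1 \
  streaming-app.jar
{noformat}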

[jira] [Commented] (SPARK-32893) Structured Streaming and Dynamic Allocation on StandaloneCluster

2023-03-21 Thread Ranga Reddy (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703177#comment-17703177
 ] 

Ranga Reddy commented on SPARK-32893:
-

I can see similar behaviour with Spark Structured Streaming on YARN.
{code:java}
2023-03-14 18:17:29 ERROR TransportClient:337 - Failed to send RPC RPC 
7955407071046657873 to /127.0.0.1:50040: 
java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
        at 
io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957)
        at 
io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865)
        at 
io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
        at 
io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1104)
        at 
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
        at 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at 
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748) {code}

> Structured Streaming and Dynamic Allocation on StandaloneCluster
> 
>
> Key: SPARK-32893
> URL: https://issues.apache.org/jira/browse/SPARK-32893
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Duarte Ferreira
>Priority: Major
>
> We are currently using Spark 3.0.1 Standalone cluster to run our Structured 
> streaming applications.
> We set the following configurations when running the application in cluster 
> mode:
>  * spark.dynamicAllocation.enabled = true
>  * spark.shuffle.service.enabled = true
>  * spark.cores.max =5
>  * spark.executor.memory = 1G
>  * spark.executor.cores = 1
> We also have the configurations set to enable spark.shuffle.service.enabled 
> on each worker and have a cluster composed of 1 master and 2 slaves.
> The application reads data from a kafka Topic (readTopic) using [This 
> documentation, 
> |https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html]applies
>  some transformations on the DataSet using spark SQL and writes data to 
> another Kafka Topic (writeTopic).
> When we start the application it behaves correctly: it starts with 0 
> executors and, as we start feeding data to the readTopic, it increases the 
> number of executors until it reaches the 5-executor limit, and all messages 
> are transformed and written to the writeTopic in Kafka.
> If we stop feeding messages to the readTopic, the application works as 
> expected and starts killing executors that are no longer needed until we 
> stop sending data completely and it reaches 0 running executors.
> If we start sending data again right away, it behaves just as expected and 
> starts increasing the number of executors again. But if we leave the 
> application idle at 0 executors for around 10 minutes, we start getting 
> errors like this:
> {noformat}
> 20/09/15 10:41:22 ERROR TransportClient: Failed to send RPC RPC 
> 7570256331800450365 to sparkmaster/10.0.12.231:7077: 
> java.nio.channels.ClosedChannelException
> java.nio.channels.ClosedChannelException
>   at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957)
>   at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865)
>   at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
>   at 
> io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1104)
>   at 
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
> 

[jira] [Commented] (SPARK-41006) ConfigMap has the same name when launching two pods on the same namespace

2023-03-21 Thread Cedric van Eetvelde (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703145#comment-17703145
 ] 

Cedric van Eetvelde commented on SPARK-41006:
-

As mentioned above, I created a pull request with the correction. I updated 
the unit tests accordingly.

> ConfigMap has the same name when launching two pods on the same namespace
> -
>
> Key: SPARK-41006
> URL: https://issues.apache.org/jira/browse/SPARK-41006
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.0, 3.2.0, 3.3.0
>Reporter: Eric
>Priority: Minor
>
> If we use the Spark Launcher to launch our spark apps in k8s:
> {code:java}
> val sparkLauncher = new InProcessLauncher()
>  .setMaster(k8sMaster)
>  .setDeployMode(deployMode)
>  .setAppName(appName)
>  .setVerbose(true)
> sparkLauncher.startApplication(new SparkAppHandle.Listener { ...{code}
> We have an issue when we launch another Spark driver in the same namespace 
> where another Spark app was running:
> {code:java}
> kp -n audit-exporter-eee5073aac -w
> NAME                                     READY   STATUS        RESTARTS   AGE
> audit-exporter-71489e843d8085c0-driver   1/1     Running       0          
> 9m54s
> audit-exporter-7e6b8b843d80b9e6-exec-1   1/1     Running       0          
> 9m40s
> data-io-120204843d899567-driver          0/1     Terminating   0          1s
> data-io-120204843d899567-driver          0/1     Terminating   0          2s
> data-io-120204843d899567-driver          0/1     Terminating   0          3s
> data-io-120204843d899567-driver          0/1     Terminating   0          
> 3s{code}
> The error is:
> {code:java}
> {"time":"2022-11-03T12:49:45.626Z","lvl":"WARN","logger":"o.a.s.l.InProcessAppHandle","thread":"spark-app-38:
>  'data-io'","msg":"Application failed with 
> exception.","stack_trace":"io.fabric8.kubernetes.client.KubernetesClientException:
>  Failure executing: PUT at: 
> https://kubernetes.default/api/v1/namespaces/audit-exporter-eee5073aac/configmaps/spark-drv-d19c37843d80350c-conf-map.
>  Message: ConfigMap \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: 
> Forbidden: field is immutable when `immutable` is set. Received status: 
> Status(apiVersion=v1, code=422, 
> details=StatusDetails(causes=[StatusCause(field=data, message=Forbidden: 
> field is immutable when `immutable` is set, reason=FieldValueForbidden, 
> additionalProperties={})], group=null, kind=ConfigMap, 
> name=spark-drv-d19c37843d80350c-conf-map, retryAfterSeconds=null, uid=null, 
> additionalProperties={}), kind=Status, message=ConfigMap 
> \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: Forbidden: field is 
> immutable when `immutable` is set, metadata=ListMeta(_continue=null, 
> remainingItemCount=null, resourceVersion=null, selfLink=null, 
> additionalProperties={}), reason=Invalid, status=Failure, 
> additionalProperties={}).\n\tat 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:342)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:322)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleUpdate(BaseOperation.java:649)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:195)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation$$Lambda$5360/00.apply(Unknown
>  Source)\n\tat 
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:200)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:141)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.BaseOperation$$Lambda$4618/00.apply(Unknown
>  Source)\n\tat 
> io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.replace(CreateOrReplaceHelper.java:69)\n\tat
>  
> io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.createOrReplace(CreateOrReplaceHelper.java:61)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:318)\n\tat
>  
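
The PUT failure in the log is the Kubernetes API server rejecting an update to a 
ConfigMap whose `immutable` field is set: once such a ConfigMap exists, any 
attempt to replace its data with different contents is refused with a 422. A 
minimal sketch of that server-side behavior with the official Kubernetes Python 
client (the ConfigMap name and data here are hypothetical):

{code:python}
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# First driver creates the conf map with immutable=True.
cm = client.V1ConfigMap(
    metadata=client.V1ObjectMeta(name="spark-drv-demo-conf-map"),
    data={"spark.properties": "spark.app.name=app-1"},
    immutable=True,
)
v1.create_namespaced_config_map(namespace="default", body=cm)

# A second driver reusing the same name and PUT-ing different data hits the
# same error as above: "data: Forbidden: field is immutable when `immutable` is set".
cm.data = {"spark.properties": "spark.app.name=app-2"}
v1.replace_namespaced_config_map(
    name="spark-drv-demo-conf-map", namespace="default", body=cm
)  # raises kubernetes.client.exceptions.ApiException with code 422
{code}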
> 

[jira] [Comment Edited] (SPARK-41006) ConfigMap has the same name when launching two pods on the same namespace

2023-03-21 Thread Cedric van Eetvelde (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703145#comment-17703145
 ] 

Cedric van Eetvelde edited comment on SPARK-41006 at 3/21/23 11:03 AM:
---

As mentioned above, I created a Pull Request with the correction. I updated 
the Unit Tests accordingly.


was (Author: JIRAUSER299387):
As mentioned above, I create a Pull Request with the correction. I updated the 
Unit Tests accordingly.

> ConfigMap has the same name when launching two pods on the same namespace
> -
>
> Key: SPARK-41006
> URL: https://issues.apache.org/jira/browse/SPARK-41006
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.0, 3.2.0, 3.3.0
>Reporter: Eric
>Priority: Minor
>
> If we use the Spark Launcher to launch our spark apps in k8s:
> {code:java}
> val sparkLauncher = new InProcessLauncher()
>  .setMaster(k8sMaster)
>  .setDeployMode(deployMode)
>  .setAppName(appName)
>  .setVerbose(true)
> sparkLauncher.startApplication(new SparkAppHandle.Listener { ...{code}
> We have an issue when we launch another spark driver in the same namespace 
> where another spark app was running:
> {code:java}
> kp -n audit-exporter-eee5073aac -w
> NAME                                     READY   STATUS        RESTARTS   AGE
> audit-exporter-71489e843d8085c0-driver   1/1     Running       0          9m54s
> audit-exporter-7e6b8b843d80b9e6-exec-1   1/1     Running       0          9m40s
> data-io-120204843d899567-driver          0/1     Terminating   0          1s
> data-io-120204843d899567-driver          0/1     Terminating   0          2s
> data-io-120204843d899567-driver          0/1     Terminating   0          3s
> data-io-120204843d899567-driver          0/1     Terminating   0          3s{code}
> The error is:
> {code:java}
> {"time":"2022-11-03T12:49:45.626Z","lvl":"WARN","logger":"o.a.s.l.InProcessAppHandle","thread":"spark-app-38:
>  'data-io'","msg":"Application failed with 
> exception.","stack_trace":"io.fabric8.kubernetes.client.KubernetesClientException:
>  Failure executing: PUT at: 
> https://kubernetes.default/api/v1/namespaces/audit-exporter-eee5073aac/configmaps/spark-drv-d19c37843d80350c-conf-map.
>  Message: ConfigMap \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: 
> Forbidden: field is immutable when `immutable` is set. Received status: 
> Status(apiVersion=v1, code=422, 
> details=StatusDetails(causes=[StatusCause(field=data, message=Forbidden: 
> field is immutable when `immutable` is set, reason=FieldValueForbidden, 
> additionalProperties={})], group=null, kind=ConfigMap, 
> name=spark-drv-d19c37843d80350c-conf-map, retryAfterSeconds=null, uid=null, 
> additionalProperties={}), kind=Status, message=ConfigMap 
> \"spark-drv-d19c37843d80350c-conf-map\" is invalid: data: Forbidden: field is 
> immutable when `immutable` is set, metadata=ListMeta(_continue=null, 
> remainingItemCount=null, resourceVersion=null, selfLink=null, 
> additionalProperties={}), reason=Invalid, status=Failure, 
> additionalProperties={}).\n\tat 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:342)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:322)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleUpdate(BaseOperation.java:649)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:195)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation$$Lambda$5360/00.apply(Unknown
>  Source)\n\tat 
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:200)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:141)\n\tat
>  
> io.fabric8.kubernetes.client.dsl.base.BaseOperation$$Lambda$4618/00.apply(Unknown
>  Source)\n\tat 
> io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.replace(CreateOrReplaceHelper.java:69)\n\tat
>  
> io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.createOrReplace(CreateOrReplaceHelper.java:61)\n\tat
>  
> 

[jira] [Comment Edited] (SPARK-40327) Increase pandas API coverage for pandas API on Spark

2023-03-21 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703123#comment-17703123
 ] 

Xinrong Meng edited comment on SPARK-40327 at 3/21/23 9:48 AM:
---

All resolved issues have been moved to 
https://issues.apache.org/jira/browse/SPARK-42882 for clarity and for reference 
in the release notes.

The affects version has also been updated to Spark 3.5.0.


was (Author: xinrongm):
Hi, all resolved issues have been moved to 
https://issues.apache.org/jira/browse/SPARK-42882 for clarity and for reference 
in the release notes.

> Increase pandas API coverage for pandas API on Spark
> 
>
> Key: SPARK-40327
> URL: https://issues.apache.org/jira/browse/SPARK-40327
> Project: Spark
>  Issue Type: Umbrella
>  Components: Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Increasing the pandas API coverage for Apache Spark 3.4.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40327) Increase pandas API coverage for pandas API on Spark

2023-03-21 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-40327:
-
Affects Version/s: 3.5.0
   (was: 3.4.0)

> Increase pandas API coverage for pandas API on Spark
> 
>
> Key: SPARK-40327
> URL: https://issues.apache.org/jira/browse/SPARK-40327
> Project: Spark
>  Issue Type: Umbrella
>  Components: Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0
>
>
> Increasing the pandas API coverage for Apache Spark 3.4.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40327) Increase pandas API coverage for pandas API on Spark

2023-03-21 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703123#comment-17703123
 ] 

Xinrong Meng commented on SPARK-40327:
--

Hi, all resolved issues have been moved to 
https://issues.apache.org/jira/browse/SPARK-42882 for clarity and for reference 
in the release notes.

> Increase pandas API coverage for pandas API on Spark
> 
>
> Key: SPARK-40327
> URL: https://issues.apache.org/jira/browse/SPARK-40327
> Project: Spark
>  Issue Type: Umbrella
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0
>
>
> Increasing the pandas API coverage for Apache Spark 3.4.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40327) Increase pandas API coverage for pandas API on Spark

2023-03-21 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-40327:
-
Fix Version/s: (was: 3.4.0)

> Increase pandas API coverage for pandas API on Spark
> 
>
> Key: SPARK-40327
> URL: https://issues.apache.org/jira/browse/SPARK-40327
> Project: Spark
>  Issue Type: Umbrella
>  Components: Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Increasing the pandas API coverage for Apache Spark 3.4.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40340) Implement `Expanding.sem`.

2023-03-21 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-40340:
-
Parent: SPARK-40327  (was: SPARK-42882)

> Implement `Expanding.sem`.
> --
>
> Key: SPARK-40340
> URL: https://issues.apache.org/jira/browse/SPARK-40340
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> We should implement `Expanding.sem` to increase pandas API coverage.
> pandas docs: 
> https://pandas.pydata.org/docs/reference/api/pandas.core.window.expanding.Expanding.sem.html
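
For context, `sem` is the standard error of the mean, i.e. `std(ddof=1) / sqrt(n)` 
over each window. A minimal sketch of the target behavior with plain pandas, 
which the pandas-on-Spark implementation is expected to mirror:

{code:python}
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Standard error of the mean over an ever-growing window; the first window
# has n=1, so its sample std (ddof=1) and hence its sem is NaN.
print(s.expanding().sem())
# 0         NaN
# 1    0.500000
# 2    0.577350
# 3    0.645497
{code}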



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40341) Implement `Rolling.median`.

2023-03-21 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-40341:
-
Parent: SPARK-40327  (was: SPARK-42882)

> Implement `Rolling.median`.
> ---
>
> Key: SPARK-40341
> URL: https://issues.apache.org/jira/browse/SPARK-40341
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Yikun Jiang
>Priority: Major
>
> We should implement `Rolling.median` to increase pandas API coverage.
> pandas docs: 
> https://pandas.pydata.org/docs/reference/api/pandas.core.window.rolling.Rolling.median.html
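
For context, a minimal sketch of the target behavior with plain pandas, which 
the pandas-on-Spark implementation is expected to mirror:

{code:python}
import pandas as pd

s = pd.Series([5.0, 1.0, 3.0, 2.0, 4.0])

# Median over a fixed-size window of 3; the first two results are NaN
# because the window is not yet full.
print(s.rolling(3).median())
# 0    NaN
# 1    NaN
# 2    3.0
# 3    2.0
# 4    3.0
{code}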



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39199) Implement pandas API missing parameters

2023-03-21 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703121#comment-17703121
 ] 

Xinrong Meng commented on SPARK-39199:
--

Please see https://issues.apache.org/jira/browse/SPARK-42883

> Implement pandas API missing parameters
> ---
>
> Key: SPARK-39199
> URL: https://issues.apache.org/jira/browse/SPARK-39199
> Project: Spark
>  Issue Type: Umbrella
>  Components: Pandas API on Spark, PySpark
>Affects Versions: 3.3.0, 3.3.1, 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42883) Implement Pandas API Missing Parameters

2023-03-21 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-42883.
--
Resolution: Resolved

> Implement Pandas API Missing Parameters
> ---
>
> Key: SPARK-42883
> URL: https://issues.apache.org/jira/browse/SPARK-42883
> Project: Spark
>  Issue Type: Umbrella
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> pandas API on Spark aims to make pandas code work on Spark clusters without 
> any changes, so full API coverage has been one of our major goals. Currently, 
> most pandas functions are implemented, but some of them have incomplete 
> parameter support.
> There are some common parameters that were missing (now resolved):
>  * How to handle NAs
>  * Filtering by data type
>  * Controlling result length
>  * Reindexing the result
> There are remaining missing parameters to implement (see doc below).
> See the design and the current status at 
> [https://docs.google.com/document/d/1H6RXL6oc-v8qLJbwKl6OEqBjRuMZaXcTYmrZb9yNm5I/edit?usp=sharing].
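
To make the parameter categories above concrete, a small illustration with plain 
pandas (pandas-on-Spark aims to accept the same arguments; the frame here is 
made up):

{code:python}
import pandas as pd

df = pd.DataFrame({"x": [1.0, None, 3.0], "y": ["a", "b", "c"]})

print(df["x"].sum(skipna=True))     # how to handle NAs
print(df.mean(numeric_only=True))   # filter by data type
print(df.nlargest(2, "x"))          # control result length
print(df.reindex(index=[2, 1, 0]))  # reindex the result
{code}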



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38552) Implement `keep` parameter of `frame.nlargest/nsmallest` to decide how to resolve ties

2023-03-21 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-38552:
-
Parent: SPARK-42883  (was: SPARK-39199)

> Implement `keep` parameter of `frame.nlargest/nsmallest` to decide how to 
> resolve ties
> --
>
> Key: SPARK-38552
> URL: https://issues.apache.org/jira/browse/SPARK-38552
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.4.0
>
>
> Implement the `keep` parameter of `frame.nlargest/nsmallest` to decide how 
> ties are resolved.
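
For context, `keep` controls which of the tied rows survive the cut; a minimal 
pandas sketch (the pandas-on-Spark version is expected to behave the same; the 
frame here is made up):

{code:python}
import pandas as pd

df = pd.DataFrame({"id": ["a", "b", "c", "d"], "score": [3, 1, 3, 2]})

# On ties, keep="first" (default) prefers earlier rows, keep="last" prefers
# later rows, and keep="all" returns every tied row even if that exceeds n.
print(df.nlargest(1, "score", keep="first"))  # row a
print(df.nlargest(1, "score", keep="last"))   # row c
print(df.nlargest(1, "score", keep="all"))    # rows a and c
{code}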



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42882) Pandas API Coverage Improvements

2023-03-21 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-42882.
--
Resolution: Resolved

> Pandas API Coverage Improvements
> 
>
> Key: SPARK-42882
> URL: https://issues.apache.org/jira/browse/SPARK-42882
> Project: Spark
>  Issue Type: Epic
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Pandas API on Spark aims to make pandas code work on Spark clusters without 
> any changes, so full API coverage has been one of our major goals.
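
A minimal sketch of that goal (assuming a working PySpark installation): the 
only change from single-machine pandas is the import.

{code:python}
import pyspark.pandas as ps  # drop-in for: import pandas as pd

psdf = ps.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
print(psdf.describe())       # familiar pandas call, executed by Spark
{code}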



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


