[jira] [Created] (SPARK-46028) Make `Column.__getitem__` accept input column

2023-11-20 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-46028:
-

 Summary: Make `Column.__getitem__` accept input column
 Key: SPARK-46028
 URL: https://issues.apache.org/jira/browse/SPARK-46028
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0, 4.0.0
Reporter: Ruifeng Zheng
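
For context, a minimal PySpark sketch of the requested behavior (the ticket has
no description, so the intended semantics are inferred from the summary; the
column-keyed lookup is equivalent to element_at):
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([({"a": 1, "b": 2}, "a")], ["m", "k"])

# Literal keys already work through Column.__getitem__:
df.select(df.m["a"]).show()

# Column-valued key: the looked-up element can vary per row,
# like F.element_at(df.m, F.col("k")).
df.select(df.m[F.col("k")]).show()
{code}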









[jira] [Commented] (SPARK-45649) Unify the prepare framework for `OffsetWindowFunctionFrame`

2023-11-20 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788306#comment-17788306
 ] 

Wenchen Fan commented on SPARK-45649:
-

This change has a bug. I've uploaded the test data; you can reproduce it via:
{code:java}
spark.read.parquet("/tmp/test_table.parquet").createOrReplaceTempView("test_table")
spark.sql("SELECT nullable_struct_col, id, lag(long_mixedparts, 5) OVER (PARTITION BY nullable_struct_col ORDER BY id) FROM test_table").collect()
{code}

> Unify the prepare framework for `OffsetWindowFunctionFrame`
> ---
>
> Key: SPARK-45649
> URL: https://issues.apache.org/jira/browse/SPARK-45649
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: test_table.parquet.zip
>
>
> Currently, the `prepare` implementations of all the 
> `OffsetWindowFunctionFrame` subclasses share the same code logic, shown below.
> ```
>   override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
> if (offset > rows.length) {
>   fillDefaultValue(EmptyRow)
> } else {
>   resetStates(rows)
>   if (ignoreNulls) {
> ...
>   } else {
> ...
>   }
> }
>   }
> ```






[jira] [Updated] (SPARK-45649) Unify the prepare framework for `OffsetWindowFunctionFrame`

2023-11-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-45649:

Attachment: test_table.parquet.zip

> Unify the prepare framework for `OffsetWindowFunctionFrame`
> ---
>
> Key: SPARK-45649
> URL: https://issues.apache.org/jira/browse/SPARK-45649
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: test_table.parquet.zip
>
>
> Currently, the `prepare` implementations of all the 
> `OffsetWindowFunctionFrame` subclasses share the same code logic, shown below.
> ```
>   override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
> if (offset > rows.length) {
>   fillDefaultValue(EmptyRow)
> } else {
>   resetStates(rows)
>   if (ignoreNulls) {
> ...
>   } else {
> ...
>   }
> }
>   }
> ```






[jira] [Commented] (SPARK-45649) Unify the prepare framework for `OffsetWindowFunctionFrame`

2023-11-20 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788307#comment-17788307
 ] 

Wenchen Fan commented on SPARK-45649:
-

I'm reverting the commit.

> Unify the prepare framework for `OffsetWindowFunctionFrame`
> ---
>
> Key: SPARK-45649
> URL: https://issues.apache.org/jira/browse/SPARK-45649
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: test_table.parquet.zip
>
>
> Currently, the `prepare` implementations of all the 
> `OffsetWindowFunctionFrame` subclasses share the same code logic, shown below.
> ```
>   override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
> if (offset > rows.length) {
>   fillDefaultValue(EmptyRow)
> } else {
>   resetStates(rows)
>   if (ignoreNulls) {
> ...
>   } else {
> ...
>   }
> }
>   }
> ```






[jira] [Reopened] (SPARK-45649) Unify the prepare framework for `OffsetWindowFunctionFrame`

2023-11-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reopened SPARK-45649:
-

> Unify the prepare framework for `OffsetWindowFunctionFrame`
> ---
>
> Key: SPARK-45649
> URL: https://issues.apache.org/jira/browse/SPARK-45649
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: test_table.parquet.zip
>
>
> Currently, the `prepare` implementations of all the 
> `OffsetWindowFunctionFrame` subclasses share the same code logic, shown below.
> ```
>   override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
> if (offset > rows.length) {
>   fillDefaultValue(EmptyRow)
> } else {
>   resetStates(rows)
>   if (ignoreNulls) {
> ...
>   } else {
> ...
>   }
> }
>   }
> ```






[jira] [Updated] (SPARK-46027) Add `Python 3.12` to the Daily Python Github Action job

2023-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46027:
---
Labels: pull-request-available  (was: )

> Add `Python 3.12` to the Daily Python Github Action job
> ---
>
> Key: SPARK-46027
> URL: https://issues.apache.org/jira/browse/SPARK-46027
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46027) Add `Python 3.12` to the Daily Python Github Action job

2023-11-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46027:


 Summary: Add `Python 3.12` to the Daily Python Github Action job
 Key: SPARK-46027
 URL: https://issues.apache.org/jira/browse/SPARK-46027
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Assigned] (SPARK-46004) Refine docstring of `DataFrame.dropna/fillna/replace`

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46004:


Assignee: BingKun Pan

> Refine docstring of `DataFrame.dropna/fillna/replace`
> -
>
> Key: SPARK-46004
> URL: https://issues.apache.org/jira/browse/SPARK-46004
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46004) Refine docstring of `DataFrame.dropna/fillna/replace`

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46004.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43907
[https://github.com/apache/spark/pull/43907]

> Refine docstring of `DataFrame.dropna/fillna/replace`
> -
>
> Key: SPARK-46004
> URL: https://issues.apache.org/jira/browse/SPARK-46004
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-46026) Refine docstring of UDTF

2023-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46026:
---
Labels: pull-request-available  (was: )

> Refine docstring of UDTF
> 
>
> Key: SPARK-46026
> URL: https://issues.apache.org/jira/browse/SPARK-46026
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46026) Refine docstring of UDTF

2023-11-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46026:


 Summary: Refine docstring of UDTF
 Key: SPARK-46026
 URL: https://issues.apache.org/jira/browse/SPARK-46026
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon
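
For readers following along, a minimal sketch of the Python UDTF API whose
docstring this ticket refines (it mirrors the standard example in the PySpark
documentation):
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit, udtf

spark = SparkSession.builder.getOrCreate()

@udtf(returnType="num: int, squared: int")
class SquareNumbers:
    # eval() is called once per input row and yields zero or more output rows
    def eval(self, start: int, end: int):
        for num in range(start, end + 1):
            yield (num, num * num)

SquareNumbers(lit(1), lit(3)).show()
{code}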









[jira] [Deleted] (SPARK-46025) Support Python 3.12 in PySpark

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon deleted SPARK-46025:
-


> Support Python 3.12 in PySpark
> --
>
> Key: SPARK-46025
> URL: https://issues.apache.org/jira/browse/SPARK-46025
> Project: Spark
>  Issue Type: Improvement
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Python 3.12 has been released. We should make sure the tests pass and mark it 
> as supported in setup.py






[jira] [Updated] (SPARK-46025) Support Python 3.12 in PySpark

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46025:
-
Issue Type: Improvement  (was: Bug)

> Support Python 3.12 in PySpark
> --
>
> Key: SPARK-46025
> URL: https://issues.apache.org/jira/browse/SPARK-46025
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Python 3.12 has been released. We should make sure the tests pass and mark it 
> as supported in setup.py






[jira] [Created] (SPARK-46025) Support Python 3.12 in PySpark

2023-11-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46025:


 Summary: Support Python 3.12 in PySpark
 Key: SPARK-46025
 URL: https://issues.apache.org/jira/browse/SPARK-46025
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Python 3.12 has been released. We should make sure the tests pass and mark it 
as supported in setup.py
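
As a rough illustration of what "mark it as supported in setup.py" usually
means (a trimmed, hypothetical sketch, not Spark's actual setup.py):
{code:python}
from setuptools import setup

setup(
    name="pyspark",
    version="4.0.0.dev0",  # illustrative only
    classifiers=[
        "Programming Language :: Python :: 3.11",
        "Programming Language :: Python :: 3.12",  # the new entry this ticket implies
    ],
)
{code}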






[jira] [Updated] (SPARK-46024) Document parameters and examples for RuntimeConf get, set and unset

2023-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46024:
---
Labels: pull-request-available  (was: )

> Document parameters and examples for RuntimeConf get, set and unset
> ---
>
> Key: SPARK-46024
> URL: https://issues.apache.org/jira/browse/SPARK-46024
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46021) Support canceling future jobs belonging to a certain job group on `cancelJobGroup` call

2023-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46021:
---
Labels: pull-request-available  (was: )

> Support canceling future jobs belonging to a certain job group on 
> `cancelJobGroup` call
> ---
>
> Key: SPARK-46021
> URL: https://issues.apache.org/jira/browse/SPARK-46021
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Xinyi Yu
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46024) Document parameters and examples for RuntimeConf get, set and unset

2023-11-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46024:


 Summary: Document parameters and examples for RuntimeConf get, set 
and unset
 Key: SPARK-46024
 URL: https://issues.apache.org/jira/browse/SPARK-46024
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon
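
A minimal usage sketch of the three RuntimeConf methods the ticket wants
documented (these are standard `spark.conf` calls; the key names are
illustrative):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.conf.set("spark.sql.shuffle.partitions", "60")    # set a runtime option
print(spark.conf.get("spark.sql.shuffle.partitions"))   # -> "60"
print(spark.conf.get("my.app.key", "fallback"))         # default for unset keys
spark.conf.unset("spark.sql.shuffle.partitions")        # restore the default
{code}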









[jira] [Comment Edited] (SPARK-45311) Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search for an encoder for a generic type, and since 3.5.x isn't "an expression encoder"

2023-11-20 Thread Marc Le Bihan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788270#comment-17788270
 ] 

Marc Le Bihan edited comment on SPARK-45311 at 11/21/23 6:31 AM:
-

Thanks, that's good news. I'll check whether the 3.4.2 SNAPSHOT in preparation 
succeeds in resolving the first two issues mentioned (I think it will), and 
will close them as duplicates.

For the last one,

*3)* When I switch to the *Spark 3.5.0* version, the same problems remain, but 
another one adds itself to the list:
"*Only expression encoders are supported for now*" on what was accepted and 
working before.

Since it's a new one, I should keep it open, shouldn't I?
 


was (Author: mlebihan):
Thanks, that's good news. I'll check whether the 3.4.2 in preparation succeeds 
in resolving the first two issues mentioned (I think they will) and will close 
them as duplicates.

For the last one,

*3)* When I switch to the *Spark 3.5.0* version, the same problems remain, but 
another one adds itself to the list:
"*Only expression encoders are supported for now*" on what was accepted and 
working before.

Since it's a new one, I should keep it apart and open, shouldn't I?
 

> Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search 
> for an encoder for a generic type, and since 3.5.x isn't "an expression 
> encoder"
> -
>
> Key: SPARK-45311
> URL: https://issues.apache.org/jira/browse/SPARK-45311
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0, 3.4.1, 3.5.0
> Environment: Debian 12
> Java 17
> Underlying Spring-Boot 2.7.14
>Reporter: Marc Le Bihan
>Priority: Major
>
> If you find it convenient, you might clone the 
> [https://gitlab.com/territoirevif/minimal-tests-spark-issue] project (which 
> does many operations around cities, local authorities and accounting with 
> open data), where I've extracted from my work what's necessary to make a set 
> of 35 tests that run correctly with Spark 3.3.x and show the troubles 
> encountered with 3.4.x and 3.5.x.
>  
> It works well with Spark 3.2.x and 3.3.x. But as soon as I select *Spark 
> 3.4.x*, where the encoder seems to have deeply changed, the encoder fails 
> with two problems:
>  
> *1)* It throws *java.util.NoSuchElementException: None.get* messages 
> everywhere.
> Asking around the Internet, I found I wasn't alone in facing this problem. 
> Reading the post below, you'll see that I've attempted to debug it, but my 
> Scala skills are limited.
> [https://stackoverflow.com/questions/76036349/encoders-bean-doesnt-work-anymore-on-a-java-pojo-with-spark-3-4-0]
> By the way, if possible, the encoder and decoder functions should forward a 
> parameter as soon as the name of the field being handled is known, and then 
> all along their process, so that whenever the encoder has to throw an 
> exception, it knows which field it is handling and can send a message like:
> _java.util.NoSuchElementException: None.get when encoding [the method or 
> field it was targeting]_
>  
> *2)* *Not found an encoder of the type RS to Spark SQL internal 
> representation.* Consider to change the input type to one of supported at 
> (...)
> Or: Not found an encoder of the type *OMI_ID* to Spark SQL internal 
> representation (...)
>  
> where *RS* and *OMI_ID* are generic types.
> This is strange.
> [https://stackoverflow.com/questions/76045255/encoders-bean-attempts-to-check-the-validity-of-a-return-type-considering-its-ge]
>  
> *3)* When I switch to the *Spark 3.5.0* version, the same problems remain, 
> but another one adds itself to the list:
> "*Only expression encoders are supported for now*" on what was accepted and 
> working before.
>  






[jira] [Commented] (SPARK-45311) Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search for an encoder for a generic type, and since 3.5.x isn't "an expression encoder"

2023-11-20 Thread Marc Le Bihan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788270#comment-17788270
 ] 

Marc Le Bihan commented on SPARK-45311:
---

Thanks, that's good news. I'll check whether the 3.4.2 in preparation succeeds 
in resolving the first two issues mentioned (I think they will) and will close 
them as duplicates.

For the last one,

*3)* When I switch to the *Spark 3.5.0* version, the same problems remain, but 
another one adds itself to the list:
"*Only expression encoders are supported for now*" on what was accepted and 
working before.

Since it's a new one, I should keep it apart and open, shouldn't I?
 

> Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, search 
> for an encoder for a generic type, and since 3.5.x isn't "an expression 
> encoder"
> -
>
> Key: SPARK-45311
> URL: https://issues.apache.org/jira/browse/SPARK-45311
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0, 3.4.1, 3.5.0
> Environment: Debian 12
> Java 17
> Underlying Spring-Boot 2.7.14
>Reporter: Marc Le Bihan
>Priority: Major
>
> If you find it convenient, you might clone the 
> [https://gitlab.com/territoirevif/minimal-tests-spark-issue] project (which 
> does many operations around cities, local authorities and accounting with 
> open data), where I've extracted from my work what's necessary to make a set 
> of 35 tests that run correctly with Spark 3.3.x and show the troubles 
> encountered with 3.4.x and 3.5.x.
>  
> It works well with Spark 3.2.x and 3.3.x. But as soon as I select *Spark 
> 3.4.x*, where the encoder seems to have deeply changed, the encoder fails 
> with two problems:
>  
> *1)* It throws *java.util.NoSuchElementException: None.get* messages 
> everywhere.
> Asking around the Internet, I found I wasn't alone in facing this problem. 
> Reading the post below, you'll see that I've attempted to debug it, but my 
> Scala skills are limited.
> [https://stackoverflow.com/questions/76036349/encoders-bean-doesnt-work-anymore-on-a-java-pojo-with-spark-3-4-0]
> By the way, if possible, the encoder and decoder functions should forward a 
> parameter as soon as the name of the field being handled is known, and then 
> all along their process, so that whenever the encoder has to throw an 
> exception, it knows which field it is handling and can send a message like:
> _java.util.NoSuchElementException: None.get when encoding [the method or 
> field it was targeting]_
>  
> *2)* *Not found an encoder of the type RS to Spark SQL internal 
> representation.* Consider to change the input type to one of supported at 
> (...)
> Or: Not found an encoder of the type *OMI_ID* to Spark SQL internal 
> representation (...)
>  
> where *RS* and *OMI_ID* are generic types.
> This is strange.
> [https://stackoverflow.com/questions/76045255/encoders-bean-attempts-to-check-the-validity-of-a-return-type-considering-its-ge]
>  
> *3)* When I switch to the *Spark 3.5.0* version, the same problems remain, 
> but another one adds itself to the list:
> "*Only expression encoders are supported for now*" on what was accepted and 
> working before.
>  






[jira] [Updated] (SPARK-46023) Annotate parameters at docstrings in pyspark.sql module

2023-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46023:
---
Labels: pull-request-available  (was: )

> Annotate parameters at docstrings in pyspark.sql module
> ---
>
> Key: SPARK-46023
> URL: https://issues.apache.org/jira/browse/SPARK-46023
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> See PR






[jira] [Created] (SPARK-46023) Annotate parameters at docstrings in pyspark.sql module

2023-11-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46023:


 Summary: Annotate parameters at docstrings in pyspark.sql module
 Key: SPARK-46023
 URL: https://issues.apache.org/jira/browse/SPARK-46023
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


See PR






[jira] [Created] (SPARK-46022) Remove deprecated functions APIs from documents

2023-11-20 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-46022:
---

 Summary: Remove deprecated functions APIs from documents
 Key: SPARK-46022
 URL: https://issues.apache.org/jira/browse/SPARK-46022
 Project: Spark
  Issue Type: Bug
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Haejoon Lee


We should not expose deprecated APIs in the official documentation.






[jira] [Updated] (SPARK-46017) PySpark doc build doesn't work properly on Mac

2023-11-20 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-46017:

Summary: PySpark doc build doesn't work properly on Mac  (was: PySpark doc 
build doesn't work properly on local environment)

> PySpark doc build doesn't work properly on Mac
> --
>
> Key: SPARK-46017
> URL: https://issues.apache.org/jira/browse/SPARK-46017
> Project: Spark
>  Issue Type: Bug
>  Components: Build, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> The PySpark doc build works properly on GitHub CI, but doesn't work properly 
> in a local environment for some reason. We should investigate and fix it.






[jira] [Updated] (SPARK-46017) PySpark doc build doesn't work properly on Mac

2023-11-20 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-46017:

Description: The PySpark doc build works properly on GitHub CI, but doesn't 
work properly in a local Mac environment for some reason. We should 
investigate and fix it.  (was: The PySpark doc build works properly on GitHub 
CI, but doesn't work properly in a local environment for some reason. We 
should investigate and fix it.)

> PySpark doc build doesn't work properly on Mac
> --
>
> Key: SPARK-46017
> URL: https://issues.apache.org/jira/browse/SPARK-46017
> Project: Spark
>  Issue Type: Bug
>  Components: Build, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> The PySpark doc build works properly on GitHub CI, but doesn't work properly 
> in a local Mac environment for some reason. We should investigate and fix it.






[jira] [Updated] (SPARK-46019) Fix HiveThriftServer2ListenerSuite and ThriftServerPageSuite failures when they are run locally

2023-11-20 Thread BingKun Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BingKun Pan updated SPARK-46019:

Summary: Fix HiveThriftServer2ListenerSuite and ThriftServerPageSuite 
failures when they are run locally  (was: Fix when 
`HiveThriftServer2ListenerSuite` and `ThriftServerPageSuite` failed to run 
locally)

> Fix HiveThriftServer2ListenerSuite and ThriftServerPageSuite failures when 
> they are run locally
> ---
>
> Key: SPARK-46019
> URL: https://issues.apache.org/jira/browse/SPARK-46019
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46019) Fix when `HiveThriftServer2ListenerSuite` and `ThriftServerPageSuite` failed to run locally

2023-11-20 Thread BingKun Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BingKun Pan updated SPARK-46019:

Affects Version/s: (was: 3.5.0)
   (was: 3.4.1)

> Fix when `HiveThriftServer2ListenerSuite` and `ThriftServerPageSuite` failed 
> to run locally
> ---
>
> Key: SPARK-46019
> URL: https://issues.apache.org/jira/browse/SPARK-46019
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46020) Add `Python 3.12` to Infra docker image

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46020.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43922
[https://github.com/apache/spark/pull/43922]

> Add `Python 3.12` to Infra docker image
> ---
>
> Key: SPARK-46020
> URL: https://issues.apache.org/jira/browse/SPARK-46020
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-46021) Support canceling future jobs belonging to a certain job group on `cancelJobGroup` call

2023-11-20 Thread Xinyi Yu (Jira)
Xinyi Yu created SPARK-46021:


 Summary: Support canceling future jobs belonging to a certain job 
group on `cancelJobGroup` call
 Key: SPARK-46021
 URL: https://issues.apache.org/jira/browse/SPARK-46021
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Xinyi Yu
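
For context, today's job-group API cancels only the group's jobs that are
already running; per the summary, the proposal extends cancellation to jobs
submitted to the group afterwards. A sketch of the existing behavior (the
proposed flag itself is not shown, since the ticket has no description):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Jobs started after setJobGroup belong to the group:
sc.setJobGroup("etl-group", "nightly ETL", interruptOnCancel=True)
sc.parallelize(range(10)).count()

# Cancels the group's currently running jobs; under this proposal it could
# optionally also cancel the group's future jobs.
sc.cancelJobGroup("etl-group")
{code}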









[jira] [Assigned] (SPARK-46020) Add `Python 3.12` to Infra docker image

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46020:
-

Assignee: Dongjoon Hyun

> Add `Python 3.12` to Infra docker image
> ---
>
> Key: SPARK-46020
> URL: https://issues.apache.org/jira/browse/SPARK-46020
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46020) Add `Python 3.12` to Infra docker image

2023-11-20 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-46020:
-

 Summary: Add `Python 3.12` to Infra docker image
 Key: SPARK-46020
 URL: https://issues.apache.org/jira/browse/SPARK-46020
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-46020) Add `Python 3.12` to Infra docker image

2023-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46020:
---
Labels: pull-request-available  (was: )

> Add `Python 3.12` to Infra docker image
> ---
>
> Key: SPARK-46020
> URL: https://issues.apache.org/jira/browse/SPARK-46020
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46019) Fix when `HiveThriftServer2ListenerSuite` and `ThriftServerPageSuite` failed to run locally

2023-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46019:
---
Labels: pull-request-available  (was: )

> Fix when `HiveThriftServer2ListenerSuite` and `ThriftServerPageSuite` failed 
> to run locally
> ---
>
> Key: SPARK-46019
> URL: https://issues.apache.org/jira/browse/SPARK-46019
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.4.1, 3.5.0, 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46019) Fix when `HiveThriftServer2ListenerSuite` and `ThriftServerPageSuite` failed to run locally

2023-11-20 Thread BingKun Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BingKun Pan updated SPARK-46019:

Component/s: Tests

> Fix when `HiveThriftServer2ListenerSuite` and `ThriftServerPageSuite` failed 
> to run locally
> ---
>
> Key: SPARK-46019
> URL: https://issues.apache.org/jira/browse/SPARK-46019
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.4.1, 3.5.0, 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>







[jira] [Updated] (SPARK-45833) Document state data source - reader

2023-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45833:
---
Labels: pull-request-available  (was: )

> Document state data source - reader
> ---
>
> Key: SPARK-45833
> URL: https://issues.apache.org/jira/browse/SPARK-45833
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Jungtaek Lim
>Priority: Blocker
>  Labels: pull-request-available
>
> Once the code for SPARK-45511 has landed, we need to put some effort into 
> documenting the usage. Since we are open to further improvements, e.g. a state 
> writer, we may want to have a separate page like the Kafka integration one.
> We can include the usage of the state metadata source in this page, as state 
> metadata would be mainly useful for the state data source.
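
A hedged sketch of the reader usage such a page would document, assuming the
SPARK-45511 work lands as a {{statestore}} batch source (the exact format name
and options are for the docs to confirm):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the key/value state of a streaming query from its checkpoint location:
state_df = spark.read.format("statestore").load("/tmp/streaming-checkpoint")
state_df.show()
{code}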






[jira] [Created] (SPARK-46019) Fix when `HiveThriftServer2ListenerSuite` and `ThriftServerPageSuite` failed to run locally

2023-11-20 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-46019:
---

 Summary: Fix when `HiveThriftServer2ListenerSuite` and 
`ThriftServerPageSuite` failed to run locally
 Key: SPARK-46019
 URL: https://issues.apache.org/jira/browse/SPARK-46019
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0, 3.4.1, 4.0.0
Reporter: BingKun Pan









[jira] [Updated] (SPARK-46018) driver fails to start properly in standalone cluster deployment mode

2023-11-20 Thread xiejiankun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiejiankun updated SPARK-46018:
---
Description: 
Without adding the `SPARK_LOCAL_HOSTNAME` attribute to spark-env.sh, it is 
possible to elect and create the driver normally.

The submitted command is:
{code:java}
bin/spark-submit --class org.apache.spark.examples.SparkPi --master 
spark://XXX:7077 --deploy-mode cluster 
/opt/module/spark3.1/examples/jars/spark-examples_2.12-3.1.3.jar 1000{code}
The command to start the driver uses the real IP address of the worker:
{code:java}
"spark://wor...@169.xxx.xxx.211:7078" {code}
The driver can run on any worker.

 

After adding the SPARK_LOCAL_HOSTNAME attribute to each worker, *the driver 
cannot run on workers other than the one that submitted the command.*

The command to start the driver uses the hostname of the worker:
{code:java}
"spark://Worker@hostname:7078" {code}
The error message is:
{code:java}
Launch Command: "/opt/module/jdk1.8.0_371/bin/java" "-cp" 
"/opt/module/spark3.1/conf/:/opt/module/spark3.1/jars/*:/opt/module/hadoop-3.3.0/etc/hadoop/"
 "-Xmx6144M" "-Dspark.eventLog.enabled=true" "-Dspark.driver.cores=4" 
"-Dspark.jars=file:/opt/module/spark3.1/examples/jars/spark-examples_2.12-3.1.3.jar"
 "-Dspark.submit.deployMode=cluster" "-Dspark.sql.shuffle.partitions=60" 
"-Dspark.master=spark://nodeA:7077" "-Dspark.executor.cores=4" 
"-Dspark.driver.supervise=false" 
"-Dspark.app.name=org.apache.spark.examples.SparkPi" "-Dspark.driver.memory=6g" 
"-Dspark.eventLog.compress=true" "-Dspark.executor.memory=8g" 
"-Dspark.submit.pyFiles=" "-Dspark.eventLog.dir=hdfs://nodeA:8020/sparklog/" 
"-Dspark.rpc.askTimeout=10s" "-Dspark.default.parallelism=60" 
"-Dspark.history.fs.cleaner.enabled=false" 
"org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@nodeB:7078" 
"/opt/module/spark3.1/work/driver-20231121105901-/spark-examples_2.12-3.1.3.jar"
 "org.apache.spark.examples.SparkPi" "1000"


23/11/21 10:59:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
23/11/21 10:59:02 INFO SecurityManager: Changing view acls to: jaken
23/11/21 10:59:02 INFO SecurityManager: Changing modify acls to: jaken
23/11/21 10:59:02 INFO SecurityManager: Changing view acls groups to: 
23/11/21 10:59:02 INFO SecurityManager: Changing modify acls groups to: 
23/11/21 10:59:02 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users  with view permissions: Set(jaken); groups 
with view permissions: Set(); users  with modify permissions: Set(jaken); 
groups with modify permissions: Set()
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free 
port. You may check whether configuring an appropriate binding address.
[the same WARN line repeats many more times; the log is truncated here] {code}

[jira] [Created] (SPARK-46018) driver fails to start properly in standalone cluster deployment mode

2023-11-20 Thread xiejiankun (Jira)
xiejiankun created SPARK-46018:
--

 Summary: driver fails to start properly in standalone cluster 
deployment mode
 Key: SPARK-46018
 URL: https://issues.apache.org/jira/browse/SPARK-46018
 Project: Spark
  Issue Type: Bug
  Components: Deploy, Spark Submit
Affects Versions: 3.1.3
Reporter: xiejiankun


Without adding the `SPARK_LOCAL_HOSTNAME` attribute to spark-env.sh, it is 
possible to elect and create the driver normally.

The submitted command is:
{code:java}
bin/spark-submit --class org.apache.spark.examples.SparkPi --master 
spark://XXX:7077 --deploy-mode cluster 
/opt/module/spark3.1/examples/jars/spark-examples_2.12-3.1.3.jar 1000{code}
The command to start the driver uses the real IP address of the worker:
{code:java}
"spark://wor...@169.xxx.xxx.211:7078" {code}
The driver can run on any worker.

 

After adding the SPARK_LOCAL_HOSTNAME attribute to each worker, *the driver 
cannot run on workers other than the one that submitted the command.*

The command to start the driver uses the hostname of the worker:
{code:java}
"spark://Worker@hostname:7078" {code}
The error message is:
{code:java}
Launch Command: "/opt/module/jdk1.8.0_371/bin/java" "-cp" 
"/opt/module/spark3.1/conf/:/opt/module/spark3.1/jars/*:/opt/module/hadoop-3.3.0/etc/hadoop/"
 "-Xmx6144M" "-Dspark.eventLog.enabled=true" "-Dspark.driver.cores=4" 
"-Dspark.jars=file:/opt/module/spark3.1/examples/jars/spark-examples_2.12-3.1.3.jar"
 "-Dspark.submit.deployMode=cluster" "-Dspark.sql.shuffle.partitions=60" 
"-Dspark.master=spark://xy04:7077" "-Dspark.executor.cores=4" 
"-Dspark.driver.supervise=false" 
"-Dspark.app.name=org.apache.spark.examples.SparkPi" "-Dspark.driver.memory=6g" 
"-Dspark.eventLog.compress=true" "-Dspark.executor.memory=8g" 
"-Dspark.submit.pyFiles=" "-Dspark.eventLog.dir=hdfs://xy04:8020/sparklog/" 
"-Dspark.rpc.askTimeout=10s" "-Dspark.default.parallelism=60" 
"-Dspark.history.fs.cleaner.enabled=false" 
"org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@xy04:7078" 
"/opt/module/spark3.1/work/driver-20231121105901-/spark-examples_2.12-3.1.3.jar"
 "org.apache.spark.examples.SparkPi" "1000"


23/11/21 10:59:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
23/11/21 10:59:02 INFO SecurityManager: Changing view acls to: jaken
23/11/21 10:59:02 INFO SecurityManager: Changing modify acls to: jaken
23/11/21 10:59:02 INFO SecurityManager: Changing view acls groups to: 
23/11/21 10:59:02 INFO SecurityManager: Changing modify acls groups to: 
23/11/21 10:59:02 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users  with view permissions: Set(jaken); groups 
with view permissions: Set(); users  with modify permissions: Set(jaken); 
groups with modify permissions: Set()
23/11/21 10:59:03 WARN Utils: Service 'Driver' could not bind on a random free 
port. You may check whether configuring an appropriate binding address.
[the same WARN line repeats many more times; the log is truncated here] {code}

[jira] [Updated] (SPARK-46017) PySpark doc build doesn't work properly on local environment

2023-11-20 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-46017:

Summary: PySpark doc build doesn't work properly on local environment  
(was: PySpark doc build doesn't work properly on local env)

> PySpark doc build doesn't work properly on local environment
> 
>
> Key: SPARK-46017
> URL: https://issues.apache.org/jira/browse/SPARK-46017
> Project: Spark
>  Issue Type: Bug
>  Components: Build, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>
> The PySpark doc build works properly on GitHub CI, but doesn't work properly 
> in a local environment for some reason. We should investigate and fix it.






[jira] [Created] (SPARK-46017) PySpark doc build doesn't work properly on local env

2023-11-20 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-46017:
---

 Summary: PySpark doc build doesn't work properly on local env
 Key: SPARK-46017
 URL: https://issues.apache.org/jira/browse/SPARK-46017
 Project: Spark
  Issue Type: Bug
  Components: Build, PySpark
Affects Versions: 4.0.0
Reporter: Haejoon Lee


The PySpark doc build works properly on GitHub CI, but doesn't work properly 
in a local environment for some reason. We should investigate and fix it.






[jira] [Assigned] (SPARK-46015) Fix broken link for Koalas issues

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46015:


Assignee: Haejoon Lee

> Fix broken link for Koalas issues
> -
>
> Key: SPARK-46015
> URL: https://issues.apache.org/jira/browse/SPARK-46015
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PS
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> There is a broken link to the old Koalas repo. We should address it.






[jira] [Resolved] (SPARK-46015) Fix broken link for Koalas issues

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46015.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43918
[https://github.com/apache/spark/pull/43918]

> Fix broken link for Koalas issues
> -
>
> Key: SPARK-46015
> URL: https://issues.apache.org/jira/browse/SPARK-46015
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PS
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> There is a broken link to the old Koalas repo. We should address it.






[jira] [Resolved] (SPARK-46014) Run RocksDBStateStoreStreamingAggregationSuite on a dedicated JVM

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46014.
---
Fix Version/s: 4.0.0
   3.5.1
   Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/43916

> Run RocksDBStateStoreStreamingAggregationSuite on a dedicated JVM
> -
>
> Key: SPARK-46014
> URL: https://issues.apache.org/jira/browse/SPARK-46014
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.1
>
>







[jira] [Resolved] (SPARK-46012) EventLogFileReader should not read rolling logs if appStatus is missing

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46012.
---
Fix Version/s: 3.3.4
   3.5.1
   4.0.0
   3.4.2
   Resolution: Fixed

Issue resolved by pull request 43914
[https://github.com/apache/spark/pull/43914]

> EventLogFileReader should not read rolling logs if appStatus is missing
> ---
>
> Key: SPARK-46012
> URL: https://issues.apache.org/jira/browse/SPARK-46012
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.1.3, 3.2.4, 3.3.3, 3.4.1
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.4, 3.5.1, 4.0.0, 3.4.2
>
>







[jira] [Resolved] (SPARK-44867) Refactor Spark Connect Docs to incorporate Scala setup

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44867.
--
Fix Version/s: 3.5.0
 Assignee: Venkata Sai Akhil Gudesa
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/42556

> Refactor Spark Connect Docs to incorporate Scala setup
> --
>
> Key: SPARK-44867
> URL: https://issues.apache.org/jira/browse/SPARK-44867
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Assignee: Venkata Sai Akhil Gudesa
>Priority: Major
> Fix For: 3.5.0
>
>
> The current Spark Connect 
> [overview|https://spark.apache.org/docs/latest/spark-connect-overview.html] 
> does not include instructions for setting up the Scala REPL or for using the 
> Scala client in applications.






[jira] [Resolved] (SPARK-45929) support grouping set operation in dataframe api

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45929.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43813
[https://github.com/apache/spark/pull/43813]

> support grouping set operation in dataframe api
> ---
>
> Key: SPARK-45929
> URL: https://issues.apache.org/jira/browse/SPARK-45929
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: JacobZheng
>Assignee: JacobZheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> I am using the Spark DataFrame API for complex calculations. When I need to 
> use the grouping sets function, I can only convert the expression to SQL via 
> the analyzed plan and then splice those statements into one complex SQL query 
> to execute. In some cases this generates an extremely complex query, and while 
> executing it antlr4 keeps consuming a large amount of memory, similar to a 
> memory leak. If grouping sets could be computed through the DataFrame API, as 
> rollup and cube already can, these operations would be much simpler.
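
A short sketch of the contrast the reporter describes, using standard APIs:
rollup and cube are first-class DataFrame operations, while GROUPING SETS
still requires the SQL-splicing detour:
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("a", "x", 1), ("a", "y", 2), ("b", "x", 3)], ["k1", "k2", "v"]
)

df.rollup("k1", "k2").agg(F.sum("v")).show()  # DataFrame API today
df.cube("k1", "k2").agg(F.sum("v")).show()    # DataFrame API today

df.createOrReplaceTempView("t")               # grouping sets: SQL text only
spark.sql(
    "SELECT k1, k2, sum(v) FROM t "
    "GROUP BY k1, k2 GROUPING SETS ((k1), (k1, k2))"
).show()
{code}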






[jira] [Assigned] (SPARK-45929) support grouping set operation in dataframe api

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45929:


Assignee: JacobZheng

> support grouping set operation in dataframe api
> ---
>
> Key: SPARK-45929
> URL: https://issues.apache.org/jira/browse/SPARK-45929
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: JacobZheng
>Assignee: JacobZheng
>Priority: Major
>  Labels: pull-request-available
>
> I am using the Spark DataFrame API for complex calculations. When I need to 
> use the grouping sets function, I can only convert the expression to SQL via 
> the analyzed plan and then splice those statements into one complex SQL query 
> to execute. In some cases this generates an extremely complex query, and while 
> executing it antlr4 keeps consuming a large amount of memory, similar to a 
> memory leak. If grouping sets could be computed through the DataFrame API, as 
> rollup and cube already can, these operations would be much simpler.






[jira] [Resolved] (SPARK-45856) Move ArtifactManager from Spark Connect into SparkSession (sql/core)

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45856.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43735
[https://github.com/apache/spark/pull/43735]

> Move ArtifactManager from Spark Connect into SparkSession (sql/core)
> 
>
> Key: SPARK-45856
> URL: https://issues.apache.org/jira/browse/SPARK-45856
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, SQL
>Affects Versions: 4.0.0
>Reporter: Venkata Sai Akhil Gudesa
>Assignee: Venkata Sai Akhil Gudesa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The `ArtifactManager` that currently lives in the connect package can be 
> moved into the wider sql/core package (e.g. SparkSession) to expand its 
> scope. This is possible because the `ArtifactManager` is tied solely to the 
> `SparkSession#sessionUUID` and hence can be cleanly detached from Spark 
> Connect and made generally available.
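>
> For context, a rough sketch of the client-side flow this manager backs, 
> assuming the Spark Connect Scala client (method names per the 3.5 client; 
> the path and endpoint are invented for illustration):
> {code:scala}
> import org.apache.spark.sql.SparkSession
>
> // Artifacts added to a session are tracked per SparkSession#sessionUUID,
> // which is why the manager can be detached from Spark Connect itself.
> val spark = SparkSession.builder().remote("sc://localhost:15002").getOrCreate()
> spark.addArtifact("/path/to/my-udfs.jar")  // visible to this session only
> {code}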



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46016) Fix the script for Supported pandas API to work properly

2023-11-20 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-46016:
---

 Summary: Fix the script for Supported pandas API to work properly
 Key: SPARK-46016
 URL: https://issues.apache.org/jira/browse/SPARK-46016
 Project: Spark
  Issue Type: Bug
  Components: Documentation, Pandas API on Spark
Affects Versions: 4.0.0
Reporter: Haejoon Lee


Currently, the "Supported pandas API" page is not generated properly, so we 
should fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46015) Fix broken link for Koalas issues

2023-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46015:
---
Labels: pull-request-available  (was: )

> Fix broken link for Koalas issues
> -
>
> Key: SPARK-46015
> URL: https://issues.apache.org/jira/browse/SPARK-46015
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PS
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> There is a broken link to the old Koalas repo. We should address it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46015) Fix broken link for Koalas issues

2023-11-20 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-46015:
---

 Summary: Fix broken link for Koalas issues
 Key: SPARK-46015
 URL: https://issues.apache.org/jira/browse/SPARK-46015
 Project: Spark
  Issue Type: Bug
  Components: Documentation, PS
Affects Versions: 4.0.0
Reporter: Haejoon Lee


There is a broken link to the old Koalas repo. We should address it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46014) Run RocksDBStateStoreStreamingAggregationSuite on a dedicated JVM

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46014:
-

Assignee: Dongjoon Hyun

> Run RocksDBStateStoreStreamingAggregationSuite on a dedicated JVM
> -
>
> Key: SPARK-46014
> URL: https://issues.apache.org/jira/browse/SPARK-46014
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46013) Improve basic datasource examples

2023-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46013:
---
Labels: pull-request-available  (was: )

> Improve basic datasource examples
> -
>
> Key: SPARK-46013
> URL: https://issues.apache.org/jira/browse/SPARK-46013
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> We should improve the Python examples on this page: 
> [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html]
>  (basic_datasource_examples.py)
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46014) Run RocksDBStateStoreStreamingAggregationSuite on a dedicated JVM

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46014:
--
Summary: Run RocksDBStateStoreStreamingAggregationSuite on a dedicated JVM  
(was: Run RocksDBStateStoreStreamingAggregationSuite on a dedicate JVM)

> Run RocksDBStateStoreStreamingAggregationSuite on a dedicated JVM
> -
>
> Key: SPARK-46014
> URL: https://issues.apache.org/jira/browse/SPARK-46014
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46014) Run RocksDBStateStoreStreamingAggregationSuite on a dedicate JVM

2023-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46014:
---
Labels: pull-request-available  (was: )

> Run RocksDBStateStoreStreamingAggregationSuite on a dedicate JVM
> 
>
> Key: SPARK-46014
> URL: https://issues.apache.org/jira/browse/SPARK-46014
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46014) Run RocksDBStateStoreStreamingAggregationSuite on a dedicate JVM

2023-11-20 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-46014:
-

 Summary: Run RocksDBStateStoreStreamingAggregationSuite on a 
dedicate JVM
 Key: SPARK-46014
 URL: https://issues.apache.org/jira/browse/SPARK-46014
 Project: Spark
  Issue Type: Bug
  Components: SQL, Tests
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44767) Plugin API for PySpark and SparkR workers

2023-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44767:
---
Labels: pull-request-available  (was: )

> Plugin API for PySpark and SparkR workers
> -
>
> Key: SPARK-44767
> URL: https://issues.apache.org/jira/browse/SPARK-44767
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.4.1
>Reporter: Willi Raschkowski
>Priority: Major
>  Labels: pull-request-available
>
> An API to customize Python and R workers allows for extensibility beyond what 
> can be expressed via static configs and environment variables such as 
> {{spark.pyspark.python}}.
> A use case for this is overriding {{PATH}} when using {{spark.archives}} 
> with, say, conda-pack (as documented 
> [here|https://spark.apache.org/docs/3.1.1/api/python/user_guide/python_packaging.html#using-conda]).
>  Some packages rely on binaries, and if we want to use those packages in 
> Spark, we need to include their binaries in the {{PATH}}.
> But we can't set the {{PATH}} via some config because 1) the environment with 
> its binaries may be at a dynamic location (archives are unpacked on the 
> driver [into a directory with a random 
> name|https://github.com/apache/spark/blob/5db87787d5cc1cefb51ec77e49bac7afaa46d300/core/src/main/scala/org/apache/spark/SparkFiles.scala#L33-L37]),
>  and 2) we may not want to override the {{PATH}} that's pre-configured on the 
> hosts. A sketch of this limitation follows below.
> Other use cases unlocked by this include overriding the executable 
> dynamically (e.g., to select a version) or forking/redirecting the worker's 
> output stream.
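>
> A rough sketch of the configuration shape referred to above, assuming a 
> conda-pack archive shipped via {{spark.archives}} (paths and names are 
> invented for illustration):
> {code:scala}
> import org.apache.spark.SparkConf
>
> // The archive is unpacked under its fragment name in the container's working
> // directory, so the interpreter can be addressed with a relative path...
> val conf = new SparkConf()
>   .set("spark.archives", "hdfs:///envs/environment.tar.gz#environment")
>   .set("spark.pyspark.python", "./environment/bin/python")
> // ...but there is no analogous config to prepend ./environment/bin to PATH,
> // which is what a worker-plugin API would make possible.
> {code}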



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44781) Runtime filter should support reusing exchange if it can reduce the data size of the application side

2023-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44781:
---
Labels: pull-request-available  (was: )

> Runtime filter should support reusing exchange if it can reduce the data size 
> of the application side
> 
>
> Key: SPARK-44781
> URL: https://issues.apache.org/jira/browse/SPARK-44781
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
>
> Currently, Spark's runtime filter only supports using a subquery on one table.
> In fact, we can reuse the exchange, even if it is a shuffle exchange.
> If the shuffle exchange comes from a join that has one side with selective 
> predicates, the results of the join can be used to prune the data amount 
> of the application side. A hypothetical query shape is sketched below.
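>
> A hypothetical query shape matching the description (table and column names 
> are invented; the temp views just make the sketch self-contained):
> {code:scala}
> import spark.implicits._
>
> Seq((1, "x"), (2, "y")).toDF("k", "flag").createOrReplaceTempView("t2")
> Seq(1, 2, 3).toDF("k").createOrReplaceTempView("t1")
> spark.range(100).selectExpr("id % 3 AS k", "id AS payload").createOrReplaceTempView("t3")
>
> // t2 carries a selective predicate, so the t1-t2 join result is small. If its
> // shuffle exchange were reused as a runtime filter, the large application
> // side t3 could be pruned before the final join.
> spark.sql("""
>   SELECT *
>   FROM t3
>   JOIN (SELECT t1.k FROM t1 JOIN t2 ON t1.k = t2.k WHERE t2.flag = 'x') joined
>     ON t3.k = joined.k
> """).show()
> {code}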



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46013) Improve basic datasource examples

2023-11-20 Thread Allison Wang (Jira)
Allison Wang created SPARK-46013:


 Summary: Improve basic datasource examples
 Key: SPARK-46013
 URL: https://issues.apache.org/jira/browse/SPARK-46013
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Allison Wang


We should improve the Python examples on this page: 
[https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html]
 (basic_datasource_examples.py)

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45927) Update `path` handling in Python data source

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45927:
-

Assignee: Allison Wang

> Update `path` handling in Python data source
> 
>
> Key: SPARK-45927
> URL: https://issues.apache.org/jira/browse/SPARK-45927
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> We should not make `path` an argument to the constructor of the API.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45927) Update `path` handling in Python data source

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45927.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43809
[https://github.com/apache/spark/pull/43809]

> Update `path` handling in Python data source
> 
>
> Key: SPARK-45927
> URL: https://issues.apache.org/jira/browse/SPARK-45927
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We should not make `path` an argument to the constructor of the API.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45998) Remove redundant type cast

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45998:
-

Assignee: Yang Jie

> Remove redundant type cast
> --
>
> Key: SPARK-45998
> URL: https://issues.apache.org/jira/browse/SPARK-45998
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45998) Remove redundant type cast

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45998.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43899
[https://github.com/apache/spark/pull/43899]

> Remove redundant type cast
> --
>
> Key: SPARK-45998
> URL: https://issues.apache.org/jira/browse/SPARK-45998
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46012) EventLogFileReader should not read rolling logs if appStatus is missing

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46012:
-

Assignee: Dongjoon Hyun

> EventLogFileReader should not read rolling logs if appStatus is missing
> ---
>
> Key: SPARK-46012
> URL: https://issues.apache.org/jira/browse/SPARK-46012
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.1.3, 3.2.4, 3.3.3, 3.4.1
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46012) EventLogFileReader should not read rolling logs if appStatus is missing

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46012:
--
Parent: SPARK-45869
Issue Type: Sub-task  (was: Bug)

> EventLogFileReader should not read rolling logs if appStatus is missing
> ---
>
> Key: SPARK-46012
> URL: https://issues.apache.org/jira/browse/SPARK-46012
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.1.3, 3.2.4, 3.3.3, 3.4.1
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44973) CONV('-9223372036854775808', 10, -2) throws ArrayIndexOutOfBoundsException

2023-11-20 Thread Gera Shegalov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788157#comment-17788157
 ] 

Gera Shegalov commented on SPARK-44973:
---

3.0.3 is the oldest version on my box and it exhibits the same bug:

 
{code:java}
scala> spark.version
res1: String = 3.0.3
scala> sql(s"SELECT CONV('${Long.MinValue}', 10, -2)").show(false)
java.lang.ArrayIndexOutOfBoundsException: -1
  at 
org.apache.spark.sql.catalyst.util.NumberConverter$.convert(NumberConverter.scala:148)
  at 
org.apache.spark.sql.catalyst.expressions.Conv.nullSafeEval(mathExpressions.scala:338)
  at 
org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:690)
  at 
org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:457)
{code}
 

> CONV('-9223372036854775808', 10, -2) throws ArrayIndexOutOfBoundsException
> --
>
> Key: SPARK-44973
> URL: https://issues.apache.org/jira/browse/SPARK-44973
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.1, 3.5.0
>Reporter: Gera Shegalov
>Priority: Major
>  Labels: pull-request-available
>
> {code:scala}
> scala> sql(s"SELECT CONV('${Long.MinValue}', 10, -2)").show(false)
> java.lang.ArrayIndexOutOfBoundsException: -1
>   at 
> org.apache.spark.sql.catalyst.util.NumberConverter$.convert(NumberConverter.scala:183)
>   at 
> org.apache.spark.sql.catalyst.expressions.Conv.nullSafeEval(mathExpressions.scala:463)
>   at 
> org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:821)
>   at 
> org.apache.spark.sql.catalyst.expressions.ToPrettyString.eval(ToPrettyString.scala:57)
>   at 
> org.apache.spark.sql.catalyst.optimizer.ConstantFolding$.org$apache$spark$sql$catalyst$optimizer$ConstantFolding$$constantFolding(expressions.scala:81)
>   at 
> org.apache.spark.sql.catalyst.optimizer.ConstantFolding$.$anonfun$constantFolding$4(expressions.scala:91)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46012) EventLogFileReader should not read rolling logs if appStatus is missing

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46012:
--
Affects Version/s: 3.4.1
   3.3.3
   3.2.4
   3.1.3
   3.0.0
   (was: 4.0.0)

> EventLogFileReader should not read rolling logs if appStatus is missing
> ---
>
> Key: SPARK-46012
> URL: https://issues.apache.org/jira/browse/SPARK-46012
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.1.3, 3.2.4, 3.3.3, 3.4.1
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46012) EventLogFileReader should not read rolling logs if appStatus is missing

2023-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46012:
---
Labels: pull-request-available  (was: )

> EventLogFileReader should not read rolling logs if appStatus is missing
> ---
>
> Key: SPARK-46012
> URL: https://issues.apache.org/jira/browse/SPARK-46012
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46012) EventLogFileReader should not read rolling logs if appStatus is missing

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46012:
--
Summary: EventLogFileReader should not read rolling logs if appStatus is 
missing  (was: EventLogFileReader should not read rolling even logs if 
appStatus is missing)

> EventLogFileReader should not read rolling logs if appStatus is missing
> ---
>
> Key: SPARK-46012
> URL: https://issues.apache.org/jira/browse/SPARK-46012
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46012) EventLogFileReader should not read rolling even logs if appStatus is missing

2023-11-20 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-46012:
-

 Summary: EventLogFileReader should not read rolling even logs if 
appStatus is missing
 Key: SPARK-46012
 URL: https://issues.apache.org/jira/browse/SPARK-46012
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-44973) CONV('-9223372036854775808', 10, -2) throws ArrayIndexOutOfBoundsException

2023-11-20 Thread Mark Jarvin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787619#comment-17787619
 ] 

Mark Jarvin edited comment on SPARK-44973 at 11/20/23 8:56 PM:
---

I believe it's affected back to 3.2.0: 
https://github.com/apache/spark/blob/v3.2.0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberConverter.scala#L131


was (Author: JIRAUSER303124):
I believe it's affected back to 3.3.0: 
https://github.com/apache/spark/blob/v3.3.0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberConverter.scala#L131

 

> CONV('-9223372036854775808', 10, -2) throws ArrayIndexOutOfBoundsException
> --
>
> Key: SPARK-44973
> URL: https://issues.apache.org/jira/browse/SPARK-44973
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.3, 3.4.1, 3.5.0
>Reporter: Gera Shegalov
>Priority: Major
>  Labels: pull-request-available
>
> {code:scala}
> scala> sql(s"SELECT CONV('${Long.MinValue}', 10, -2)").show(false)
> java.lang.ArrayIndexOutOfBoundsException: -1
>   at 
> org.apache.spark.sql.catalyst.util.NumberConverter$.convert(NumberConverter.scala:183)
>   at 
> org.apache.spark.sql.catalyst.expressions.Conv.nullSafeEval(mathExpressions.scala:463)
>   at 
> org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:821)
>   at 
> org.apache.spark.sql.catalyst.expressions.ToPrettyString.eval(ToPrettyString.scala:57)
>   at 
> org.apache.spark.sql.catalyst.optimizer.ConstantFolding$.org$apache$spark$sql$catalyst$optimizer$ConstantFolding$$constantFolding(expressions.scala:81)
>   at 
> org.apache.spark.sql.catalyst.optimizer.ConstantFolding$.$anonfun$constantFolding$4(expressions.scala:91)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46011) Spark Connect session heartbeat / keepalive

2023-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46011:
---
Labels: pull-request-available  (was: )

> Spark Connect session heartbeat / keepalive
> ---
>
> Key: SPARK-46011
> URL: https://issues.apache.org/jira/browse/SPARK-46011
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46011) Spark Connect session heartbeat / keepalive

2023-11-20 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-46011:
-

 Summary: Spark Connect session heartbeat / keepalive
 Key: SPARK-46011
 URL: https://issues.apache.org/jira/browse/SPARK-46011
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46002) Upgrade to Python 3.9 in SQL test

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46002.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43902
[https://github.com/apache/spark/pull/43902]

> Upgrade to Python 3.9 in SQL test
> -
>
> Key: SPARK-46002
> URL: https://issues.apache.org/jira/browse/SPARK-46002
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46002) Upgrade to Python 3.9 in SQL test

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46002:
-

Assignee: Ruifeng Zheng

> Upgrade to Python 3.9 in SQL test
> -
>
> Key: SPARK-46002
> URL: https://issues.apache.org/jira/browse/SPARK-46002
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46010) Upgrade Node.js to 18.X

2023-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46010:
---
Labels: pull-request-available  (was: )

> Upgrade Node.js to 18.X
> ---
>
> Key: SPARK-46010
> URL: https://issues.apache.org/jira/browse/SPARK-46010
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>  Labels: pull-request-available
>
> Run {{docker build}} on spark/dev/create-release/spark-rm/Dockerfile and I 
> get this:
> Processing triggers for libc-bin (2.31-0ubuntu9.12) ...
> 
> 
>   DEPRECATION WARNING
>   Node.js 12.x is no longer actively supported!
>   You will not receive security or critical stability updates for this 
> version.
>   You should migrate to a supported version of Node.js as soon as possible.
>   Use the installation script that corresponds to the version of Node.js you
>   wish to install. e.g.
>* https://deb.nodesource.com/setup_16.x — Node.js 16 "Gallium"
>* https://deb.nodesource.com/setup_18.x — Node.js 18 LTS "Hydrogen" 
> (recommended)
>* https://deb.nodesource.com/setup_19.x — Node.js 19 "Nineteen"
>* https://deb.nodesource.com/setup_20.x — Node.js 20 "Iron" (current)
>   Please see https://github.com/nodejs/Release for details about which
>   version may be appropriate for you.
>   The NodeSource Node.js distributions repository contains
>   information both about supported versions of Node.js and supported Linux
>   distributions. To learn more about usage, see the repository:
> https://github.com/nodesource/distributions
> 
> 
> Continuing in 20 seconds ...
> 
> 
> 
>SCRIPT DEPRECATION WARNING
>   This script, located at https://deb.nodesource.com/setup_X, used to
>   install Node.js is deprecated now and will eventually be made inactive.
>   Please visit the NodeSource distributions Github and follow the
>   instructions to migrate your repo.
>   https://github.com/nodesource/distributions
>   The NodeSource Node.js Linux distributions GitHub repository contains
>   information about which versions of Node.js and which Linux distributions
>   are supported and how to install it.
>   https://github.com/nodesource/distributions
>   SCRIPT DEPRECATION WARNING
> 
> 
> 
> TO AVOID THIS WAIT MIGRATE THE SCRIPT
> Continuing in 60 seconds (press Ctrl-C to abort) ...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46010) Upgrade Node.js to 18.X

2023-11-20 Thread Jira
Bjørn Jørgensen created SPARK-46010:
---

 Summary: Upgrade Node.js to 18.X
 Key: SPARK-46010
 URL: https://issues.apache.org/jira/browse/SPARK-46010
 Project: Spark
  Issue Type: Dependency upgrade
  Components: Build
Affects Versions: 4.0.0
Reporter: Bjørn Jørgensen


Run {{docker build}} on spark/dev/create-release/spark-rm/Dockerfile and I get 
this:

Processing triggers for libc-bin (2.31-0ubuntu9.12) ...

  DEPRECATION WARNING

  Node.js 12.x is no longer actively supported!
  You will not receive security or critical stability updates for this version.
  You should migrate to a supported version of Node.js as soon as possible.
  Use the installation script that corresponds to the version of Node.js you
  wish to install. e.g.

   * https://deb.nodesource.com/setup_16.x — Node.js 16 "Gallium"
   * https://deb.nodesource.com/setup_18.x — Node.js 18 LTS "Hydrogen" (recommended)
   * https://deb.nodesource.com/setup_19.x — Node.js 19 "Nineteen"
   * https://deb.nodesource.com/setup_20.x — Node.js 20 "Iron" (current)

  Please see https://github.com/nodejs/Release for details about which
  version may be appropriate for you.

  The NodeSource Node.js distributions repository contains
  information both about supported versions of Node.js and supported Linux
  distributions. To learn more about usage, see the repository:
  https://github.com/nodesource/distributions

Continuing in 20 seconds ...

  SCRIPT DEPRECATION WARNING

  This script, located at https://deb.nodesource.com/setup_X, used to
  install Node.js is deprecated now and will eventually be made inactive.

  Please visit the NodeSource distributions GitHub and follow the
  instructions to migrate your repo.
  https://github.com/nodesource/distributions

  The NodeSource Node.js Linux distributions GitHub repository contains
  information about which versions of Node.js and which Linux distributions
  are supported and how to install it.
  https://github.com/nodesource/distributions

  SCRIPT DEPRECATION WARNING

TO AVOID THIS WAIT MIGRATE THE SCRIPT
Continuing in 60 seconds (press Ctrl-C to abort) ...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21928) ClassNotFoundException for custom Kryo registrator class during serde in netty threads

2023-11-20 Thread Shivam Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-21928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788104#comment-17788104
 ] 

Shivam Sharma commented on SPARK-21928:
---

I am getting this intermittent failure on Spark 2.4.3. Here is the full 
stack trace:
{code:java}
Exception in thread "main" java.lang.reflect.InvocationTargetException
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
  at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 75 in stage 1.0 failed 4 times, most recent failure: Lost task 75.3 in stage 1.0 (TID 171, phx6-kwq.prod.xyz.internal, executor 71): java.io.IOException: org.apache.spark.SparkException: Failed to register classes with Kryo
  at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1333)
  at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:208)
  at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
  at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
  at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
  at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:89)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
  at org.apache.spark.scheduler.Task.run(Task.scala:121)
  at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Failed to register classes with Kryo
  at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:140)
  at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:324)
  at org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:309)
  at org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:218)
  at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:305)
  at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBroadcastBlock$3(TorrentBroadcast.scala:235)
  at scala.Option.getOrElse(Option.scala:138)
  at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBroadcastBlock$1(TorrentBroadcast.scala:211)
  at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
  ... 14 more
Caused by: java.lang.ClassNotFoundException: com.xyz.datashack.SparkKryoRegistrar
  at java.lang.ClassLoader.findClass(ClassLoader.java:530)
  at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.java:35)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.java:40)
  at org.apache.spark.util.ChildFirstURLClassLoader.loadClass(ChildFirstURLClassLoader.java:48)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:348)
  at org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$6(KryoSerializer.scala:135)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
  at scala.collection.TraversableLike.map(TraversableLike.scala:237)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
  at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:135)
  ... 22 more
Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:1889)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1877)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1876)
  at 

[jira] [Commented] (SPARK-29251) intermittent serialization failures in spark

2023-11-20 Thread Shivam Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788101#comment-17788101
 ] 

Shivam Sharma commented on SPARK-29251:
---

I am also getting the same issue. Did you find the fix?

> intermittent serialization failures in spark
> 
>
> Key: SPARK-29251
> URL: https://issues.apache.org/jira/browse/SPARK-29251
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.3
> Environment: We are running Spark 2.4.3 on an AWS EMR cluster. Our 
> cluster consists of one driver instance, 3 core instances, and 3 to 12 task 
> instances; the task group autoscales as needed.
>Reporter: Jerry Vinokurov
>Priority: Major
>  Labels: bulk-closed
>
> We are running an EMR cluster on AWS that processes somewhere around 100 
> batch jobs a day. These jobs run various SQL commands to transform data, and 
> the data volume ranges from a dozen to a few hundred MB for most jobs, up to 
> a few GB on the high end, and even around ~1 TB for one particularly large 
> one. We use the Kryo serializer and preregister our classes like so:
>  
> {code:java}
> object KryoRegistrar {
>   val classesToRegister: Array[Class[_]] = Array(
>     classOf[MyModel],
>    [etc]
>   )
> }
> // elsewhere
> val sparkConf = new SparkConf()
>       .registerKryoClasses(KryoRegistrar.classesToRegister)
> {code}
>  
>  
> Intermittently throughout the cluster's operation we have observed jobs 
> terminating with the following stack trace:
>  
>  
> {noformat}
> org.apache.spark.SparkException: Failed to register classes with Kryo
>   at 
> org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:140)
>   at 
> org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:324)
>   at 
> org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:309)
>   at 
> org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:218)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:288)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:127)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:88)
>   at 
> org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
>   at 
> org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
>   at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1489)
>   at 
> org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.buildReader(CSVFileFormat.scala:103)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormat$class.buildReaderWithPartitionValues(FileFormat.scala:129)
>   at 
> org.apache.spark.sql.execution.datasources.TextBasedFileFormat.buildReaderWithPartitionValues(FileFormat.scala:165)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:309)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:305)
>   at 
> org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:327)
>   at 
> org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:121)
>   at 
> org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:627)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>   at 
> org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.prepareShuffleDependency(ShuffleExchangeExec.scala:92)
>   at 
> org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:128)
>   at 
> org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:119)
>   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
>   at 
> org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)
>   at 
> 

[jira] [Updated] (SPARK-43393) Sequence expression can overflow

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-43393:
--
Fix Version/s: 3.3.4

> Sequence expression can overflow
> 
>
> Key: SPARK-43393
> URL: https://issues.apache.org/jira/browse/SPARK-43393
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Deepayan Patra
>Assignee: Deepayan Patra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.2, 4.0.0, 3.5.1, 3.3.4
>
>
> Spark has a (long-standing) overflow bug in the {{sequence}} expression.
>  
> Consider the following operations:
> {{spark.sql("CREATE TABLE foo (l LONG);")}}
> {{spark.sql(s"INSERT INTO foo VALUES (${Long.MaxValue});")}}
> {{spark.sql("SELECT sequence(0, l) FROM foo;").collect()}}
>  
> The result of these operations will be:
> {{Array[org.apache.spark.sql.Row] = Array([WrappedArray()])}}
> an unintended consequence of overflow.
>  
> The sequence is applied to values {{0}} and {{Long.MaxValue}} with a step 
> size of {{1}}, which uses a length computation defined 
> [here|https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3451].
> In this calculation, with {{start = 0}}, {{stop = Long.MaxValue}}, and 
> {{step = 1}}, the calculated {{len}} overflows to {{Long.MinValue}}. The 
> computation, in binary (64-bit two's complement, abbreviated), looks like:
> {noformat}
>   0111...1111   (stop = Long.MaxValue)
> - 0000...0000   (start = 0)
> = 0111...1111
> / 0000...0001   (step = 1)
> = 0111...1111
> + 0000...0001
> = 1000...0000   (len = Long.MinValue)
> {noformat}
> The following 
> [check|https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3454]
> passes, as the negative {{Long.MinValue}} is still {{<= 
> MAX_ROUNDED_ARRAY_LENGTH}}. The following cast to {{toInt}} uses this 
> representation and [truncates the upper 
> bits|https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3457],
> resulting in an empty length of 0. A minimal arithmetic check is sketched 
> below.
> Other overflows are similarly problematic.
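>
> A minimal, self-contained reproduction of the length overflow (pure Scala 
> arithmetic, mirroring the computation above):
> {code:scala}
> val start = 0L
> val stop  = Long.MaxValue
> val step  = 1L
>
> // (stop - start) / step + 1 wraps around in 64-bit two's complement:
> val len = (stop - start) / step + 1
> assert(len == Long.MinValue)  // overflowed
> assert(len.toInt == 0)        // truncating the upper bits yields length 0
> {code}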



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46005) Upgrade scikit-learn to 1.3.2

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46005.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43905
[https://github.com/apache/spark/pull/43905]

> Upgrade scikit-learn to 1.3.2
> -
>
> Key: SPARK-46005
> URL: https://issues.apache.org/jira/browse/SPARK-46005
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46007) Add `AmmoniteTest` test tag and mark `ReplE2ESuite`

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46007.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43909
[https://github.com/apache/spark/pull/43909]

> Add `AmmoniteTest` test tag and mark `ReplE2ESuite`
> ---
>
> Key: SPARK-46007
> URL: https://issues.apache.org/jira/browse/SPARK-46007
> Project: Spark
>  Issue Type: Test
>  Components: Connect, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> - https://github.com/com-lihaoyi/Ammonite/issues/276



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45937) Fix documentation of spark.executor.maxNumFailures

2023-11-20 Thread Thomas Graves (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-45937.
---
Resolution: Duplicate

> Fix documentation of spark.executor.maxNumFailures
> --
>
> Key: SPARK-45937
> URL: https://issues.apache.org/jira/browse/SPARK-45937
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Thomas Graves
>Priority: Critical
>
> https://issues.apache.org/jira/browse/SPARK-41210 added support for 
> spark.executor.maxNumFailures on Kubernetes; it made this config generic and 
> deprecated the YARN version. Neither this config nor its default is 
> documented.
>  
> [https://github.com/apache/spark/commit/40872e9a094f8459b0b6f626937ced48a8d98efb]
> It also added spark.executor.failuresValidityInterval.
>  
> Both need to have default values specified for YARN and K8s, and the YARN 
> documentation for the equivalent spark.yarn.max.executor.failures 
> configuration needs to be removed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45699) Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it loses precision"

2023-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45699:
---
Labels: pull-request-available  (was: )

> Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it 
> loses precision"
> --
>
> Key: SPARK-45699
> URL: https://issues.apache.org/jira/browse/SPARK-45699
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1199:67:
>  Widening conversion from Long to Double is deprecated because it loses 
> precision. Write `.toDouble` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks.threshold
> [error]       val threshold = max(speculationMultiplier * medianDuration, 
> minTimeToSpeculation)
> [error]                                                                   ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1207:60:
>  Widening conversion from Long to Double is deprecated because it loses 
> precision. Write `.toDouble` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks
> [error]       foundTasks = checkAndSubmitSpeculatableTasks(timeMs, threshold, 
> customizedThreshold = true)
> [error]                                                            ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:137:48:
>  Widening conversion from Int to Float is deprecated because it loses 
> precision. Write `.toFloat` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.connect.client.arrow.IntVectorReader.getFloat
> [error]   override def getFloat(i: Int): Float = getInt(i)
> [error]                                                ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:146:49:
>  Widening conversion from Long to Float is deprecated because it loses 
> precision. Write `.toFloat` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getFloat
> [error]   override def getFloat(i: Int): Float = getLong(i)
> [error]                                                 ^
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:147:51:
>  Widening conversion from Long to Double is deprecated because it loses 
> precision. Write `.toDouble` instead. [quickfixable]
> [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, 
> site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getDouble
> [error]   override def getDouble(i: Int): Double = getLong(i)
> [error]                                                   ^ {code}
>  
>  
> An example of the compilation warning is shown above; there are probably over 
> 100 similar cases that need to be fixed. The fix pattern is sketched below.
>  
>  
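>
> A minimal sketch of the fix pattern the warning itself suggests (variable 
> names are invented for illustration):
> {code:scala}
> val medianDuration: Long = 1000L
> val speculationMultiplier: Double = 1.5
>
> // Deprecated implicit widening:  speculationMultiplier * medianDuration
> // Explicit, warning-free form:
> val threshold = math.max(speculationMultiplier * medianDuration.toDouble, 100.0)
>
> // Likewise, widen explicitly instead of implicitly:
> val asFloat: Float   = 42L.toFloat
> val asDouble: Double = 42L.toDouble
> {code}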



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46009) Merge the parse rule of PercentileCont and PercentileDisc into functionCall

2023-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46009:
---
Labels: pull-request-available  (was: )

> Merge the parse rule of PercentileCont and PercentileDisc into functionCall
> ---
>
> Key: SPARK-46009
> URL: https://issues.apache.org/jira/browse/SPARK-46009
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
>
> The Spark SQL parser has a special rule to parse 
> [percentile_cont|percentile_disc](percentage) WITHIN GROUP (ORDER BY v).
> We should merge this rule into the functionCall rule.
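>
> A minimal illustration of the syntax this special rule currently handles 
> (hypothetical view {{t}} with a numeric column {{v}}):
> {code:scala}
> import spark.implicits._
>
> Seq(1, 2, 3, 4).toDF("v").createOrReplaceTempView("t")
>
> spark.sql("""
>   SELECT
>     percentile_cont(0.5) WITHIN GROUP (ORDER BY v),
>     percentile_disc(0.5) WITHIN GROUP (ORDER BY v)
>   FROM t
> """).show()
> {code}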



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46009) Merge the parse rule of PercentileCont and PercentileDisc into functionCall

2023-11-20 Thread Jiaan Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiaan Geng updated SPARK-46009:
---
Description: 
The Spark SQL parser has a special rule to parse 
[percentile_cont|percentile_disc](percentage) WITHIN GROUP (ORDER BY v).
We should merge this rule into the functionCall rule.

  was:
Spark SQL parse have a special rule to parse 
[percentile_cont|percentile_disc](percentage) WITHIN GROUP (ORDER BY v).
We should merge this rule into the functionCall.


> Merge the parse rule of PercentileCont and PercentileDisc into functionCall
> ---
>
> Key: SPARK-46009
> URL: https://issues.apache.org/jira/browse/SPARK-46009
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>
> The Spark SQL parser has a special rule to parse 
> [percentile_cont|percentile_disc](percentage) WITHIN GROUP (ORDER BY v).
> We should merge this rule into the functionCall rule.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46007) Add `AmmoniteTest` test tag and mark `ReplE2ESuite`

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46007:
-

Assignee: Dongjoon Hyun

> Add `AmmoniteTest` test tag and mark `ReplE2ESuite`
> ---
>
> Key: SPARK-46007
> URL: https://issues.apache.org/jira/browse/SPARK-46007
> Project: Spark
>  Issue Type: Test
>  Components: Connect, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> - https://github.com/com-lihaoyi/Ammonite/issues/276



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46009) Merge the parse rule of PercentileCont and PercentileDisc into functionCall

2023-11-20 Thread Jiaan Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiaan Geng updated SPARK-46009:
---
Description: 
Spark SQL parse have a special rule to parse 
[percentile_cont|percentile_disc](percentage) WITHIN GROUP (ORDER BY v).
We should merge this rule into the functionCall.

  was:
Spark SQL parse have a special rule to parse 
[percentile_cont|percentile_disc](percentage) WITHIN GROUP (ORDER BY v).
We should merge this rule into the 


> Merge the parse rule of PercentileCont and PercentileDisc into functionCall
> ---
>
> Key: SPARK-46009
> URL: https://issues.apache.org/jira/browse/SPARK-46009
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>
> The Spark SQL parser has a special rule to parse 
> [percentile_cont|percentile_disc](percentage) WITHIN GROUP (ORDER BY v).
> We should merge this rule into the functionCall rule.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46009) Merge the parse rule of PercentileCont and PercentileDisc into functionCall

2023-11-20 Thread Jiaan Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiaan Geng updated SPARK-46009:
---
Description: 
The Spark SQL parser has a special rule to parse 
[percentile_cont|percentile_disc](percentage) WITHIN GROUP (ORDER BY v).
We should merge this rule into the 

> Merge the parse rule of PercentileCont and PercentileDisc into functionCall
> ---
>
> Key: SPARK-46009
> URL: https://issues.apache.org/jira/browse/SPARK-46009
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>
> The Spark SQL parser has a special rule to parse 
> [percentile_cont|percentile_disc](percentage) WITHIN GROUP (ORDER BY v).
> We should merge this rule into the 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46009) Merge the parse rule of PercentileCont and PercentileDisc into functionCall

2023-11-20 Thread Jiaan Geng (Jira)
Jiaan Geng created SPARK-46009:
--

 Summary: Merge the parse rule of PercentileCont and PercentileDisc 
into functionCall
 Key: SPARK-46009
 URL: https://issues.apache.org/jira/browse/SPARK-46009
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46007) Add `AmmoniteTest` test tag and mark `ReplE2ESuite`

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46007:
--
Summary: Add `AmmoniteTest` test tag and mark `ReplE2ESuite`  (was: Add 
`AmmoniteTest` and mark `ReplE2ESuite`)

> Add `AmmoniteTest` test tag and mark `ReplE2ESuite`
> ---
>
> Key: SPARK-46007
> URL: https://issues.apache.org/jira/browse/SPARK-46007
> Project: Spark
>  Issue Type: Test
>  Components: Connect, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> - https://github.com/com-lihaoyi/Ammonite/issues/276



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46008) Add spark-sql-viz-test.js to test spark-sql-viz.js

2023-11-20 Thread Kent Yao (Jira)
Kent Yao created SPARK-46008:


 Summary: Add spark-sql-viz-test.js to test spark-sql-viz.js
 Key: SPARK-46008
 URL: https://issues.apache.org/jira/browse/SPARK-46008
 Project: Spark
  Issue Type: Sub-task
  Components: SQL, UI
Affects Versions: 4.0.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46007) Add `AmmoniteTest` and mark `ReplE2ESuite`

2023-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46007:
--
Description: - https://github.com/com-lihaoyi/Ammonite/issues/276

> Add `AmmoniteTest` and mark `ReplE2ESuite`
> --
>
> Key: SPARK-46007
> URL: https://issues.apache.org/jira/browse/SPARK-46007
> Project: Spark
>  Issue Type: Test
>  Components: Connect, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> - https://github.com/com-lihaoyi/Ammonite/issues/276



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46007) Add `AmmoniteTest` and mark `ReplE2ESuite`

2023-11-20 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-46007:
-

 Summary: Add `AmmoniteTest` and mark `ReplE2ESuite`
 Key: SPARK-46007
 URL: https://issues.apache.org/jira/browse/SPARK-46007
 Project: Spark
  Issue Type: Test
  Components: Connect, Tests
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46004) Refine docstring of `DataFrame.dropna/fillna/replace`

2023-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46004:
---
Labels: pull-request-available  (was: )

> Refine docstring of `DataFrame.dropna/fillna/replace`
> -
>
> Key: SPARK-46004
> URL: https://issues.apache.org/jira/browse/SPARK-46004
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45937) Fix documentation of spark.executor.maxNumFailures

2023-11-20 Thread Cheng Pan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787926#comment-17787926
 ] 

Cheng Pan commented on SPARK-45937:
---

[~tgraves] sorry, I missed this ticket and opened a new one, SPARK-45969. The 
PR is ready for review: https://github.com/apache/spark/pull/43863

> Fix documentation of spark.executor.maxNumFailures
> --
>
> Key: SPARK-45937
> URL: https://issues.apache.org/jira/browse/SPARK-45937
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Thomas Graves
>Priority: Critical
>
> https://issues.apache.org/jira/browse/SPARK-41210 added support for 
> spark.executor.maxNumFailures on Kubernetes; it made this config generic and 
> deprecated the YARN version. Neither the config nor its defaults are 
> documented.
> https://github.com/apache/spark/commit/40872e9a094f8459b0b6f626937ced48a8d98efb
> It also added spark.executor.failuresValidityInterval.
> Both need default values documented for YARN and K8s, and the YARN 
> documentation for the equivalent config spark.yarn.max.executor.failures 
> needs to be removed.
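For context, once documented these configs would be set like any other Spark
conf (a sketch; the values below are arbitrary, and the actual defaults are
exactly what this ticket asks to have documented):

{code:scala}
import org.apache.spark.sql.SparkSession

// Arbitrary illustrative values, not the defaults.
val spark = SparkSession.builder()
  .appName("executor-failure-config-demo")
  .config("spark.executor.maxNumFailures", "10")
  .config("spark.executor.failuresValidityInterval", "1h")
  .getOrCreate()
{code}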



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46006) YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend call stop

2023-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46006:
---
Labels: pull-request-available  (was: )

> YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after 
> YarnSchedulerBackend call stop
> 
>
> Key: SPARK-46006
> URL: https://issues.apache.org/jira/browse/SPARK-46006
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0
>Reporter: angerszhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.2, 4.0.0, 3.5.1
>
> Attachments: image-2023-11-20-17-56-45-212.png, 
> image-2023-11-20-17-56-56-507.png
>
>
> We hit a case where the user calls sc.stop() after all custom code has run, 
> but it gets stuck somewhere. This causes the following situation:
>  # The user calls sc.stop()
>  # sc.stop() gets stuck at some step, but SchedulerBackend.stop has already 
> been called
>  # Since the YARN ApplicationMaster hasn't finished, it still calls 
> YarnAllocator.allocateResources()
>  # Since the driver endpoint has stopped, newly allocated executors fail to 
> register
>  # until "Max number of executor failures" is triggered
> Root cause:
> Before stopping, CoarseGrainedSchedulerBackend.stop() calls 
> YarnSchedulerBackend.requestTotalExecutors() to clear the request info.
> !image-2023-11-20-17-56-56-507.png|width=898,height=297!
> From the log we confirmed that CoarseGrainedSchedulerBackend.stop() was 
> called.
> When YarnAllocator handles this now-empty resource request, since 
> resourceTotalExecutorsWithPreferedLocalities is empty, it misses cleaning up 
> targetNumExecutorsPerResourceProfileId.
> !image-2023-11-20-17-56-45-212.png|width=708,height=379!
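A rough sketch of the missed cleanup described above, together with the
obvious fix (illustrative only, not the actual YarnAllocator source; the field
and method names are taken from this report):

{code:scala}
import scala.collection.mutable

// Illustrative sketch only; not the real YarnAllocator code.
class Allocator {
  // Stale entry left over from before shutdown, e.g. profile 0 -> 8 executors.
  val targetNumExecutorsPerResourceProfileId = mutable.Map[Int, Int](0 -> 8)

  // During stop(), YarnSchedulerBackend.requestTotalExecutors() arrives here
  // with an empty request map.
  def requestTotalExecutors(resourceProfileToTotalExecs: Map[Int, Int]): Unit = {
    resourceProfileToTotalExecs.foreach { case (rpId, num) =>
      targetNumExecutorsPerResourceProfileId(rpId) = num
    }
    // Bug described above: with an empty map the loop never runs, the stale
    // target survives, and allocateResources() keeps asking YARN for
    // executors that can no longer register with the stopped driver.
    // The fix is to also drop targets absent from the request:
    targetNumExecutorsPerResourceProfileId.keys.toSeq
      .filterNot(resourceProfileToTotalExecs.contains)
      .foreach(targetNumExecutorsPerResourceProfileId.remove)
  }
}
{code}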



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46006) YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend call stop

2023-11-20 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-46006:
--
Description: 
We hit a case where the user calls sc.stop() after all custom code has run, but 
it gets stuck somewhere.

This causes the following situation:
 # The user calls sc.stop()
 # sc.stop() gets stuck at some step, but SchedulerBackend.stop has already 
been called
 # Since the YARN ApplicationMaster hasn't finished, it still calls 
YarnAllocator.allocateResources()
 # Since the driver endpoint has stopped, newly allocated executors fail to 
register
 # until "Max number of executor failures" is triggered

Root cause:

Before stopping, CoarseGrainedSchedulerBackend.stop() calls 
YarnSchedulerBackend.requestTotalExecutors() to clear the request info.

!image-2023-11-20-17-56-56-507.png|width=898,height=297!

From the log we confirmed that CoarseGrainedSchedulerBackend.stop() was called.

When YarnAllocator handles this now-empty resource request, since 
resourceTotalExecutorsWithPreferedLocalities is empty, it misses cleaning up 
targetNumExecutorsPerResourceProfileId.

!image-2023-11-20-17-56-45-212.png|width=708,height=379!


  was:
We hit a case where the user calls sc.stop() after all custom code has run, but 
it gets stuck somewhere.

This causes the situation below:
 # The user calls sc.stop()
 # sc.stop() gets stuck at some step, but SchedulerBackend.stop has already 
been called
 # Since the YARN ApplicationMaster hasn't finished, it still calls 
YarnAllocator.allocateResources()
Because the driver endpoint has been shut down, allocateResources keeps 
requesting new executors, which fail to register,
triggering "Max number of executor failures"


> YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after 
> YarnSchedulerBackend call stop
> 
>
> Key: SPARK-46006
> URL: https://issues.apache.org/jira/browse/SPARK-46006
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0
>Reporter: angerszhu
>Priority: Major
> Fix For: 3.4.2, 4.0.0, 3.5.1
>
> Attachments: image-2023-11-20-17-56-45-212.png, 
> image-2023-11-20-17-56-56-507.png
>
>
> We hit a case where the user calls sc.stop() after all custom code has run, 
> but it gets stuck somewhere. This causes the following situation:
>  # The user calls sc.stop()
>  # sc.stop() gets stuck at some step, but SchedulerBackend.stop has already 
> been called
>  # Since the YARN ApplicationMaster hasn't finished, it still calls 
> YarnAllocator.allocateResources()
>  # Since the driver endpoint has stopped, newly allocated executors fail to 
> register
>  # until "Max number of executor failures" is triggered
> Root cause:
> Before stopping, CoarseGrainedSchedulerBackend.stop() calls 
> YarnSchedulerBackend.requestTotalExecutors() to clear the request info.
> !image-2023-11-20-17-56-56-507.png|width=898,height=297!
> From the log we confirmed that CoarseGrainedSchedulerBackend.stop() was 
> called.
> When YarnAllocator handles this now-empty resource request, since 
> resourceTotalExecutorsWithPreferedLocalities is empty, it misses cleaning up 
> targetNumExecutorsPerResourceProfileId.
> !image-2023-11-20-17-56-45-212.png|width=708,height=379!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46006) YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend call stop

2023-11-20 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-46006:
--
Attachment: image-2023-11-20-17-56-56-507.png

> YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after 
> YarnSchedulerBackend call stop
> 
>
> Key: SPARK-46006
> URL: https://issues.apache.org/jira/browse/SPARK-46006
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0
>Reporter: angerszhu
>Priority: Major
> Fix For: 3.4.2, 4.0.0, 3.5.1
>
> Attachments: image-2023-11-20-17-56-45-212.png, 
> image-2023-11-20-17-56-56-507.png
>
>
> We hit a case where the user calls sc.stop() after all custom code has run, 
> but it gets stuck somewhere. This causes the situation below:
>  # The user calls sc.stop()
>  # sc.stop() gets stuck at some step, but SchedulerBackend.stop has already been called
>  # Since the YARN ApplicationMaster hasn't finished, it still calls 
> YarnAllocator.allocateResources()
> Because the driver endpoint has been shut down, allocateResources keeps 
> requesting new executors, which fail to register, triggering "Max number of 
> executor failures"



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46006) YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend call stop

2023-11-20 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-46006:
--
Attachment: image-2023-11-20-17-56-45-212.png

> YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after 
> YarnSchedulerBackend call stop
> 
>
> Key: SPARK-46006
> URL: https://issues.apache.org/jira/browse/SPARK-46006
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0
>Reporter: angerszhu
>Priority: Major
> Fix For: 3.4.2, 4.0.0, 3.5.1
>
> Attachments: image-2023-11-20-17-56-45-212.png, 
> image-2023-11-20-17-56-56-507.png
>
>
> We hit a case where the user calls sc.stop() after all custom code has run, 
> but it gets stuck somewhere. This causes the situation below:
>  # The user calls sc.stop()
>  # sc.stop() gets stuck at some step, but SchedulerBackend.stop has already been called
>  # Since the YARN ApplicationMaster hasn't finished, it still calls 
> YarnAllocator.allocateResources()
> Because the driver endpoint has been shut down, allocateResources keeps 
> requesting new executors, which fail to register, triggering "Max number of 
> executor failures"



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46006) YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after YarnSchedulerBackend call stop

2023-11-20 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-46006:
--
Description: 
We hit a case where the user calls sc.stop() after all custom code has run, but 
it gets stuck somewhere.

This causes the situation below:
 # The user calls sc.stop()
 # sc.stop() gets stuck at some step, but SchedulerBackend.stop has already 
been called
 # Since the YARN ApplicationMaster hasn't finished, it still calls 
YarnAllocator.allocateResources()
Because the driver endpoint has been shut down, allocateResources keeps 
requesting new executors, which fail to register,
triggering "Max number of executor failures"

  was:
We hit a case where the user calls sc.stop() after all custom code has run, but 
it gets stuck somewhere.

This causes the situation below:
 # The user calls sc.stop()
 # sc.stop() gets stuck at some step, but SchedulerBackend.stop has already 
been called
The ApplicationMaster has not finished yet, so it keeps calling 
YarnAllocator.allocateResources().
Because the driver endpoint has been shut down, allocateResources keeps 
requesting new executors, which fail to register,
triggering "Max number of executor failures"


> YarnAllocator miss clean targetNumExecutorsPerResourceProfileId after 
> YarnSchedulerBackend call stop
> 
>
> Key: SPARK-46006
> URL: https://issues.apache.org/jira/browse/SPARK-46006
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0
>Reporter: angerszhu
>Priority: Major
> Fix For: 3.4.2, 4.0.0, 3.5.1
>
>
> We hit a case where the user calls sc.stop() after all custom code has run, 
> but it gets stuck somewhere. This causes the situation below:
>  # The user calls sc.stop()
>  # sc.stop() gets stuck at some step, but SchedulerBackend.stop has already been called
>  # Since the YARN ApplicationMaster hasn't finished, it still calls 
> YarnAllocator.allocateResources()
> Because the driver endpoint has been shut down, allocateResources keeps 
> requesting new executors, which fail to register, triggering "Max number of 
> executor failures"



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


