[jira] [Resolved] (SPARK-46114) Define IndexError for PySpark error framework

2023-11-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46114.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44028
[https://github.com/apache/spark/pull/44028]

> Define IndexError for PySpark error framework
> -
>
> Key: SPARK-46114
> URL: https://issues.apache.org/jira/browse/SPARK-46114
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46114) Define IndexError for PySpark error framework

2023-11-26 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46114:


 Summary: Define IndexError for PySpark error framework
 Key: SPARK-46114
 URL: https://issues.apache.org/jira/browse/SPARK-46114
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46110) Use error classes in catalog, conf, connect, observation, pandas modules

2023-11-26 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46110:


 Summary: Use error classes in catalog, conf, connect, observation, 
pandas modules
 Key: SPARK-46110
 URL: https://issues.apache.org/jira/browse/SPARK-46110
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Deleted] (SPARK-46109) Migrate to error classes in PySpark

2023-11-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon deleted SPARK-46109:
-


> Migrate to error classes in PySpark
> ---
>
> Key: SPARK-46109
> URL: https://issues.apache.org/jira/browse/SPARK-46109
> Project: Spark
>  Issue Type: Umbrella
>Reporter: Hyukjin Kwon
>Priority: Major
>
> SPARK-41597 continues here to use error classes in PySpark.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46109) Migrate to error classes in PySpark

2023-11-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46109:
-

> Migrate to error classes in PySpark
> ---
>
> Key: SPARK-46109
> URL: https://issues.apache.org/jira/browse/SPARK-46109
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> SPARK-41597 continues here to use error classes in PySpark.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46109) Migrate to error classes in PySpark

2023-11-26 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46109:


 Summary: Migrate to error classes in PySpark
 Key: SPARK-46109
 URL: https://issues.apache.org/jira/browse/SPARK-46109
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


SPARK-41597 continues here to use error classes in PySpark.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32933) Use keyword-only syntax for keyword_only methods

2023-11-26 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789874#comment-17789874
 ] 

Hyukjin Kwon commented on SPARK-32933:
--

Here the PR and JIRA: https://github.com/apache/spark/pull/44023 
https://issues.apache.org/jira/browse/SPARK-46107

> Use keyword-only syntax for keyword_only methods
> 
>
> Key: SPARK-32933
> URL: https://issues.apache.org/jira/browse/SPARK-32933
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Minor
> Fix For: 3.1.0
>
>
> Since 3.0, provides syntax for indicating keyword-only arguments ([PEP 
> 3102|https://www.python.org/dev/peps/pep-3102/]).
> It is not a full replacement for our current usage of {{keyword_only}}, but 
> it would allow us to make our expectations explicit:
> {code:python}
> @keyword_only
> def __init__(self, degree=2, inputCol=None, outputCol=None):
> {code}
> {code:python}
> @keyword_only
> def __init__(self, *, degree=2, inputCol=None, outputCol=None):
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46107) Deprecate pyspark.keyword_only API

2023-11-26 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46107:


 Summary: Deprecate pyspark.keyword_only API
 Key: SPARK-46107
 URL: https://issues.apache.org/jira/browse/SPARK-46107
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


See https://issues.apache.org/jira/browse/SPARK-32933. We don't need this 
anymore



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46074) [CONNECT][SCALA] Insufficient details in error when a UDF fails

2023-11-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46074.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43983
[https://github.com/apache/spark/pull/43983]

> [CONNECT][SCALA] Insufficient details in error when a UDF fails
> ---
>
> Key: SPARK-46074
> URL: https://issues.apache.org/jira/browse/SPARK-46074
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Niranjan Jayakar
>Assignee: Niranjan Jayakar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently, when a UDF fails the connect client does not receive the actual 
> error that caused the failure. 
> As an example, the error message looks like -
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: 
> grpc_shaded.io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to 
> stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost 
> task 2.3 in stage 0.0 (TID 10) (10.68.141.158 executor 0): 
> org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user 
> defined function (` (Main$$$Lambda$4770/1714264622)`: (int) => int). 
> SQLSTATE: 39000 {code}
> In this case, the actual error was a {{{}java.lang.NoClassDefFoundError{}}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46074) [CONNECT][SCALA] Insufficient details in error when a UDF fails

2023-11-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46074:


Assignee: Niranjan Jayakar

> [CONNECT][SCALA] Insufficient details in error when a UDF fails
> ---
>
> Key: SPARK-46074
> URL: https://issues.apache.org/jira/browse/SPARK-46074
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Niranjan Jayakar
>Assignee: Niranjan Jayakar
>Priority: Major
>  Labels: pull-request-available
>
> Currently, when a UDF fails the connect client does not receive the actual 
> error that caused the failure. 
> As an example, the error message looks like -
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: 
> grpc_shaded.io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to 
> stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost 
> task 2.3 in stage 0.0 (TID 10) (10.68.141.158 executor 0): 
> org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user 
> defined function (` (Main$$$Lambda$4770/1714264622)`: (int) => int). 
> SQLSTATE: 39000 {code}
> In this case, the actual error was a {{{}java.lang.NoClassDefFoundError{}}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45922) Multiple policies follow-up (Python)

2023-11-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45922.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43800
[https://github.com/apache/spark/pull/43800]

> Multiple policies follow-up (Python)
> 
>
> Key: SPARK-45922
> URL: https://issues.apache.org/jira/browse/SPARK-45922
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Alice Sayutina
>Assignee: Alice Sayutina
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Minor further improvements for multiple policies work



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46016) Fix pandas API support list properly

2023-11-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46016.
--
Fix Version/s: 3.4.2
   4.0.0
   3.5.1
 Assignee: Haejoon Lee
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/43996

> Fix pandas API support list properly
> 
>
> Key: SPARK-46016
> URL: https://issues.apache.org/jira/browse/SPARK-46016
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.2, 4.0.0, 3.5.1
>
>
> Currently Supported pandas API is not generated properly, so we should fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46082) Fix protobuf string representation for Pandas Functions API with Spark Connect

2023-11-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46082:


Assignee: Hyukjin Kwon

> Fix protobuf string representation for Pandas Functions API with Spark Connect
> --
>
> Key: SPARK-46082
> URL: https://issues.apache.org/jira/browse/SPARK-46082
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> {code}
> df = spark.range(1)
> df.mapInPandas(lambda x: x, df.schema)._plan.print()
> {code}
> prints as below. It should includes functions.
> {code}
> 
>   
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46082) Fix protobuf string representation for Pandas Functions API with Spark Connect

2023-11-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46082.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43991
[https://github.com/apache/spark/pull/43991]

> Fix protobuf string representation for Pandas Functions API with Spark Connect
> --
>
> Key: SPARK-46082
> URL: https://issues.apache.org/jira/browse/SPARK-46082
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code}
> df = spark.range(1)
> df.mapInPandas(lambda x: x, df.schema)._plan.print()
> {code}
> prints as below. It should includes functions.
> {code}
> 
>   
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46085) Dataset.groupingSets in Scala Spark Connect client

2023-11-23 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46085:


 Summary: Dataset.groupingSets in Scala Spark Connect client
 Key: SPARK-46085
 URL: https://issues.apache.org/jira/browse/SPARK-46085
 Project: Spark
  Issue Type: New Feature
  Components: Connect, SQL
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Scala Spark Connect client for SPARK-45929



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46083) Make SparkNoSuchElementException as a canonical error API

2023-11-23 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46083:


 Summary: Make SparkNoSuchElementException as a canonical error API
 Key: SPARK-46083
 URL: https://issues.apache.org/jira/browse/SPARK-46083
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


https://github.com/apache/spark/pull/43927 added SparkNoSuchElementException. 
It should be a canonical error API, documented properly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46082) Fix protobuf string representation for Pandas Functions API with Spark Connect

2023-11-23 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46082:


 Summary: Fix protobuf string representation for Pandas Functions 
API with Spark Connect
 Key: SPARK-46082
 URL: https://issues.apache.org/jira/browse/SPARK-46082
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


{code}
df = spark.range(1)
df.mapInPandas(lambda x: x, df.schema)._plan.print()
{code}

prints as below. It should includes functions.

{code}

  
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46080) Upgrade Cloudpickle to 3.0.0

2023-11-23 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46080:


 Summary: Upgrade Cloudpickle to 3.0.0
 Key: SPARK-46080
 URL: https://issues.apache.org/jira/browse/SPARK-46080
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


It includes official support of Python 3.12 
(https://github.com/cloudpipe/cloudpickle/pull/517)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46076) Remove `unittest` deprecated alias usage for Python 3.12

2023-11-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46076.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43986
[https://github.com/apache/spark/pull/43986]

> Remove `unittest` deprecated alias usage for Python 3.12
> 
>
> Key: SPARK-46076
> URL: https://issues.apache.org/jira/browse/SPARK-46076
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46065) Refactor `(DataFrame|Series).factorize()` to use `create_map`.

2023-11-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46065.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43970
[https://github.com/apache/spark/pull/43970]

> Refactor `(DataFrame|Series).factorize()` to use `create_map`.
> --
>
> Key: SPARK-46065
> URL: https://issues.apache.org/jira/browse/SPARK-46065
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We can accept Column object for Column.__getitem__ on remote Session, so we 
> can optimize the existing factorize implementation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46065) Refactor `(DataFrame|Series).factorize()` to use `create_map`.

2023-11-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46065:


Assignee: Haejoon Lee

> Refactor `(DataFrame|Series).factorize()` to use `create_map`.
> --
>
> Key: SPARK-46065
> URL: https://issues.apache.org/jira/browse/SPARK-46065
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> We can accept Column object for Column.__getitem__ on remote Session, so we 
> can optimize the existing factorize implementation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Deleted] (SPARK-46049) Support groupingSets operation in PySpark (Spark Connect)

2023-11-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon deleted SPARK-46049:
-


> Support groupingSets operation in PySpark (Spark Connect)
> -
>
> Key: SPARK-46049
> URL: https://issues.apache.org/jira/browse/SPARK-46049
> Project: Spark
>  Issue Type: New Feature
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Connect version of SPARK-46048



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46063) Improve error messages related to argument types in cute, rollup, groupby, and pivot

2023-11-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46063:
-
Description: 
{code}
>>> spark.range(1).cube(cols=1.2)
Traceback (most recent call last):
  File "", line 1, in 
  File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube
raise PySparkTypeError(
pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument 
`cube` should be a Column or str, got float.
{code}

{code}
>>> help(spark.range(1).cube)
Help on method cube in module pyspark.sql.connect.dataframe:

cube(*cols: 'ColumnOrName') -> 'GroupedData' method of 
pyspark.sql.connect.dataframe.DataFrame instance
Create a multi-dimensional cube for the current :class:`DataFrame` using
the specified columns, allowing aggregations to be performed on them.

.. versionadded:: 1.4.0

.. versionchanged:: 3.4.0
{code}

it has to be {cols}

  was:
{code}
>>> spark.range(1).cube(cols=1.2)
Traceback (most recent call last):
  File "", line 1, in 
  File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube
raise PySparkTypeError(
pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument 
`cube` should be a Column or str, got float.
{code}

```
Help on method cube in module pyspark.sql.connect.dataframe:

cube(*cols: 'ColumnOrName') -> 'GroupedData' method of 
pyspark.sql.connect.dataframe.DataFrame instance
Create a multi-dimensional cube for the current :class:`DataFrame` using
the specified columns, allowing aggregations to be performed on them.

.. versionadded:: 1.4.0

.. versionchanged:: 3.4.0
```

it has to be {cols}


> Improve error messages related to argument types in cute, rollup, groupby, 
> and pivot
> 
>
> Key: SPARK-46063
> URL: https://issues.apache.org/jira/browse/SPARK-46063
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> {code}
> >>> spark.range(1).cube(cols=1.2)
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube
> raise PySparkTypeError(
> pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument 
> `cube` should be a Column or str, got float.
> {code}
> {code}
> >>> help(spark.range(1).cube)
> Help on method cube in module pyspark.sql.connect.dataframe:
> cube(*cols: 'ColumnOrName') -> 'GroupedData' method of 
> pyspark.sql.connect.dataframe.DataFrame instance
> Create a multi-dimensional cube for the current :class:`DataFrame` using
> the specified columns, allowing aggregations to be performed on them.
> .. versionadded:: 1.4.0
> .. versionchanged:: 3.4.0
> {code}
> it has to be {cols}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46063) Improve error messages related to argument types in cute, rollup, groupby, and pivot

2023-11-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46063:
-
Summary: Improve error messages related to argument types in cute, rollup, 
groupby, and pivot  (was: Improve error messages related to argument types in 
cute, rollup, and pivot)

> Improve error messages related to argument types in cute, rollup, groupby, 
> and pivot
> 
>
> Key: SPARK-46063
> URL: https://issues.apache.org/jira/browse/SPARK-46063
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> {code}
> >>> spark.range(1).cube(cols=1.2)
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube
> raise PySparkTypeError(
> pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument 
> `cube` should be a Column or str, got float.
> {code}
> ```
> Help on method cube in module pyspark.sql.connect.dataframe:
> cube(*cols: 'ColumnOrName') -> 'GroupedData' method of 
> pyspark.sql.connect.dataframe.DataFrame instance
> Create a multi-dimensional cube for the current :class:`DataFrame` using
> the specified columns, allowing aggregations to be performed on them.
> .. versionadded:: 1.4.0
> .. versionchanged:: 3.4.0
> ```
> it has to be {cols}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46063) Improve error messages related to argument types in cute, rollup, and pivot

2023-11-22 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46063:


 Summary: Improve error messages related to argument types in cute, 
rollup, and pivot
 Key: SPARK-46063
 URL: https://issues.apache.org/jira/browse/SPARK-46063
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


{code}
>>> spark.range(1).cube(cols=1.2)
Traceback (most recent call last):
  File "", line 1, in 
  File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube
raise PySparkTypeError(
pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument 
`cube` should be a Column or str, got float.
{code}

```
Help on method cube in module pyspark.sql.connect.dataframe:

cube(*cols: 'ColumnOrName') -> 'GroupedData' method of 
pyspark.sql.connect.dataframe.DataFrame instance
Create a multi-dimensional cube for the current :class:`DataFrame` using
the specified columns, allowing aggregations to be performed on them.

.. versionadded:: 1.4.0

.. versionchanged:: 3.4.0
```

it has to be {cols}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46061) Add the test party for reattach test case

2023-11-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46061:


Assignee: Hyukjin Kwon

> Add the test party for reattach test case
> -
>
> Key: SPARK-46061
> URL: https://issues.apache.org/jira/browse/SPARK-46061
> Project: Spark
>  Issue Type: Test
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> We need the same test "ReleaseSession releases all queries and does not allow 
> more requests in the session" added in SPARK-45798 to identify an issue like 
> SPARK-46042.
> This is caused by SPARK-46039



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46061) Add the test party for reattach test case

2023-11-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46061.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43965
[https://github.com/apache/spark/pull/43965]

> Add the test party for reattach test case
> -
>
> Key: SPARK-46061
> URL: https://issues.apache.org/jira/browse/SPARK-46061
> Project: Spark
>  Issue Type: Test
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We need the same test "ReleaseSession releases all queries and does not allow 
> more requests in the session" added in SPARK-45798 to identify an issue like 
> SPARK-46042.
> This is caused by SPARK-46039



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45600) Make Python data source registration session level

2023-11-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45600.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43742
[https://github.com/apache/spark/pull/43742]

> Make Python data source registration session level
> --
>
> Key: SPARK-45600
> URL: https://issues.apache.org/jira/browse/SPARK-45600
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently, registered data sources are stored in `sharedState` and can be 
> accessed across multiple sessions. This, however, will not work with Spark 
> Connect. We should make this registration session level, and support static 
> registration (e.g. using pip install) in the future.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45600) Make Python data source registration session level

2023-11-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45600:


Assignee: Allison Wang

> Make Python data source registration session level
> --
>
> Key: SPARK-45600
> URL: https://issues.apache.org/jira/browse/SPARK-45600
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Currently, registered data sources are stored in `sharedState` and can be 
> accessed across multiple sessions. This, however, will not work with Spark 
> Connect. We should make this registration session level, and support static 
> registration (e.g. using pip install) in the future.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46061) Add the test party for reattach test case

2023-11-22 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46061:


 Summary: Add the test party for reattach test case
 Key: SPARK-46061
 URL: https://issues.apache.org/jira/browse/SPARK-46061
 Project: Spark
  Issue Type: New Feature
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


We need the same test "ReleaseSession releases all queries and does not allow 
more requests in the session" added in SPARK-45798 to identify an issue like 
SPARK-46042.

This is caused by SPARK-46039



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46061) Add the test party for reattach test case

2023-11-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46061:
-
Issue Type: Test  (was: New Feature)

> Add the test party for reattach test case
> -
>
> Key: SPARK-46061
> URL: https://issues.apache.org/jira/browse/SPARK-46061
> Project: Spark
>  Issue Type: Test
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We need the same test "ReleaseSession releases all queries and does not allow 
> more requests in the session" added in SPARK-45798 to identify an issue like 
> SPARK-46042.
> This is caused by SPARK-46039



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46048) Support groupingSets operation in PySpark

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46048.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43951
[https://github.com/apache/spark/pull/43951]

> Support groupingSets operation in PySpark
> -
>
> Key: SPARK-46048
> URL: https://issues.apache.org/jira/browse/SPARK-46048
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Python version of SPARK-45929



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46048) Support groupingSets operation in PySpark

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46048:


Assignee: Hyukjin Kwon

> Support groupingSets operation in PySpark
> -
>
> Key: SPARK-46048
> URL: https://issues.apache.org/jira/browse/SPARK-46048
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> Python version of SPARK-45929



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46048) Support groupingSets operation in PySpark

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46048:
-
Issue Type: New Feature  (was: Bug)

> Support groupingSets operation in PySpark
> -
>
> Key: SPARK-46048
> URL: https://issues.apache.org/jira/browse/SPARK-46048
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Python version of SPARK-45929



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46048) Support groupingSets operation in PySpark

2023-11-21 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46048:


 Summary: Support groupingSets operation in PySpark
 Key: SPARK-46048
 URL: https://issues.apache.org/jira/browse/SPARK-46048
 Project: Spark
  Issue Type: Bug
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Python version of SPARK-45929



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46049) Support groupingSets operation in PySpark (Spark Connect)

2023-11-21 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46049:


 Summary: Support groupingSets operation in PySpark (Spark Connect)
 Key: SPARK-46049
 URL: https://issues.apache.org/jira/browse/SPARK-46049
 Project: Spark
  Issue Type: New Feature
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Connect version of SPARK-46048



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46017) PySpark doc build doesn't work properly on Mac

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46017:


Assignee: Haejoon Lee

> PySpark doc build doesn't work properly on Mac
> --
>
> Key: SPARK-46017
> URL: https://issues.apache.org/jira/browse/SPARK-46017
> Project: Spark
>  Issue Type: Bug
>  Components: Build, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> PySpark doc build is working properly on GitHub CI, but doesn't work properly 
> on local Mac env for some reason. We should investigate and fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46022) Remove deprecated functions APIs from documents

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46022.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43932
[https://github.com/apache/spark/pull/43932]

> Remove deprecated functions APIs from documents
> ---
>
> Key: SPARK-46022
> URL: https://issues.apache.org/jira/browse/SPARK-46022
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We should not expose the deprecated APIs on official documentation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46022) Remove deprecated functions APIs from documents

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46022:


Assignee: Haejoon Lee

> Remove deprecated functions APIs from documents
> ---
>
> Key: SPARK-46022
> URL: https://issues.apache.org/jira/browse/SPARK-46022
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> We should not expose the deprecated APIs on official documentation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46013) Improve basic datasource examples

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46013:


Assignee: Allison Wang

> Improve basic datasource examples
> -
>
> Key: SPARK-46013
> URL: https://issues.apache.org/jira/browse/SPARK-46013
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> We should improve the Python examples on this page: 
> [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html]
>  (basic_datasource_examples.py)
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46017) PySpark doc build doesn't work properly on Mac

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46017.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43932
[https://github.com/apache/spark/pull/43932]

> PySpark doc build doesn't work properly on Mac
> --
>
> Key: SPARK-46017
> URL: https://issues.apache.org/jira/browse/SPARK-46017
> Project: Spark
>  Issue Type: Bug
>  Components: Build, PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 4.0.0
>
>
> PySpark doc build is working properly on GitHub CI, but doesn't work properly 
> on local Mac env for some reason. We should investigate and fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46013) Improve basic datasource examples

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46013.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43917
[https://github.com/apache/spark/pull/43917]

> Improve basic datasource examples
> -
>
> Key: SPARK-46013
> URL: https://issues.apache.org/jira/browse/SPARK-46013
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We should improve the Python examples on this page: 
> [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html]
>  (basic_datasource_examples.py)
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46042) Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46042:
-
Description: 
https://github.com/apache/spark/pull/43942#issuecomment-1821896165

> Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite
> 
>
> Key: SPARK-46042
> URL: https://issues.apache.org/jira/browse/SPARK-46042
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> https://github.com/apache/spark/pull/43942#issuecomment-1821896165



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f

2023-11-21 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788430#comment-17788430
 ] 

Hyukjin Kwon commented on SPARK-46032:
--

Are executors using the same versions too? The error is most likely from a 
different version of JDK and Scala version. I can't reproduce them locally so 
sharing fulll specification of the server and the client would be very helpful.

> connect: cannot assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f
> -
>
> Key: SPARK-46032
> URL: https://issues.apache.org/jira/browse/SPARK-46032
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Bobby Wang
>Priority: Major
>
> I downloaded spark 3.5 from the spark official website, and then I started a 
> Spark Standalone cluster in which both master and the only worker are in the 
> same node. 
>  
> Then I started the connect server by 
> {code:java}
> start-connect-server.sh \
>     --master spark://10.19.183.93:7077 \
>     --packages org.apache.spark:spark-connect_2.12:3.5.0 \
>     --conf spark.executor.cores=12 \
>     --conf spark.task.cpus=1 \
>     --executor-memory 30G \
>     --conf spark.executor.resource.gpu.amount=1 \
>     --conf spark.task.resource.gpu.amount=0.08 \
>     --driver-memory 1G{code}
>  
> I can 100% ensure the spark standalone cluster, the connect server and spark 
> driver are started observed from the webui.
>  
> Finally, I tried to run a very simple spark job 
> (spark.range(100).filter("id>2").collect()) from spark-connect-client using 
> pyspark, but I got the below error.
>  
> _pyspark --remote sc://localhost_
> _Python 3.10.0 (default, Mar  3 2022, 09:58:08) [GCC 7.5.0] on linux_
> _Type "help", "copyright", "credits" or "license" for more information._
> _Welcome to_
>       _              ___
>      _/ __/_  {{_}}{_}__ ___{_}{{_}}/ /{{_}}{_}_
>     {_}{{_}}\ \/ _ \/ _ `/ {_}{{_}}/  '{_}/{_}
>    {_}/{_}_ / .{_}{{_}}/{_},{_}/{_}/ /{_}/{_}\   version 3.5.0{_}
>       {_}/{_}/_
>  
> _Using Python version 3.10.0 (default, Mar  3 2022 09:58:08)_
> _Client connected to the Spark Connect server at localhost_
> _SparkSession available as 'spark'._
> _>>> spark.range(100).filter("id > 3").collect()_
> _Traceback (most recent call last):_
>   _File "", line 1, in _
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py",
>  line 1645, in collect_
>     _table, schema = self._session.client.to_table(query)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 858, in to_table_
>     _table, schema, _, _, _ = self._execute_and_fetch(req)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1282, in _execute_and_fetch_
>     _for response in self._execute_and_fetch_as_iterator(req):_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1263, in _execute_and_fetch_as_iterator_
>     _self._handle_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1502, in _handle_error_
>     _self._handle_rpc_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1538, in _handle_rpc_error_
>     _raise convert_exception(info, status.message) from None_
> _pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot 
> assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance 
> of org.apache.spark.rdd.MapPartitionsRDD_
> _at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)_
> _at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
> _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at 

[jira] [Commented] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f

2023-11-21 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788381#comment-17788381
 ] 

Hyukjin Kwon commented on SPARK-46032:
--

and can you run without Spark Connect? Seems like just regular Spark shell 
would fail given the error messages.

> connect: cannot assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f
> -
>
> Key: SPARK-46032
> URL: https://issues.apache.org/jira/browse/SPARK-46032
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Bobby Wang
>Priority: Major
>
> I downloaded spark 3.5 from the spark official website, and then I started a 
> Spark Standalone cluster in which both master and the only worker are in the 
> same node. 
>  
> Then I started the connect server by 
> {code:java}
> start-connect-server.sh \
>     --master spark://10.19.183.93:7077 \
>     --packages org.apache.spark:spark-connect_2.12:3.5.0 \
>     --conf spark.executor.cores=12 \
>     --conf spark.task.cpus=1 \
>     --executor-memory 30G \
>     --conf spark.executor.resource.gpu.amount=1 \
>     --conf spark.task.resource.gpu.amount=0.08 \
>     --driver-memory 1G{code}
>  
> I can 100% ensure the spark standalone cluster, the connect server and spark 
> driver are started observed from the webui.
>  
> Finally, I tried to run a very simple spark job 
> (spark.range(100).filter("id>2").collect()) from spark-connect-client using 
> pyspark, but I got the below error.
>  
> _pyspark --remote sc://localhost_
> _Python 3.10.0 (default, Mar  3 2022, 09:58:08) [GCC 7.5.0] on linux_
> _Type "help", "copyright", "credits" or "license" for more information._
> _Welcome to_
>       _              ___
>      _/ __/_  {{_}}{_}__ ___{_}{{_}}/ /{{_}}{_}_
>     {_}{{_}}\ \/ _ \/ _ `/ {_}{{_}}/  '{_}/{_}
>    {_}/{_}_ / .{_}{{_}}/{_},{_}/{_}/ /{_}/{_}\   version 3.5.0{_}
>       {_}/{_}/_
>  
> _Using Python version 3.10.0 (default, Mar  3 2022 09:58:08)_
> _Client connected to the Spark Connect server at localhost_
> _SparkSession available as 'spark'._
> _>>> spark.range(100).filter("id > 3").collect()_
> _Traceback (most recent call last):_
>   _File "", line 1, in _
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py",
>  line 1645, in collect_
>     _table, schema = self._session.client.to_table(query)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 858, in to_table_
>     _table, schema, _, _, _ = self._execute_and_fetch(req)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1282, in _execute_and_fetch_
>     _for response in self._execute_and_fetch_as_iterator(req):_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1263, in _execute_and_fetch_as_iterator_
>     _self._handle_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1502, in _handle_error_
>     _self._handle_rpc_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1538, in _handle_rpc_error_
>     _raise convert_exception(info, status.message) from None_
> _pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot 
> assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance 
> of org.apache.spark.rdd.MapPartitionsRDD_
> _at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)_
> _at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
> _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
> _at 

[jira] [Commented] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f

2023-11-21 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788380#comment-17788380
 ] 

Hyukjin Kwon commented on SPARK-46032:
--

What's your Scala version [~wbo4958]?

> connect: cannot assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f
> -
>
> Key: SPARK-46032
> URL: https://issues.apache.org/jira/browse/SPARK-46032
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Bobby Wang
>Priority: Major
>
> I downloaded spark 3.5 from the spark official website, and then I started a 
> Spark Standalone cluster in which both master and the only worker are in the 
> same node. 
>  
> Then I started the connect server by 
> {code:java}
> start-connect-server.sh \
>     --master spark://10.19.183.93:7077 \
>     --packages org.apache.spark:spark-connect_2.12:3.5.0 \
>     --conf spark.executor.cores=12 \
>     --conf spark.task.cpus=1 \
>     --executor-memory 30G \
>     --conf spark.executor.resource.gpu.amount=1 \
>     --conf spark.task.resource.gpu.amount=0.08 \
>     --driver-memory 1G{code}
>  
> I can 100% ensure the spark standalone cluster, the connect server and spark 
> driver are started observed from the webui.
>  
> Finally, I tried to run a very simple spark job 
> (spark.range(100).filter("id>2").collect()) from spark-connect-client using 
> pyspark, but I got the below error.
>  
> _pyspark --remote sc://localhost_
> _Python 3.10.0 (default, Mar  3 2022, 09:58:08) [GCC 7.5.0] on linux_
> _Type "help", "copyright", "credits" or "license" for more information._
> _Welcome to_
>       _              ___
>      _/ __/_  {{_}}{_}__ ___{_}{{_}}/ /{{_}}{_}_
>     {_}{{_}}\ \/ _ \/ _ `/ {_}{{_}}/  '{_}/{_}
>    {_}/{_}_ / .{_}{{_}}/{_},{_}/{_}/ /{_}/{_}\   version 3.5.0{_}
>       {_}/{_}/_
>  
> _Using Python version 3.10.0 (default, Mar  3 2022 09:58:08)_
> _Client connected to the Spark Connect server at localhost_
> _SparkSession available as 'spark'._
> _>>> spark.range(100).filter("id > 3").collect()_
> _Traceback (most recent call last):_
>   _File "", line 1, in _
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py",
>  line 1645, in collect_
>     _table, schema = self._session.client.to_table(query)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 858, in to_table_
>     _table, schema, _, _, _ = self._execute_and_fetch(req)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1282, in _execute_and_fetch_
>     _for response in self._execute_and_fetch_as_iterator(req):_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1263, in _execute_and_fetch_as_iterator_
>     _self._handle_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1502, in _handle_error_
>     _self._handle_rpc_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1538, in _handle_rpc_error_
>     _raise convert_exception(info, status.message) from None_
> _pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot 
> assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance 
> of org.apache.spark.rdd.MapPartitionsRDD_
> _at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)_
> _at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
> _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
> _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_
> _at 

[jira] [Comment Edited] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f

2023-11-21 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788380#comment-17788380
 ] 

Hyukjin Kwon edited comment on SPARK-46032 at 11/21/23 11:34 AM:
-

What's your Scala and JDK versions [~wbo4958]?


was (Author: gurwls223):
What's your Scala version [~wbo4958]?

> connect: cannot assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f
> -
>
> Key: SPARK-46032
> URL: https://issues.apache.org/jira/browse/SPARK-46032
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Bobby Wang
>Priority: Major
>
> I downloaded spark 3.5 from the spark official website, and then I started a 
> Spark Standalone cluster in which both master and the only worker are in the 
> same node. 
>  
> Then I started the connect server by 
> {code:java}
> start-connect-server.sh \
>     --master spark://10.19.183.93:7077 \
>     --packages org.apache.spark:spark-connect_2.12:3.5.0 \
>     --conf spark.executor.cores=12 \
>     --conf spark.task.cpus=1 \
>     --executor-memory 30G \
>     --conf spark.executor.resource.gpu.amount=1 \
>     --conf spark.task.resource.gpu.amount=0.08 \
>     --driver-memory 1G{code}
>  
> I can 100% ensure the spark standalone cluster, the connect server and spark 
> driver are started observed from the webui.
>  
> Finally, I tried to run a very simple spark job 
> (spark.range(100).filter("id>2").collect()) from spark-connect-client using 
> pyspark, but I got the below error.
>  
> _pyspark --remote sc://localhost_
> _Python 3.10.0 (default, Mar  3 2022, 09:58:08) [GCC 7.5.0] on linux_
> _Type "help", "copyright", "credits" or "license" for more information._
> _Welcome to_
>       _              ___
>      _/ __/_  {{_}}{_}__ ___{_}{{_}}/ /{{_}}{_}_
>     {_}{{_}}\ \/ _ \/ _ `/ {_}{{_}}/  '{_}/{_}
>    {_}/{_}_ / .{_}{{_}}/{_},{_}/{_}/ /{_}/{_}\   version 3.5.0{_}
>       {_}/{_}/_
>  
> _Using Python version 3.10.0 (default, Mar  3 2022 09:58:08)_
> _Client connected to the Spark Connect server at localhost_
> _SparkSession available as 'spark'._
> _>>> spark.range(100).filter("id > 3").collect()_
> _Traceback (most recent call last):_
>   _File "", line 1, in _
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py",
>  line 1645, in collect_
>     _table, schema = self._session.client.to_table(query)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 858, in to_table_
>     _table, schema, _, _, _ = self._execute_and_fetch(req)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1282, in _execute_and_fetch_
>     _for response in self._execute_and_fetch_as_iterator(req):_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1263, in _execute_and_fetch_as_iterator_
>     _self._handle_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1502, in _handle_error_
>     _self._handle_rpc_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1538, in _handle_rpc_error_
>     _raise convert_exception(info, status.message) from None_
> _pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot 
> assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance 
> of org.apache.spark.rdd.MapPartitionsRDD_
> _at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)_
> _at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
> _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_

[jira] [Updated] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46032:
-
Priority: Major  (was: Blocker)

> connect: cannot assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f
> -
>
> Key: SPARK-46032
> URL: https://issues.apache.org/jira/browse/SPARK-46032
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Bobby Wang
>Priority: Major
>
> I downloaded spark 3.5 from the spark official website, and then I started a 
> Spark Standalone cluster in which both master and the only worker are in the 
> same node. 
>  
> Then I started the connect server by 
> {code:java}
> start-connect-server.sh \
>     --master spark://10.19.183.93:7077 \
>     --packages org.apache.spark:spark-connect_2.12:3.5.0 \
>     --conf spark.executor.cores=12 \
>     --conf spark.task.cpus=1 \
>     --executor-memory 30G \
>     --conf spark.executor.resource.gpu.amount=1 \
>     --conf spark.task.resource.gpu.amount=0.08 \
>     --driver-memory 1G{code}
>  
> I can 100% ensure the spark standalone cluster, the connect server and spark 
> driver are started observed from the webui.
>  
> Finally, I tried to run a very simple spark job 
> (spark.range(100).filter("id>2").collect()) from spark-connect-client using 
> pyspark, but I got the below error.
>  
> _pyspark --remote sc://localhost_
> _Python 3.10.0 (default, Mar  3 2022, 09:58:08) [GCC 7.5.0] on linux_
> _Type "help", "copyright", "credits" or "license" for more information._
> _Welcome to_
>       _              ___
>      _/ __/_  {{_}}{_}__ ___{_}{{_}}/ /{{_}}{_}_
>     {_}{{_}}\ \/ _ \/ _ `/ {_}{{_}}/  '{_}/{_}
>    {_}/{_}_ / .{_}{{_}}/{_},{_}/{_}/ /{_}/{_}\   version 3.5.0{_}
>       {_}/{_}/_
>  
> _Using Python version 3.10.0 (default, Mar  3 2022 09:58:08)_
> _Client connected to the Spark Connect server at localhost_
> _SparkSession available as 'spark'._
> _>>> spark.range(100).filter("id > 3").collect()_
> _Traceback (most recent call last):_
>   _File "", line 1, in _
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py",
>  line 1645, in collect_
>     _table, schema = self._session.client.to_table(query)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 858, in to_table_
>     _table, schema, _, _, _ = self._execute_and_fetch(req)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1282, in _execute_and_fetch_
>     _for response in self._execute_and_fetch_as_iterator(req):_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1263, in _execute_and_fetch_as_iterator_
>     _self._handle_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1502, in _handle_error_
>     _self._handle_rpc_error(error)_
>   _File 
> "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py",
>  line 1538, in _handle_rpc_error_
>     _raise convert_exception(info, status.message) from None_
> _pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot 
> assign instance of java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance 
> of org.apache.spark.rdd.MapPartitionsRDD_
> _at 
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)_
> _at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
> _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_
> _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)_
> _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_
> _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_
> _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_
> _at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)_
> _at 

[jira] [Assigned] (SPARK-46023) Annotate parameters at docstrings in pyspark.sql module

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46023:


Assignee: Hyukjin Kwon

> Annotate parameters at docstrings in pyspark.sql module
> ---
>
> Key: SPARK-46023
> URL: https://issues.apache.org/jira/browse/SPARK-46023
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> See PR



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46023) Annotate parameters at docstrings in pyspark.sql module

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46023.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43925
[https://github.com/apache/spark/pull/43925]

> Annotate parameters at docstrings in pyspark.sql module
> ---
>
> Key: SPARK-46023
> URL: https://issues.apache.org/jira/browse/SPARK-46023
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> See PR



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46026) Refine docstring of UDTF

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46026:


Assignee: Hyukjin Kwon

> Refine docstring of UDTF
> 
>
> Key: SPARK-46026
> URL: https://issues.apache.org/jira/browse/SPARK-46026
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46026) Refine docstring of UDTF

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46026.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43928
[https://github.com/apache/spark/pull/43928]

> Refine docstring of UDTF
> 
>
> Key: SPARK-46026
> URL: https://issues.apache.org/jira/browse/SPARK-46026
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46024) Document parameters and examples for RuntimeConf get, set and unset

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46024:


Assignee: Hyukjin Kwon

> Document parameters and examples for RuntimeConf get, set and unset
> ---
>
> Key: SPARK-46024
> URL: https://issues.apache.org/jira/browse/SPARK-46024
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46024) Document parameters and examples for RuntimeConf get, set and unset

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46024.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43927
[https://github.com/apache/spark/pull/43927]

> Document parameters and examples for RuntimeConf get, set and unset
> ---
>
> Key: SPARK-46024
> URL: https://issues.apache.org/jira/browse/SPARK-46024
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46027) Add `Python 3.12` to the Daily Python Github Action job

2023-11-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46027.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43929
[https://github.com/apache/spark/pull/43929]

> Add `Python 3.12` to the Daily Python Github Action job
> ---
>
> Key: SPARK-46027
> URL: https://issues.apache.org/jira/browse/SPARK-46027
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46027) Add `Python 3.12` to the Daily Python Github Action job

2023-11-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46027:


 Summary: Add `Python 3.12` to the Daily Python Github Action job
 Key: SPARK-46027
 URL: https://issues.apache.org/jira/browse/SPARK-46027
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46004) Refine docstring of `DataFrame.dropna/fillna/replace`

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46004:


Assignee: BingKun Pan

> Refine docstring of `DataFrame.dropna/fillna/replace`
> -
>
> Key: SPARK-46004
> URL: https://issues.apache.org/jira/browse/SPARK-46004
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46004) Refine docstring of `DataFrame.dropna/fillna/replace`

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46004.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43907
[https://github.com/apache/spark/pull/43907]

> Refine docstring of `DataFrame.dropna/fillna/replace`
> -
>
> Key: SPARK-46004
> URL: https://issues.apache.org/jira/browse/SPARK-46004
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46026) Refine docstring of UDTF

2023-11-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46026:


 Summary: Refine docstring of UDTF
 Key: SPARK-46026
 URL: https://issues.apache.org/jira/browse/SPARK-46026
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Deleted] (SPARK-46025) Support Python 3.12 in PySpark

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon deleted SPARK-46025:
-


> Support Python 3.12 in PySpark
> --
>
> Key: SPARK-46025
> URL: https://issues.apache.org/jira/browse/SPARK-46025
> Project: Spark
>  Issue Type: Improvement
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Python 3.12 is released out. We should make sure the tests pass, and mark it 
> supported in setup.py



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46025) Support Python 3.12 in PySpark

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46025:
-
Issue Type: Improvement  (was: Bug)

> Support Python 3.12 in PySpark
> --
>
> Key: SPARK-46025
> URL: https://issues.apache.org/jira/browse/SPARK-46025
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Python 3.12 is released out. We should make sure the tests pass, and mark it 
> supported in setup.py



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46025) Support Python 3.12 in PySpark

2023-11-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46025:


 Summary: Support Python 3.12 in PySpark
 Key: SPARK-46025
 URL: https://issues.apache.org/jira/browse/SPARK-46025
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Python 3.12 is released out. We should make sure the tests pass, and mark it 
supported in setup.py



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46024) Document parameters and examples for RuntimeConf get, set and unset

2023-11-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46024:


 Summary: Document parameters and examples for RuntimeConf get, set 
and unset
 Key: SPARK-46024
 URL: https://issues.apache.org/jira/browse/SPARK-46024
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46023) Annotate parameters at docstrings in pyspark.sql module

2023-11-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46023:


 Summary: Annotate parameters at docstrings in pyspark.sql module
 Key: SPARK-46023
 URL: https://issues.apache.org/jira/browse/SPARK-46023
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


See PR



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46015) Fix broken link for Koalas issues

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46015:


Assignee: Haejoon Lee

> Fix broken link for Koalas issues
> -
>
> Key: SPARK-46015
> URL: https://issues.apache.org/jira/browse/SPARK-46015
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PS
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> There is a link broken for Koalas old repo. We should address it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46015) Fix broken link for Koalas issues

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46015.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43918
[https://github.com/apache/spark/pull/43918]

> Fix broken link for Koalas issues
> -
>
> Key: SPARK-46015
> URL: https://issues.apache.org/jira/browse/SPARK-46015
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PS
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> There is a link broken for Koalas old repo. We should address it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44867) Refactor Spark Connect Docs to incorporate Scala setup

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44867.
--
Fix Version/s: 3.5.0
 Assignee: Venkata Sai Akhil Gudesa
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/42556

> Refactor Spark Connect Docs to incorporate Scala setup
> --
>
> Key: SPARK-44867
> URL: https://issues.apache.org/jira/browse/SPARK-44867
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Venkata Sai Akhil Gudesa
>Assignee: Venkata Sai Akhil Gudesa
>Priority: Major
> Fix For: 3.5.0
>
>
> The current Spark Connect 
> [overview|https://spark.apache.org/docs/latest/spark-connect-overview.html] 
> does not include instructions to setup the Scala REPL as well using the Scala 
> client in applications.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45929) support grouping set operation in dataframe api

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45929.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43813
[https://github.com/apache/spark/pull/43813]

> support grouping set operation in dataframe api
> ---
>
> Key: SPARK-45929
> URL: https://issues.apache.org/jira/browse/SPARK-45929
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: JacobZheng
>Assignee: JacobZheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> I am using spark dataframe api for complex calculations. When I need to use 
> the grouping sets function, I can only convert the expression to sql via 
> analyzedPlan and then splice these sql into a complex sql to execute. In some 
> cases, this operation generates an extremely complex sql. executing this 
> complex sql, antlr4 continues to consume a large amount of memory, similar to 
> a memory leak scenario. If you can and rollup, cube function through the 
> dataframe api to calculate these operations will be much simpler.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45929) support grouping set operation in dataframe api

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45929:


Assignee: JacobZheng

> support grouping set operation in dataframe api
> ---
>
> Key: SPARK-45929
> URL: https://issues.apache.org/jira/browse/SPARK-45929
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: JacobZheng
>Assignee: JacobZheng
>Priority: Major
>  Labels: pull-request-available
>
> I am using spark dataframe api for complex calculations. When I need to use 
> the grouping sets function, I can only convert the expression to sql via 
> analyzedPlan and then splice these sql into a complex sql to execute. In some 
> cases, this operation generates an extremely complex sql. executing this 
> complex sql, antlr4 continues to consume a large amount of memory, similar to 
> a memory leak scenario. If you can and rollup, cube function through the 
> dataframe api to calculate these operations will be much simpler.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45856) Move ArtifactManager from Spark Connect into SparkSession (sql/core)

2023-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45856.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43735
[https://github.com/apache/spark/pull/43735]

> Move ArtifactManager from Spark Connect into SparkSession (sql/core)
> 
>
> Key: SPARK-45856
> URL: https://issues.apache.org/jira/browse/SPARK-45856
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, SQL
>Affects Versions: 4.0.0
>Reporter: Venkata Sai Akhil Gudesa
>Assignee: Venkata Sai Akhil Gudesa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The `ArtifactManager` that currently lies in the connect package can be moved 
> into the wider sql/core package (e.g SparkSession) to expand the scope. This 
> is possible because the `ArtifactManager` is tied solely to the 
> `SparkSession#sessionUUID` and hence can be cleanly detached from Spark 
> Connect and be made generally available.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45996) Show proper dependency requirement messages for Spark Connect

2023-11-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45996:


Assignee: Hyukjin Kwon

> Show proper dependency requirement messages for Spark Connect
> -
>
> Key: SPARK-45996
> URL: https://issues.apache.org/jira/browse/SPARK-45996
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> ./bin/pyspark --remote local
> {code}
> We should improve the error messages below.
> {code}
> /.../pyspark/shell.py:57: UserWarning: Failed to initialize Spark session.
>   warnings.warn("Failed to initialize Spark session.")
> Traceback (most recent call last):
>   File "/.../pyspark/shell.py", line 52, in 
> spark = SparkSession.builder.getOrCreate()
>   File "/.../pyspark/sql/session.py", line 476, in getOrCreate
> from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
>   File "/.../pyspark/sql/connect/session.py", line 53, in 
> from pyspark.sql.connect.client import SparkConnectClient, ChannelBuilder
>   File "/.../pyspark/sql/connect/client/__init__.py", line 22, in 
> from pyspark.sql.connect.client.core import *  # noqa: F401,F403
>   File "/.../pyspark/sql/connect/client/core.py", line 51, in 
> import google.protobuf.message
> ModuleNotFoundError: No module named 'google
> {code}
> {code}
> /.../pyspark/shell.py:57: UserWarning: Failed to initialize Spark session.
>   warnings.warn("Failed to initialize Spark session.")
> Traceback (most recent call last):
>   File "/.../pyspark/shell.py", line 52, in 
> spark = SparkSession.builder.getOrCreate()
>   File "/.../pyspark/sql/session.py", line 476, in getOrCreate
> from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
>   File "/.../pyspark/sql/connect/session.py", line 53, in 
> from pyspark.sql.connect.client import SparkConnectClient, ChannelBuilder
>   File "/.../pyspark/sql/connect/client/__init__.py", line 22, in 
> from pyspark.sql.connect.client.core import *  # noqa: F401,F403
>   File "/.../pyspark/sql/connect/client/core.py", line 52, in 
> from grpc_status import rpc_status
> ModuleNotFoundError: No module named 'grpc_status'
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45996) Show proper dependency requirement messages for Spark Connect

2023-11-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45996.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43894
[https://github.com/apache/spark/pull/43894]

> Show proper dependency requirement messages for Spark Connect
> -
>
> Key: SPARK-45996
> URL: https://issues.apache.org/jira/browse/SPARK-45996
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code}
> ./bin/pyspark --remote local
> {code}
> We should improve the error messages below.
> {code}
> /.../pyspark/shell.py:57: UserWarning: Failed to initialize Spark session.
>   warnings.warn("Failed to initialize Spark session.")
> Traceback (most recent call last):
>   File "/.../pyspark/shell.py", line 52, in 
> spark = SparkSession.builder.getOrCreate()
>   File "/.../pyspark/sql/session.py", line 476, in getOrCreate
> from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
>   File "/.../pyspark/sql/connect/session.py", line 53, in 
> from pyspark.sql.connect.client import SparkConnectClient, ChannelBuilder
>   File "/.../pyspark/sql/connect/client/__init__.py", line 22, in 
> from pyspark.sql.connect.client.core import *  # noqa: F401,F403
>   File "/.../pyspark/sql/connect/client/core.py", line 51, in 
> import google.protobuf.message
> ModuleNotFoundError: No module named 'google
> {code}
> {code}
> /.../pyspark/shell.py:57: UserWarning: Failed to initialize Spark session.
>   warnings.warn("Failed to initialize Spark session.")
> Traceback (most recent call last):
>   File "/.../pyspark/shell.py", line 52, in 
> spark = SparkSession.builder.getOrCreate()
>   File "/.../pyspark/sql/session.py", line 476, in getOrCreate
> from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
>   File "/.../pyspark/sql/connect/session.py", line 53, in 
> from pyspark.sql.connect.client import SparkConnectClient, ChannelBuilder
>   File "/.../pyspark/sql/connect/client/__init__.py", line 22, in 
> from pyspark.sql.connect.client.core import *  # noqa: F401,F403
>   File "/.../pyspark/sql/connect/client/core.py", line 52, in 
> from grpc_status import rpc_status
> ModuleNotFoundError: No module named 'grpc_status'
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45942) Only do the thread interruption check for putIterator on executors

2023-11-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45942:


Assignee: Huanli Wang

> Only do the thread interruption check for putIterator on executors
> --
>
> Key: SPARK-45942
> URL: https://issues.apache.org/jira/browse/SPARK-45942
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Huanli Wang
>Assignee: Huanli Wang
>Priority: Major
>  Labels: pull-request-available
>
> https://issues.apache.org/jira/browse/SPARK-45025 
> introduces a peaceful thread interruption handling. However, there is an edge 
> case: when a streaming query is stopped on the driver, it interrupts the 
> stream execution thread. If the streaming query is doing memory store 
> operations on driver and performs {{doPutIterator}} at the same time, the 
> [unroll process will be 
> broken|https://github.com/apache/spark/blob/39fc6108bfaaa0ce471f6460880109f948ba5c62/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L224]
>  and [returns used 
> memory|https://github.com/apache/spark/blob/39fc6108bfaaa0ce471f6460880109f948ba5c62/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L245-L247].
> This can result in {{closeChannelException}} as it falls into this [case 
> clause|https://github.com/apache/spark/blob/aa646d3050028272f7333deaef52f20e6975e0ed/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1614-L1622]
>  which opens an I/O channel and persists the data into the disk. However, 
> because the thread is interrupted, the channel will be closed at the begin: 
> [https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/java/nio/channels/spi/AbstractInterruptibleChannel.java#L172]
>  and throws out {{closeChannelException}}
> On executors, [the task will be killed if the thread is 
> interrupted|https://github.com/apache/spark/blob/39fc6108bfaaa0ce471f6460880109f948ba5c62/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L374],
>  however, we don't do it on the driver.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45942) Only do the thread interruption check for putIterator on executors

2023-11-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45942.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43823
[https://github.com/apache/spark/pull/43823]

> Only do the thread interruption check for putIterator on executors
> --
>
> Key: SPARK-45942
> URL: https://issues.apache.org/jira/browse/SPARK-45942
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Huanli Wang
>Assignee: Huanli Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> https://issues.apache.org/jira/browse/SPARK-45025 
> introduces a peaceful thread interruption handling. However, there is an edge 
> case: when a streaming query is stopped on the driver, it interrupts the 
> stream execution thread. If the streaming query is doing memory store 
> operations on driver and performs {{doPutIterator}} at the same time, the 
> [unroll process will be 
> broken|https://github.com/apache/spark/blob/39fc6108bfaaa0ce471f6460880109f948ba5c62/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L224]
>  and [returns used 
> memory|https://github.com/apache/spark/blob/39fc6108bfaaa0ce471f6460880109f948ba5c62/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L245-L247].
> This can result in {{closeChannelException}} as it falls into this [case 
> clause|https://github.com/apache/spark/blob/aa646d3050028272f7333deaef52f20e6975e0ed/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1614-L1622]
>  which opens an I/O channel and persists the data into the disk. However, 
> because the thread is interrupted, the channel will be closed at the begin: 
> [https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/java/nio/channels/spi/AbstractInterruptibleChannel.java#L172]
>  and throws out {{closeChannelException}}
> On executors, [the task will be killed if the thread is 
> interrupted|https://github.com/apache/spark/blob/39fc6108bfaaa0ce471f6460880109f948ba5c62/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L374],
>  however, we don't do it on the driver.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45986) Fix `pyspark.ml.torch.tests.test_distributor` in Python 3.11

2023-11-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45986:


Assignee: Hyukjin Kwon  (was: Dongjoon Hyun)

> Fix `pyspark.ml.torch.tests.test_distributor` in Python 3.11
> 
>
> Key: SPARK-45986
> URL: https://issues.apache.org/jira/browse/SPARK-45986
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> https://github.com/apache/spark/actions/runs/6914662405/job/18812759511
> {code}
> ==
> FAIL [0.000s]: test_local_training_succeeds 
> (pyspark.ml.torch.tests.test_distributor.TorchDistributorLocalUnitTests.test_local_training_succeeds)
>  [subtest: 1]
> --
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", 
> line 384, in test_local_training_succeeds
> self.assertEqual(
> AssertionError: '1' != '0'
> - 1
> + 0
> ==
> FAIL [0.142s]: test_local_training_succeeds 
> (pyspark.ml.torch.tests.test_distributor.TorchDistributorLocalUnitTests.test_local_training_succeeds)
>  [subtest: 2]
> --
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", 
> line 384, in test_local_training_succeeds
> self.assertEqual(
> AssertionError: '1,2,0' != '0,1,2'
> - 1,2,0
> + 0,1,2
> ==
> FAIL [0.000s]: test_local_training_succeeds 
> (pyspark.ml.torch.tests.test_distributor.TorchDistributorLocalUnitTestsII.test_local_training_succeeds)
>  [subtest: 1]
> --
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", 
> line 384, in test_local_training_succeeds
> self.assertEqual(
> AssertionError: '1' != '0'
> - 1
> + 0
> ==
> FAIL [0.139s]: test_local_training_succeeds 
> (pyspark.ml.torch.tests.test_distributor.TorchDistributorLocalUnitTestsII.test_local_training_succeeds)
>  [subtest: 2]
> --
> Traceback (most recent call last):
>   File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", 
> line 384, in test_local_training_succeeds
> self.assertEqual(
> AssertionError: '1,2,0' != '0,1,2'
> - 1,2,0
> + 0,1,2
> --
> Ran 23 tests in 166.741s
> FAILED (failures=4)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45996) Show proper dependency requirement messages for Spark Connect

2023-11-19 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-45996:


 Summary: Show proper dependency requirement messages for Spark 
Connect
 Key: SPARK-45996
 URL: https://issues.apache.org/jira/browse/SPARK-45996
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon



{code}
./bin/pyspark --remote local
{code}

We should improve the error messages below.

{code}
/.../pyspark/shell.py:57: UserWarning: Failed to initialize Spark session.
  warnings.warn("Failed to initialize Spark session.")
Traceback (most recent call last):
  File "/.../pyspark/shell.py", line 52, in 
spark = SparkSession.builder.getOrCreate()
  File "/.../pyspark/sql/session.py", line 476, in getOrCreate
from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
  File "/.../pyspark/sql/connect/session.py", line 53, in 
from pyspark.sql.connect.client import SparkConnectClient, ChannelBuilder
  File "/.../pyspark/sql/connect/client/__init__.py", line 22, in 
from pyspark.sql.connect.client.core import *  # noqa: F401,F403
  File "/.../pyspark/sql/connect/client/core.py", line 51, in 
import google.protobuf.message
ModuleNotFoundError: No module named 'google
{code}

{code}
/.../pyspark/shell.py:57: UserWarning: Failed to initialize Spark session.
  warnings.warn("Failed to initialize Spark session.")
Traceback (most recent call last):
  File "/.../pyspark/shell.py", line 52, in 
spark = SparkSession.builder.getOrCreate()
  File "/.../pyspark/sql/session.py", line 476, in getOrCreate
from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
  File "/.../pyspark/sql/connect/session.py", line 53, in 
from pyspark.sql.connect.client import SparkConnectClient, ChannelBuilder
  File "/.../pyspark/sql/connect/client/__init__.py", line 22, in 
from pyspark.sql.connect.client.core import *  # noqa: F401,F403
  File "/.../pyspark/sql/connect/client/core.py", line 52, in 
from grpc_status import rpc_status
ModuleNotFoundError: No module named 'grpc_status'
{code}






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45994) Change description-file to description_file

2023-11-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45994:


Assignee: Bjørn Jørgensen

> Change description-file to description_file
> ---
>
> Key: SPARK-45994
> URL: https://issues.apache.org/jira/browse/SPARK-45994
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Bjørn Jørgensen
>Assignee: Bjørn Jørgensen
>Priority: Major
>  Labels: pull-request-available
>
> + cp -r /home/bjorn/spark/data /home/bjorn/spark/dist
> + '[' true == true ']'
> + echo 'Building python distribution package'
> Building python distribution package
> + pushd /home/bjorn/spark/python
> + rm -rf pyspark.egg-info
> + python3 setup.py sdist
> /usr/lib/python3.11/site-packages/setuptools/dist.py:745: 
> SetuptoolsDeprecationWarning: Invalid dash-separated options
> !!
> 
> 
> Usage of dash-separated 'description-file' will not be supported in 
> future
> versions. Please use the underscore name 'description_file' instead.
> This deprecation is overdue, please update your project and remove 
> deprecated
> calls to avoid build errors in the future.
> See 
> https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for 
> details.
> 
> 
> !!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45994) Change description-file to description_file

2023-11-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45994.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43891
[https://github.com/apache/spark/pull/43891]

> Change description-file to description_file
> ---
>
> Key: SPARK-45994
> URL: https://issues.apache.org/jira/browse/SPARK-45994
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Bjørn Jørgensen
>Assignee: Bjørn Jørgensen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> + cp -r /home/bjorn/spark/data /home/bjorn/spark/dist
> + '[' true == true ']'
> + echo 'Building python distribution package'
> Building python distribution package
> + pushd /home/bjorn/spark/python
> + rm -rf pyspark.egg-info
> + python3 setup.py sdist
> /usr/lib/python3.11/site-packages/setuptools/dist.py:745: 
> SetuptoolsDeprecationWarning: Invalid dash-separated options
> !!
> 
> 
> Usage of dash-separated 'description-file' will not be supported in 
> future
> versions. Please use the underscore name 'description_file' instead.
> This deprecation is overdue, please update your project and remove 
> deprecated
> calls to avoid build errors in the future.
> See 
> https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for 
> details.
> 
> 
> !!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45995) Upgrade R version from 4.3.1 to 4.3.2 in AppVeyor

2023-11-19 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-45995:


 Summary: Upgrade R version from 4.3.1 to 4.3.2 in AppVeyor
 Key: SPARK-45995
 URL: https://issues.apache.org/jira/browse/SPARK-45995
 Project: Spark
  Issue Type: Improvement
  Components: R
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon



https://cran.r-project.org/doc/manuals/r-release/NEWS.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45988) Fix `pyspark.pandas.tests.computation.test_apply_func` in Python 3.11

2023-11-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45988.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/43888

> Fix `pyspark.pandas.tests.computation.test_apply_func` in Python 3.11
> -
>
> Key: SPARK-45988
> URL: https://issues.apache.org/jira/browse/SPARK-45988
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> https://github.com/apache/spark/actions/runs/6914662405/job/18812759697
> {code}
> ==
> ERROR [0.686s]: test_apply_batch_with_type 
> (pyspark.pandas.tests.computation.test_apply_func.FrameApplyFunctionTests.test_apply_batch_with_type)
> --
> Traceback (most recent call last):
>   File 
> "/__w/spark/spark/python/pyspark/pandas/tests/computation/test_apply_func.py",
>  line 248, in test_apply_batch_with_type
> def identify3(x) -> ps.DataFrame[float, [int, List[int]]]:
> ^
>   File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13540, in 
> __class_getitem__
> return create_tuple_for_frame_type(params)
>^^^
>   File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line 
> 721, in create_tuple_for_frame_type
> return Tuple[_to_type_holders(params)]
>  
>   File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line 
> 766, in _to_type_holders
> data_types = _new_type_holders(data_types, NameTypeHolder)
>  ^
>   File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line 
> 832, in _new_type_holders
> raise TypeError(
> TypeError: Type hints should be specified as one of:
>   - DataFrame[type, type, ...]
>   - DataFrame[name: type, name: type, ...]
>   - DataFrame[dtypes instance]
>   - DataFrame[zip(names, types)]
>   - DataFrame[index_type, [type, ...]]
>   - DataFrame[(index_name, index_type), [(name, type), ...]]
>   - DataFrame[dtype instance, dtypes instance]
>   - DataFrame[(index_name, index_type), zip(names, types)]
>   - DataFrame[[index_type, ...], [type, ...]]
>   - DataFrame[[(index_name, index_type), ...], [(name, type), ...]]
>   - DataFrame[dtypes instance, dtypes instance]
>   - DataFrame[zip(index_names, index_types), zip(names, types)]
> However, got (, typing.List[int]).
> --
> Ran 10 tests in 34.327s
> FAILED (errors=1)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45989) Fix `pyspark.pandas.tests.connect.computation.test_parity_apply_func` in Python 3.11

2023-11-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45989.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/43888

> Fix `pyspark.pandas.tests.connect.computation.test_parity_apply_func` in 
> Python 3.11
> 
>
> Key: SPARK-45989
> URL: https://issues.apache.org/jira/browse/SPARK-45989
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 4.0.0
>
>
> https://github.com/apache/spark/actions/runs/6914662405/job/18816505612
> {code}
> ==
> ERROR [1.237s]: test_apply_batch_with_type 
> (pyspark.pandas.tests.connect.computation.test_parity_apply_func.FrameParityApplyFunctionTests.test_apply_batch_with_type)
> --
> Traceback (most recent call last):
>   File 
> "/__w/spark/spark/python/pyspark/pandas/tests/computation/test_apply_func.py",
>  line 248, in test_apply_batch_with_type
> def identify3(x) -> ps.DataFrame[float, [int, List[int]]]:
> ^
>   File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13540, in 
> __class_getitem__
> return create_tuple_for_frame_type(params)
>^^^
>   File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line 
> 721, in create_tuple_for_frame_type
> return Tuple[_to_type_holders(params)]
>  
>   File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line 
> 766, in _to_type_holders
> data_types = _new_type_holders(data_types, NameTypeHolder)
>  ^
>   File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line 
> 832, in _new_type_holders
> raise TypeError(
> TypeError: Type hints should be specified as one of:
>   - DataFrame[type, type, ...]
>   - DataFrame[name: type, name: type, ...]
>   - DataFrame[dtypes instance]
>   - DataFrame[zip(names, types)]
>   - DataFrame[index_type, [type, ...]]
>   - DataFrame[(index_name, index_type), [(name, type), ...]]
>   - DataFrame[dtype instance, dtypes instance]
>   - DataFrame[(index_name, index_type), zip(names, types)]
>   - DataFrame[[index_type, ...], [type, ...]]
>   - DataFrame[[(index_name, index_type), ...], [(name, type), ...]]
>   - DataFrame[dtypes instance, dtypes instance]
>   - DataFrame[zip(index_names, index_types), zip(names, types)]
> However, got (, typing.List[int]).
> --
> Ran 10 tests in 78.247s
> FAILED (errors=1)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45965) Move DSv2 partitioning expressions into functions.partitioning

2023-11-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45965.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43858
[https://github.com/apache/spark/pull/43858]

> Move DSv2 partitioning expressions into functions.partitioning
> --
>
> Key: SPARK-45965
> URL: https://issues.apache.org/jira/browse/SPARK-45965
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We weren't able to move those partitioning expressions into a nested object 
> because of a Scala 2.12 limitation. Now we can do it with Scala 2.13.
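As a rough illustration of the end state, a hedged PySpark sketch, assuming the
Python API mirrors this move with a `partitioning` namespace under
`pyspark.sql.functions`, an active session `spark`, and a DSv2 catalog that
supports transform partitioning (`catalog.db.events` is a placeholder):

{code}
from pyspark.sql import functions as sf
from pyspark.sql.functions import partitioning

df = spark.range(10).withColumn("ts", sf.current_timestamp())

# DSv2 CREATE TABLE ... PARTITIONED BY with transform expressions, grouped
# under the partitioning namespace instead of the top-level functions.
(df.writeTo("catalog.db.events")
   .partitionedBy(partitioning.years("ts"), partitioning.bucket(4, "id"))
   .create())
{code}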



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45985) Refine docstring of `DataFrame.intersect`

2023-11-17 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-45985:


 Summary: Refine docstring of `DataFrame.intersect`
 Key: SPARK-45985
 URL: https://issues.apache.org/jira/browse/SPARK-45985
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45984) Refine docstring of `DataFrame.intersectAll`

2023-11-17 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-45984:


 Summary: Refine docstring of `DataFrame.intersectAll`
 Key: SPARK-45984
 URL: https://issues.apache.org/jira/browse/SPARK-45984
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45983) Refine docstring of `DataFrame.subtract`

2023-11-17 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-45983:


 Summary: Refine docstring of `DataFrame.subtract`
 Key: SPARK-45983
 URL: https://issues.apache.org/jira/browse/SPARK-45983
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45970) Provide partitioning expressions in Java as same as Scala

2023-11-16 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-45970:


 Summary: Provide partitioning expressions in Java as same as Scala
 Key: SPARK-45970
 URL: https://issues.apache.org/jira/browse/SPARK-45970
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


See https://github.com/apache/spark/pull/43858.

Once Spark moves to Scala 3, we can support the same style of partitioning 
expressions in Java, such as:

{code}
import static org.apache.spark.sql.functions.partitioning.*;
{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45952) Use built-in math constant in math functions

2023-11-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45952:


Assignee: Ruifeng Zheng

> Use built-in math constant in math functions 
> -
>
> Key: SPARK-45952
> URL: https://issues.apache.org/jira/browse/SPARK-45952
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45952) Use built-in math constant in math functions

2023-11-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45952.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43837
[https://github.com/apache/spark/pull/43837]

> Use built-in math constant in math functions 
> -
>
> Key: SPARK-45952
> URL: https://issues.apache.org/jira/browse/SPARK-45952
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45912) Enhancement of XSDToSchema API: Change to HDFS API for cloud storage accessibility

2023-11-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45912:


Assignee: Shujing Yang

> Enhancement of XSDToSchema API: Change to HDFS API for cloud storage 
> accessibility
> ---
>
> Key: SPARK-45912
> URL: https://issues.apache.org/jira/browse/SPARK-45912
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Shujing Yang
>Assignee: Shujing Yang
>Priority: Major
>  Labels: pull-request-available
>
> Previously, it used `java.nio.file.Path`, which limited file reading to local 
> file systems only. By changing this to an HDFS-compatible API, we now enable 
> the XSDToSchema function to access files in cloud storage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45912) Enhancement of XSDToSchema API: Change to HDFS API for cloud storage accessibility

2023-11-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45912.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43789
[https://github.com/apache/spark/pull/43789]

> Enhancement of XSDToSchema API: Change to HDFS API for cloud storage 
> accessibility
> ---
>
> Key: SPARK-45912
> URL: https://issues.apache.org/jira/browse/SPARK-45912
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Shujing Yang
>Assignee: Shujing Yang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Previously, it used `java.nio.file.Path`, which limited file reading to local 
> file systems only. By changing this to an HDFS-compatible API, we now enable 
> the XSDToSchema function to access files in cloud storage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45964) Remove private[sql] in XML and JSON package under catalyst package

2023-11-16 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-45964:


 Summary: Remove private[sql] in XML and JSON package under 
catalyst package
 Key: SPARK-45964
 URL: https://issues.apache.org/jira/browse/SPARK-45964
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


catalyst is internal, so we don't need to annotate them as private[sql]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45963) Restore documentation for DSv2 API

2023-11-16 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-45963:


 Summary: Restore documentation for DSv2 API
 Key: SPARK-45963
 URL: https://issues.apache.org/jira/browse/SPARK-45963
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0, 3.4.1, 4.0.0
Reporter: Hyukjin Kwon


DSv2 documentation was mistakenly dropped by 
https://github.com/apache/spark/pull/38392. It used to exist in 3.3.0: 
https://spark.apache.org/docs/3.3.0/api/scala/org/apache/spark/sql/connector/catalog/index.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45950) Fix the IvyTestUtils#createIvyDescriptor function so the common-utils module can run tests on GitHub Actions

2023-11-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45950:


Assignee: Yang Jie

> Fix the IvyTestUtils#createIvyDescriptor function so the common-utils module 
> can run tests on GitHub Actions
> -
>
> Key: SPARK-45950
> URL: https://issues.apache.org/jira/browse/SPARK-45950
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra, Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45950) Fix the IvyTestUtils#createIvyDescriptor function so the common-utils module can run tests on GitHub Actions

2023-11-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45950.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43834
[https://github.com/apache/spark/pull/43834]

> Fix the IvyTestUtils#createIvyDescriptor function so the common-utils module 
> can run tests on GitHub Actions
> -
>
> Key: SPARK-45950
> URL: https://issues.apache.org/jira/browse/SPARK-45950
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra, Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45960) Add Python 3.10 to the Daily Python Github Action job

2023-11-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45960.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43847
[https://github.com/apache/spark/pull/43847]

> Add Python 3.10 to the Daily Python Github Action job
> -
>
> Key: SPARK-45960
> URL: https://issues.apache.org/jira/browse/SPARK-45960
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45851) (Scala) Support different retry policies for connect client

2023-11-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45851.
--
Fix Version/s: 4.0.0
 Assignee: Alice Sayutina
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/43757

> (Scala) Support different retry policies for connect client
> ---
>
> Key: SPARK-45851
> URL: https://issues.apache.org/jira/browse/SPARK-45851
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Alice Sayutina
>Assignee: Alice Sayutina
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Support multiple retry policies defined at the same time. Each policy 
> determines which error types it can retry and how exactly.
> For instance, networking errors should generally be retried differently than
> errors from a remote resource not yet being available.
> Relevant Python ticket: SPARK-45733
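To make the design concrete, a hypothetical sketch of the idea (not the actual
connect client API; all class and method names here are invented for
illustration):

{code}
from typing import Optional, Sequence


class RetryPolicy:
    """Hypothetical interface: a policy owns an error filter and a backoff."""

    def can_retry(self, error: BaseException) -> bool:
        raise NotImplementedError

    def next_wait_ms(self, attempt: int) -> int:
        raise NotImplementedError


class NetworkRetryPolicy(RetryPolicy):
    def can_retry(self, error: BaseException) -> bool:
        return isinstance(error, ConnectionError)

    def next_wait_ms(self, attempt: int) -> int:
        return 50 * (2 ** attempt)  # exponential backoff for flaky networks


class ResourceUnavailableRetryPolicy(RetryPolicy):
    def can_retry(self, error: BaseException) -> bool:
        return "UNAVAILABLE" in str(error)

    def next_wait_ms(self, attempt: int) -> int:
        return 1000  # flat wait while a remote resource comes back


def next_wait(policies: Sequence[RetryPolicy],
              error: BaseException, attempt: int) -> Optional[int]:
    # The first policy that accepts the error decides the wait; None = give up.
    for policy in policies:
        if policy.can_retry(error):
            return policy.next_wait_ms(attempt)
    return None
{code}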



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45935) Fix RST files link substitutions error

2023-11-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45935:


Assignee: BingKun Pan

> Fix RST files link substitutions error
> --
>
> Key: SPARK-45935
> URL: https://issues.apache.org/jira/browse/SPARK-45935
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 3.3.3, 3.4.1, 3.5.0, 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45935) Fix RST files link substitutions error

2023-11-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45935.
--
Fix Version/s: 3.3.4
   3.5.1
   4.0.0
   3.4.2
   Resolution: Fixed

Issue resolved by pull request 43815
[https://github.com/apache/spark/pull/43815]

> Fix RST files link substitutions error
> --
>
> Key: SPARK-45935
> URL: https://issues.apache.org/jira/browse/SPARK-45935
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 3.3.3, 3.4.1, 3.5.0, 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.4, 3.5.1, 4.0.0, 3.4.2
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45930) Allow non-deterministic Python UDFs in MapInPandas/MapInArrow

2023-11-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45930.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43810
[https://github.com/apache/spark/pull/43810]

> Allow non-deterministic Python UDFs in MapInPandas/MapInArrow
> -
>
> Key: SPARK-45930
> URL: https://issues.apache.org/jira/browse/SPARK-45930
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently, if a Python UDF is non-deterministic, the analyzer fails with 
> this error: [INVALID_NON_DETERMINISTIC_EXPRESSIONS] The operator expects a 
> deterministic expression, but the actual expression is "pyUDF()", "a". 
> SQLSTATE: 42K0E;
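For context, a minimal sketch of the pattern this unblocks (assumes an active
SparkSession `spark`; not the exact test case):

{code}
import random

from pyspark.sql.functions import udf

# A UDF explicitly marked non-deterministic, consumed by mapInPandas. Before
# this fix the analyzer rejected the plan with
# INVALID_NON_DETERMINISTIC_EXPRESSIONS.
jitter = udf(lambda x: x + random.random(), "double").asNondeterministic()
df = spark.range(3).select(jitter("id").alias("a"))

def passthrough(batches):
    for pdf in batches:  # each batch arrives as a pandas DataFrame
        yield pdf

df.mapInPandas(passthrough, "a double").show()
{code}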



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45930) Allow non-deterministic Python UDFs in MapInPandas/MapInArrow

2023-11-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45930:


Assignee: Allison Wang

> Allow non-deterministic Python UDFs in MapInPandas/MapInArrow
> -
>
> Key: SPARK-45930
> URL: https://issues.apache.org/jira/browse/SPARK-45930
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Currently, if a Python UDF is non-deterministic, the analyzer fails with 
> this error: [INVALID_NON_DETERMINISTIC_EXPRESSIONS] The operator expects a 
> deterministic expression, but the actual expression is "pyUDF()", "a". 
> SQLSTATE: 42K0E;



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


