[jira] [Resolved] (SPARK-46114) Define IndexError for PySpark error framework
[ https://issues.apache.org/jira/browse/SPARK-46114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46114. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44028 [https://github.com/apache/spark/pull/44028] > Define IndexError for PySpark error framework > - > > Key: SPARK-46114 > URL: https://issues.apache.org/jira/browse/SPARK-46114 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
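For context, a minimal sketch of how the new exception type fits the framework, assuming the {{error_class}}/{{message_parameters}} keyword style used elsewhere in {{pyspark.errors}}; the error class name below is illustrative, not necessarily the one the PR defines.

{code:python}
# Hypothetical usage sketch, not the actual change from the PR.
from pyspark.errors import PySparkIndexError

def pick(values, index):
    if not (0 <= index < len(values)):
        # Error classes pair a machine-readable name with message parameters,
        # so callers can match on the class instead of parsing message text.
        raise PySparkIndexError(
            error_class="INDEX_OUT_OF_RANGE",  # assumed error class name
            message_parameters={"arg_name": "index", "index": str(index)},
        )
    return values[index]
{code}

Since the framework's IndexError subclasses the built-in one, existing {{except IndexError}} handlers should keep working.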
[jira] [Created] (SPARK-46114) Define IndexError for PySpark error framework
Hyukjin Kwon created SPARK-46114: Summary: Define IndexError for PySpark error framework Key: SPARK-46114 URL: https://issues.apache.org/jira/browse/SPARK-46114 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46110) Use error classes in catalog, conf, connect, observation, pandas modules
Hyukjin Kwon created SPARK-46110: Summary: Use error classes in catalog, conf, connect, observation, pandas modules Key: SPARK-46110 URL: https://issues.apache.org/jira/browse/SPARK-46110 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-46109) Migrate to error classes in PySpark
[ https://issues.apache.org/jira/browse/SPARK-46109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon deleted SPARK-46109: - > Migrate to error classes in PySpark > --- > > Key: SPARK-46109 > URL: https://issues.apache.org/jira/browse/SPARK-46109 > Project: Spark > Issue Type: Umbrella >Reporter: Hyukjin Kwon >Priority: Major > > The work from SPARK-41597 on using error classes in PySpark continues here. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46109) Migrate to error classes in PySpark
[ https://issues.apache.org/jira/browse/SPARK-46109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46109: - > Migrate to error classes in PySpark > --- > > Key: SPARK-46109 > URL: https://issues.apache.org/jira/browse/SPARK-46109 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > The work from SPARK-41597 on using error classes in PySpark continues here. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46109) Migrate to error classes in PySpark
Hyukjin Kwon created SPARK-46109: Summary: Migrate to error classes in PySpark Key: SPARK-46109 URL: https://issues.apache.org/jira/browse/SPARK-46109 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon The work from SPARK-41597 on using error classes in PySpark continues here. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32933) Use keyword-only syntax for keyword_only methods
[ https://issues.apache.org/jira/browse/SPARK-32933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789874#comment-17789874 ] Hyukjin Kwon commented on SPARK-32933: -- Here are the PR and JIRA: https://github.com/apache/spark/pull/44023 https://issues.apache.org/jira/browse/SPARK-46107 > Use keyword-only syntax for keyword_only methods > > > Key: SPARK-32933 > URL: https://issues.apache.org/jira/browse/SPARK-32933 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.1.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Minor > Fix For: 3.1.0 > > > Since 3.0, Python provides syntax for indicating keyword-only arguments ([PEP > 3102|https://www.python.org/dev/peps/pep-3102/]). > It is not a full replacement for our current usage of {{keyword_only}}, but > it would allow us to make our expectations explicit: > {code:python} > @keyword_only > def __init__(self, degree=2, inputCol=None, outputCol=None): > {code} > {code:python} > @keyword_only > def __init__(self, *, degree=2, inputCol=None, outputCol=None): > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
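As a quick illustration of the difference (plain Python, not Spark code; the class name is borrowed from the example above): with the bare {{*}}, a positional call fails loudly at call time instead of relying on the {{keyword_only}} decorator's runtime check.

{code:python}
class PolynomialExpansion:
    # Everything after the bare `*` must be passed by keyword.
    def __init__(self, *, degree=2, inputCol=None, outputCol=None):
        self.degree = degree

PolynomialExpansion(degree=3)   # OK: keyword argument
try:
    PolynomialExpansion(3)      # rejected: positional argument
except TypeError as e:
    print(e)                    # "takes 1 positional argument but 2 were given"
{code}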
[jira] [Created] (SPARK-46107) Deprecate pyspark.keyword_only API
Hyukjin Kwon created SPARK-46107: Summary: Deprecate pyspark.keyword_only API Key: SPARK-46107 URL: https://issues.apache.org/jira/browse/SPARK-46107 Project: Spark Issue Type: Improvement Components: ML, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon See https://issues.apache.org/jira/browse/SPARK-32933; we don't need this API anymore now that Python's keyword-only syntax covers it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46074) [CONNECT][SCALA] Insufficient details in error when a UDF fails
[ https://issues.apache.org/jira/browse/SPARK-46074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46074. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43983 [https://github.com/apache/spark/pull/43983] > [CONNECT][SCALA] Insufficient details in error when a UDF fails > --- > > Key: SPARK-46074 > URL: https://issues.apache.org/jira/browse/SPARK-46074 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Niranjan Jayakar >Assignee: Niranjan Jayakar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, when a UDF fails the connect client does not receive the actual > error that caused the failure. > As an example, the error message looks like - > {code:java} > Exception in thread "main" org.apache.spark.SparkException: > grpc_shaded.io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to > stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost > task 2.3 in stage 0.0 (TID 10) (10.68.141.158 executor 0): > org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user > defined function (` (Main$$$Lambda$4770/1714264622)`: (int) => int). > SQLSTATE: 39000 {code} > In this case, the actual error was a {{{}java.lang.NoClassDefFoundError{}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46074) [CONNECT][SCALA] Insufficient details in error when a UDF fails
[ https://issues.apache.org/jira/browse/SPARK-46074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46074: Assignee: Niranjan Jayakar > [CONNECT][SCALA] Insufficient details in error when a UDF fails > --- > > Key: SPARK-46074 > URL: https://issues.apache.org/jira/browse/SPARK-46074 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Niranjan Jayakar >Assignee: Niranjan Jayakar >Priority: Major > Labels: pull-request-available > > Currently, when a UDF fails the connect client does not receive the actual > error that caused the failure. > As an example, the error message looks like - > {code:java} > Exception in thread "main" org.apache.spark.SparkException: > grpc_shaded.io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to > stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost > task 2.3 in stage 0.0 (TID 10) (10.68.141.158 executor 0): > org.apache.spark.SparkException: [FAILED_EXECUTE_UDF] Failed to execute user > defined function (` (Main$$$Lambda$4770/1714264622)`: (int) => int). > SQLSTATE: 39000 {code} > In this case, the actual error was a {{{}java.lang.NoClassDefFoundError{}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45922) Multiple policies follow-up (Python)
[ https://issues.apache.org/jira/browse/SPARK-45922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45922. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43800 [https://github.com/apache/spark/pull/43800] > Multiple policies follow-up (Python) > > > Key: SPARK-45922 > URL: https://issues.apache.org/jira/browse/SPARK-45922 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Alice Sayutina >Assignee: Alice Sayutina >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Minor further improvements to the multiple policies work. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46016) Fix pandas API support list properly
[ https://issues.apache.org/jira/browse/SPARK-46016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46016. -- Fix Version/s: 3.4.2 4.0.0 3.5.1 Assignee: Haejoon Lee Resolution: Fixed Fixed in https://github.com/apache/spark/pull/43996 > Fix pandas API support list properly > > > Key: SPARK-46016 > URL: https://issues.apache.org/jira/browse/SPARK-46016 > Project: Spark > Issue Type: Bug > Components: Documentation, Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 3.4.2, 4.0.0, 3.5.1 > > > Currently the supported pandas API list is not generated properly, so we should fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46082) Fix protobuf string representation for Pandas Functions API with Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-46082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46082: Assignee: Hyukjin Kwon > Fix protobuf string representation for Pandas Functions API with Spark Connect > -- > > Key: SPARK-46082 > URL: https://issues.apache.org/jira/browse/SPARK-46082 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > {code} > df = spark.range(1) > df.mapInPandas(lambda x: x, df.schema)._plan.print() > {code} > prints as below. It should include the functions. > {code} > > > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46082) Fix protobuf string representation for Pandas Functions API with Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-46082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46082. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43991 [https://github.com/apache/spark/pull/43991] > Fix protobuf string representation for Pandas Functions API with Spark Connect > -- > > Key: SPARK-46082 > URL: https://issues.apache.org/jira/browse/SPARK-46082 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > {code} > df = spark.range(1) > df.mapInPandas(lambda x: x, df.schema)._plan.print() > {code} > prints as below. It should include the functions. > {code} > > > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46085) Dataset.groupingSets in Scala Spark Connect client
Hyukjin Kwon created SPARK-46085: Summary: Dataset.groupingSets in Scala Spark Connect client Key: SPARK-46085 URL: https://issues.apache.org/jira/browse/SPARK-46085 Project: Spark Issue Type: New Feature Components: Connect, SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Scala Spark Connect client for SPARK-45929 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46083) Make SparkNoSuchElementException a canonical error API
Hyukjin Kwon created SPARK-46083: Summary: Make SparkNoSuchElementException a canonical error API Key: SPARK-46083 URL: https://issues.apache.org/jira/browse/SPARK-46083 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon https://github.com/apache/spark/pull/43927 added SparkNoSuchElementException. It should be a canonical error API, documented properly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
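A hedged sketch of what "canonical error API" looks like from the user side, assuming the exception stays importable from {{pyspark.errors}} and is raised for a missing conf key as in the linked PR:

{code:python}
from pyspark.sql import SparkSession
from pyspark.errors import SparkNoSuchElementException

spark = SparkSession.builder.getOrCreate()
try:
    spark.conf.get("spark.no.such.key")  # key is made up for the example
except SparkNoSuchElementException as e:
    # A canonical error exposes the framework's structured accessors.
    print(e.getErrorClass(), e.getMessageParameters())
{code}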
[jira] [Created] (SPARK-46082) Fix protobuf string representation for Pandas Functions API with Spark Connect
Hyukjin Kwon created SPARK-46082: Summary: Fix protobuf string representation for Pandas Functions API with Spark Connect Key: SPARK-46082 URL: https://issues.apache.org/jira/browse/SPARK-46082 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} df = spark.range(1) df.mapInPandas(lambda x: x, df.schema)._plan.print() {code} prints as below. It should include the functions. {code} {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46080) Upgrade Cloudpickle to 3.0.0
Hyukjin Kwon created SPARK-46080: Summary: Upgrade Cloudpickle to 3.0.0 Key: SPARK-46080 URL: https://issues.apache.org/jira/browse/SPARK-46080 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon It includes official support for Python 3.12 (https://github.com/cloudpipe/cloudpickle/pull/517) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46076) Remove `unittest` deprecated alias usage for Python 3.12
[ https://issues.apache.org/jira/browse/SPARK-46076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46076. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43986 [https://github.com/apache/spark/pull/43986] > Remove `unittest` deprecated alias usage for Python 3.12 > > > Key: SPARK-46076 > URL: https://issues.apache.org/jira/browse/SPARK-46076 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46065) Refactor `(DataFrame|Series).factorize()` to use `create_map`.
[ https://issues.apache.org/jira/browse/SPARK-46065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46065. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43970 [https://github.com/apache/spark/pull/43970] > Refactor `(DataFrame|Series).factorize()` to use `create_map`. > -- > > Key: SPARK-46065 > URL: https://issues.apache.org/jira/browse/SPARK-46065 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We can accept Column object for Column.__getitem__ on remote Session, so we > can optimize the existing factorize implementation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46065) Refactor `(DataFrame|Series).factorize()` to use `create_map`.
[ https://issues.apache.org/jira/browse/SPARK-46065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46065: Assignee: Haejoon Lee > Refactor `(DataFrame|Series).factorize()` to use `create_map`. > -- > > Key: SPARK-46065 > URL: https://issues.apache.org/jira/browse/SPARK-46065 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > We can accept Column object for Column.__getitem__ on remote Session, so we > can optimize the existing factorize implementation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-46049) Support groupingSets operation in PySpark (Spark Connect)
[ https://issues.apache.org/jira/browse/SPARK-46049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon deleted SPARK-46049: - > Support groupingSets operation in PySpark (Spark Connect) > - > > Key: SPARK-46049 > URL: https://issues.apache.org/jira/browse/SPARK-46049 > Project: Spark > Issue Type: New Feature >Reporter: Hyukjin Kwon >Priority: Major > > Connect version of SPARK-46048 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46063) Improve error messages related to argument types in cube, rollup, groupby, and pivot
[ https://issues.apache.org/jira/browse/SPARK-46063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46063: - Description: {code} >>> spark.range(1).cube(cols=1.2) Traceback (most recent call last): File "", line 1, in File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube raise PySparkTypeError( pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument `cube` should be a Column or str, got float. {code} {code} >>> help(spark.range(1).cube) Help on method cube in module pyspark.sql.connect.dataframe: cube(*cols: 'ColumnOrName') -> 'GroupedData' method of pyspark.sql.connect.dataframe.DataFrame instance Create a multi-dimensional cube for the current :class:`DataFrame` using the specified columns, allowing aggregations to be performed on them. .. versionadded:: 1.4.0 .. versionchanged:: 3.4.0 {code} the argument name in the error message has to be {{cols}} was: {code} >>> spark.range(1).cube(cols=1.2) Traceback (most recent call last): File "", line 1, in File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube raise PySparkTypeError( pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument `cube` should be a Column or str, got float. {code} ``` Help on method cube in module pyspark.sql.connect.dataframe: cube(*cols: 'ColumnOrName') -> 'GroupedData' method of pyspark.sql.connect.dataframe.DataFrame instance Create a multi-dimensional cube for the current :class:`DataFrame` using the specified columns, allowing aggregations to be performed on them. .. versionadded:: 1.4.0 .. versionchanged:: 3.4.0 ``` the argument name in the error message has to be {{cols}} > Improve error messages related to argument types in cube, rollup, groupby, > and pivot > > > Key: SPARK-46063 > URL: https://issues.apache.org/jira/browse/SPARK-46063 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > > {code} > >>> spark.range(1).cube(cols=1.2) > Traceback (most recent call last): > File "", line 1, in > File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube > raise PySparkTypeError( > pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument > `cube` should be a Column or str, got float. > {code} > {code} > >>> help(spark.range(1).cube) > Help on method cube in module pyspark.sql.connect.dataframe: > cube(*cols: 'ColumnOrName') -> 'GroupedData' method of > pyspark.sql.connect.dataframe.DataFrame instance > Create a multi-dimensional cube for the current :class:`DataFrame` using > the specified columns, allowing aggregations to be performed on them. > .. versionadded:: 1.4.0 > .. versionchanged:: 3.4.0 > {code} > the argument name in the error message has to be {{cols}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46063) Improve error messages related to argument types in cube, rollup, groupby, and pivot
[ https://issues.apache.org/jira/browse/SPARK-46063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46063: - Summary: Improve error messages related to argument types in cube, rollup, groupby, and pivot (was: Improve error messages related to argument types in cube, rollup, and pivot) > Improve error messages related to argument types in cube, rollup, groupby, > and pivot > > > Key: SPARK-46063 > URL: https://issues.apache.org/jira/browse/SPARK-46063 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > > {code} > >>> spark.range(1).cube(cols=1.2) > Traceback (most recent call last): > File "", line 1, in > File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube > raise PySparkTypeError( > pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument > `cube` should be a Column or str, got float. > {code} > ``` > Help on method cube in module pyspark.sql.connect.dataframe: > cube(*cols: 'ColumnOrName') -> 'GroupedData' method of > pyspark.sql.connect.dataframe.DataFrame instance > Create a multi-dimensional cube for the current :class:`DataFrame` using > the specified columns, allowing aggregations to be performed on them. > .. versionadded:: 1.4.0 > .. versionchanged:: 3.4.0 > ``` > the argument name in the error message has to be {{cols}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46063) Improve error messages related to argument types in cube, rollup, and pivot
Hyukjin Kwon created SPARK-46063: Summary: Improve error messages related to argument types in cube, rollup, and pivot Key: SPARK-46063 URL: https://issues.apache.org/jira/browse/SPARK-46063 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} >>> spark.range(1).cube(cols=1.2) Traceback (most recent call last): File "", line 1, in File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube raise PySparkTypeError( pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument `cube` should be a Column or str, got float. {code} ``` Help on method cube in module pyspark.sql.connect.dataframe: cube(*cols: 'ColumnOrName') -> 'GroupedData' method of pyspark.sql.connect.dataframe.DataFrame instance Create a multi-dimensional cube for the current :class:`DataFrame` using the specified columns, allowing aggregations to be performed on them. .. versionadded:: 1.4.0 .. versionchanged:: 3.4.0 ``` the argument name in the error message has to be {{cols}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
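A minimal sketch of the intended fix, reusing the {{NOT_COLUMN_OR_STR}} error class from the traceback above; the helper name is made up, and the real code paths differ between classic and Connect DataFrames:

{code:python}
from pyspark.errors import PySparkTypeError
from pyspark.sql.column import Column

def _require_column_or_str(arg, arg_name):
    if not isinstance(arg, (str, Column)):
        raise PySparkTypeError(
            error_class="NOT_COLUMN_OR_STR",
            # Report the parameter name ("cols"), not the method name ("cube").
            message_parameters={
                "arg_name": arg_name,
                "arg_type": type(arg).__name__,
            },
        )

# e.g. inside DataFrame.cube(*cols):
#     for c in cols:
#         _require_column_or_str(c, "cols")
{code}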
[jira] [Assigned] (SPARK-46061) Add the test parity for reattach test case
[ https://issues.apache.org/jira/browse/SPARK-46061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46061: Assignee: Hyukjin Kwon > Add the test parity for reattach test case > - > > Key: SPARK-46061 > URL: https://issues.apache.org/jira/browse/SPARK-46061 > Project: Spark > Issue Type: Test > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > We need the same test "ReleaseSession releases all queries and does not allow > more requests in the session" added in SPARK-45798 to identify an issue like > SPARK-46042. > This is caused by SPARK-46039. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46061) Add the test parity for reattach test case
[ https://issues.apache.org/jira/browse/SPARK-46061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46061. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43965 [https://github.com/apache/spark/pull/43965] > Add the test parity for reattach test case > - > > Key: SPARK-46061 > URL: https://issues.apache.org/jira/browse/SPARK-46061 > Project: Spark > Issue Type: Test > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We need the same test "ReleaseSession releases all queries and does not allow > more requests in the session" added in SPARK-45798 to identify an issue like > SPARK-46042. > This is caused by SPARK-46039. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45600) Make Python data source registration session level
[ https://issues.apache.org/jira/browse/SPARK-45600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45600. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43742 [https://github.com/apache/spark/pull/43742] > Make Python data source registration session level > -- > > Key: SPARK-45600 > URL: https://issues.apache.org/jira/browse/SPARK-45600 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, registered data sources are stored in `sharedState` and can be > accessed across multiple sessions. This, however, will not work with Spark > Connect. We should make this registration session level, and support static > registration (e.g. using pip install) in the future. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
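For reference, a hedged sketch of what session-level registration looks like with the Python data source API (names follow {{pyspark.sql.datasource}}; details are illustrative since the API was still evolving at the time, and an active {{spark}} session is assumed):

{code:python}
from pyspark.sql.datasource import DataSource, DataSourceReader

class FibReader(DataSourceReader):
    def read(self, partition):
        a, b = 0, 1
        for _ in range(5):   # yields rows matching the schema below
            yield (a,)
            a, b = b, a + b

class FibDataSource(DataSource):
    @classmethod
    def name(cls):
        return "fib"

    def schema(self):
        return "value INT"

    def reader(self, schema):
        return FibReader()

# With session-level registration this is visible only to this SparkSession,
# not stored in the shared state across sessions.
spark.dataSource.register(FibDataSource)
spark.read.format("fib").load().show()
{code}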
[jira] [Assigned] (SPARK-45600) Make Python data source registration session level
[ https://issues.apache.org/jira/browse/SPARK-45600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45600: Assignee: Allison Wang > Make Python data source registration session level > -- > > Key: SPARK-45600 > URL: https://issues.apache.org/jira/browse/SPARK-45600 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > Currently, registered data sources are stored in `sharedState` and can be > accessed across multiple sessions. This, however, will not work with Spark > Connect. We should make this registration session level, and support static > registration (e.g. using pip install) in the future. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46061) Add the test parity for reattach test case
Hyukjin Kwon created SPARK-46061: Summary: Add the test parity for reattach test case Key: SPARK-46061 URL: https://issues.apache.org/jira/browse/SPARK-46061 Project: Spark Issue Type: New Feature Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon We need the same test "ReleaseSession releases all queries and does not allow more requests in the session" added in SPARK-45798 to identify an issue like SPARK-46042. This is caused by SPARK-46039. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46061) Add the test parity for reattach test case
[ https://issues.apache.org/jira/browse/SPARK-46061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46061: - Issue Type: Test (was: New Feature) > Add the test parity for reattach test case > - > > Key: SPARK-46061 > URL: https://issues.apache.org/jira/browse/SPARK-46061 > Project: Spark > Issue Type: Test > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > We need the same test "ReleaseSession releases all queries and does not allow > more requests in the session" added in SPARK-45798 to identify an issue like > SPARK-46042. > This is caused by SPARK-46039. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46048) Support groupingSets operation in PySpark
[ https://issues.apache.org/jira/browse/SPARK-46048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46048. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43951 [https://github.com/apache/spark/pull/43951] > Support groupingSets operation in PySpark > - > > Key: SPARK-46048 > URL: https://issues.apache.org/jira/browse/SPARK-46048 > Project: Spark > Issue Type: New Feature > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Python version of SPARK-45929 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
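A hedged usage example of the new API (the signature follows the resolved PR: a sequence of grouping sets plus the grouping columns; the data and column names are made up, and an active {{spark}} session is assumed):

{code:python}
from pyspark.sql import functions as sf

df = spark.createDataFrame(
    [("NY", "sedan", 2), ("NY", "suv", 3), ("SF", "sedan", 5)],
    ["city", "car_model", "quantity"],
)

# Roughly equivalent to SQL: GROUP BY GROUPING SETS ((city, car_model), (city), ())
df.groupingSets(
    [["city", "car_model"], ["city"], []],
    "city", "car_model",
).agg(sf.sum("quantity").alias("sum_quantity")).show()
{code}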
[jira] [Assigned] (SPARK-46048) Support groupingSets operation in PySpark
[ https://issues.apache.org/jira/browse/SPARK-46048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46048: Assignee: Hyukjin Kwon > Support groupingSets operation in PySpark > - > > Key: SPARK-46048 > URL: https://issues.apache.org/jira/browse/SPARK-46048 > Project: Spark > Issue Type: New Feature > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > Python version of SPARK-45929 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46048) Support groupingSets operation in PySpark
[ https://issues.apache.org/jira/browse/SPARK-46048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46048: - Issue Type: New Feature (was: Bug) > Support groupingSets operation in PySpark > - > > Key: SPARK-46048 > URL: https://issues.apache.org/jira/browse/SPARK-46048 > Project: Spark > Issue Type: New Feature > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > Python version of SPARK-45929 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46048) Support groupingSets operation in PySpark
Hyukjin Kwon created SPARK-46048: Summary: Support groupingSets operation in PySpark Key: SPARK-46048 URL: https://issues.apache.org/jira/browse/SPARK-46048 Project: Spark Issue Type: Bug Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Python version of SPARK-45929 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46049) Support groupingSets operation in PySpark (Spark Connect)
Hyukjin Kwon created SPARK-46049: Summary: Support groupingSets operation in PySpark (Spark Connect) Key: SPARK-46049 URL: https://issues.apache.org/jira/browse/SPARK-46049 Project: Spark Issue Type: New Feature Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Connect version of SPARK-46048 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46017) PySpark doc build doesn't work properly on Mac
[ https://issues.apache.org/jira/browse/SPARK-46017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46017: Assignee: Haejoon Lee > PySpark doc build doesn't work properly on Mac > -- > > Key: SPARK-46017 > URL: https://issues.apache.org/jira/browse/SPARK-46017 > Project: Spark > Issue Type: Bug > Components: Build, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > PySpark doc build is working properly on GitHub CI, but doesn't work properly > on local Mac env for some reason. We should investigate and fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46022) Remove deprecated functions APIs from documents
[ https://issues.apache.org/jira/browse/SPARK-46022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46022. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43932 [https://github.com/apache/spark/pull/43932] > Remove deprecated functions APIs from documents > --- > > Key: SPARK-46022 > URL: https://issues.apache.org/jira/browse/SPARK-46022 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We should not expose the deprecated APIs on official documentation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46022) Remove deprecated functions APIs from documents
[ https://issues.apache.org/jira/browse/SPARK-46022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46022: Assignee: Haejoon Lee > Remove deprecated functions APIs from documents > --- > > Key: SPARK-46022 > URL: https://issues.apache.org/jira/browse/SPARK-46022 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > We should not expose the deprecated APIs on official documentation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46013) Improve basic datasource examples
[ https://issues.apache.org/jira/browse/SPARK-46013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46013: Assignee: Allison Wang > Improve basic datasource examples > - > > Key: SPARK-46013 > URL: https://issues.apache.org/jira/browse/SPARK-46013 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > We should improve the Python examples on this page: > [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html] > (basic_datasource_examples.py) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46017) PySpark doc build doesn't work properly on Mac
[ https://issues.apache.org/jira/browse/SPARK-46017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46017. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43932 [https://github.com/apache/spark/pull/43932] > PySpark doc build doesn't work properly on Mac > -- > > Key: SPARK-46017 > URL: https://issues.apache.org/jira/browse/SPARK-46017 > Project: Spark > Issue Type: Bug > Components: Build, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 4.0.0 > > > PySpark doc build is working properly on GitHub CI, but doesn't work properly > on local Mac env for some reason. We should investigate and fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46013) Improve basic datasource examples
[ https://issues.apache.org/jira/browse/SPARK-46013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46013. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43917 [https://github.com/apache/spark/pull/43917] > Improve basic datasource examples > - > > Key: SPARK-46013 > URL: https://issues.apache.org/jira/browse/SPARK-46013 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We should improve the Python examples on this page: > [https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html] > (basic_datasource_examples.py) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46042) Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite
[ https://issues.apache.org/jira/browse/SPARK-46042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46042: - Description: https://github.com/apache/spark/pull/43942#issuecomment-1821896165 > Reenable a `releaseSession` test case in SparkConnectServiceE2ESuite > > > Key: SPARK-46042 > URL: https://issues.apache.org/jira/browse/SPARK-46042 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > https://github.com/apache/spark/pull/43942#issuecomment-1821896165 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f
[ https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788430#comment-17788430 ] Hyukjin Kwon commented on SPARK-46032: -- Are the executors using the same versions too? The error most likely comes from a mismatch in JDK or Scala versions. I can't reproduce it locally, so sharing the full specification of the server and the client would be very helpful. > connect: cannot assign instance of java.lang.invoke.SerializedLambda to field > org.apache.spark.rdd.MapPartitionsRDD.f > - > > Key: SPARK-46032 > URL: https://issues.apache.org/jira/browse/SPARK-46032 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Bobby Wang >Priority: Major > > I downloaded spark 3.5 from the spark official website, and then I started a > Spark Standalone cluster in which both master and the only worker are on the > same node. > > Then I started the connect server by > {code:java} > start-connect-server.sh \ > --master spark://10.19.183.93:7077 \ > --packages org.apache.spark:spark-connect_2.12:3.5.0 \ > --conf spark.executor.cores=12 \ > --conf spark.task.cpus=1 \ > --executor-memory 30G \ > --conf spark.executor.resource.gpu.amount=1 \ > --conf spark.task.resource.gpu.amount=0.08 \ > --driver-memory 1G{code} > > I can confirm from the web UI that the Spark standalone cluster, the connect > server, and the Spark driver are all started. > > Finally, I tried to run a very simple spark job > (spark.range(100).filter("id>2").collect()) from spark-connect-client using > pyspark, but I got the below error. > > _pyspark --remote sc://localhost_ > _Python 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] on linux_ > _Type "help", "copyright", "credits" or "license" for more information._ > _Welcome to_ > _[Spark ASCII-art banner] version 3.5.0_ > > _Using Python version 3.10.0 (default, Mar 3 2022 09:58:08)_ > _Client connected to the Spark Connect server at localhost_ > _SparkSession available as 'spark'._ > _>>> spark.range(100).filter("id > 3").collect()_ > _Traceback (most recent call last):_ > _File "", line 1, in _ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py", > line 1645, in collect_ > _table, schema = self._session.client.to_table(query)_ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 858, in to_table_ > _table, schema, _, _, _ = self._execute_and_fetch(req)_ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1282, in _execute_and_fetch_ > _for response in self._execute_and_fetch_as_iterator(req):_ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1263, in _execute_and_fetch_as_iterator_ > _self._handle_error(error)_ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1502, in _handle_error_ > _self._handle_rpc_error(error)_ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1538, in _handle_rpc_error_ > _raise convert_exception(info, status.message) from None_ > _pyspark.errors.exceptions.connect.SparkConnectGrpcException: > (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in > stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 > (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot > assign instance of java.lang.invoke.SerializedLambda to field > org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance > of org.apache.spark.rdd.MapPartitionsRDD_ > _at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)_ > _at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)_ > _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)_ > _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_ > _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_ > _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_ > _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)_ > _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_ > _at
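Since the usual culprit is a client/server mismatch, a quick hedged way to compare the two sides from the PySpark client (over Spark Connect, {{spark.version}} reports the server's version; an active connect session {{spark}} is assumed):

{code:python}
import platform
import pyspark

print("client pyspark:", pyspark.__version__)
print("client python :", platform.python_version())
print("server spark  :", spark.version)
{code}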
[jira] [Commented] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f
[ https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788381#comment-17788381 ] Hyukjin Kwon commented on SPARK-46032: -- Also, can you run it without Spark Connect? Given the error messages, it seems a regular Spark shell would fail too. > connect: cannot assign instance of java.lang.invoke.SerializedLambda to field > org.apache.spark.rdd.MapPartitionsRDD.f > - > > Key: SPARK-46032 > URL: https://issues.apache.org/jira/browse/SPARK-46032 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Bobby Wang >Priority: Major > > I downloaded spark 3.5 from the spark official website, and then I started a > Spark Standalone cluster in which both master and the only worker are on the > same node. > > Then I started the connect server by > {code:java} > start-connect-server.sh \ > --master spark://10.19.183.93:7077 \ > --packages org.apache.spark:spark-connect_2.12:3.5.0 \ > --conf spark.executor.cores=12 \ > --conf spark.task.cpus=1 \ > --executor-memory 30G \ > --conf spark.executor.resource.gpu.amount=1 \ > --conf spark.task.resource.gpu.amount=0.08 \ > --driver-memory 1G{code} > > I can confirm from the web UI that the Spark standalone cluster, the connect > server, and the Spark driver are all started. > > Finally, I tried to run a very simple spark job > (spark.range(100).filter("id>2").collect()) from spark-connect-client using > pyspark, but I got the below error. > > _pyspark --remote sc://localhost_ > _Python 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] on linux_ > _Type "help", "copyright", "credits" or "license" for more information._ > _Welcome to_ > _[Spark ASCII-art banner] version 3.5.0_ > > _Using Python version 3.10.0 (default, Mar 3 2022 09:58:08)_ > _Client connected to the Spark Connect server at localhost_ > _SparkSession available as 'spark'._ > _>>> spark.range(100).filter("id > 3").collect()_ > _Traceback (most recent call last):_ > _File "", line 1, in _ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py", > line 1645, in collect_ > _table, schema = self._session.client.to_table(query)_ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 858, in to_table_ > _table, schema, _, _, _ = self._execute_and_fetch(req)_ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1282, in _execute_and_fetch_ > _for response in self._execute_and_fetch_as_iterator(req):_ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1263, in _execute_and_fetch_as_iterator_ > _self._handle_error(error)_ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1502, in _handle_error_ > _self._handle_rpc_error(error)_ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1538, in _handle_rpc_error_ > _raise convert_exception(info, status.message) from None_ > _pyspark.errors.exceptions.connect.SparkConnectGrpcException: > (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in > stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 > (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot > assign instance of java.lang.invoke.SerializedLambda to field > org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance > of org.apache.spark.rdd.MapPartitionsRDD_ > _at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)_ > _at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)_ > _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)_ > _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_ > _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_ > _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_ > _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)_ > _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_ > _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_ > _at
[jira] [Commented] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f
[ https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788380#comment-17788380 ] Hyukjin Kwon commented on SPARK-46032: -- What's your Scala version [~wbo4958]? > connect: cannot assign instance of java.lang.invoke.SerializedLambda to field > org.apache.spark.rdd.MapPartitionsRDD.f > - > > Key: SPARK-46032 > URL: https://issues.apache.org/jira/browse/SPARK-46032 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Bobby Wang >Priority: Major > > I downloaded spark 3.5 from the spark official website, and then I started a > Spark Standalone cluster in which both master and the only worker are on the > same node. > > Then I started the connect server by > {code:java} > start-connect-server.sh \ > --master spark://10.19.183.93:7077 \ > --packages org.apache.spark:spark-connect_2.12:3.5.0 \ > --conf spark.executor.cores=12 \ > --conf spark.task.cpus=1 \ > --executor-memory 30G \ > --conf spark.executor.resource.gpu.amount=1 \ > --conf spark.task.resource.gpu.amount=0.08 \ > --driver-memory 1G{code} > > I can confirm from the web UI that the Spark standalone cluster, the connect > server, and the Spark driver are all started. > > Finally, I tried to run a very simple spark job > (spark.range(100).filter("id>2").collect()) from spark-connect-client using > pyspark, but I got the below error. > > _pyspark --remote sc://localhost_ > _Python 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] on linux_ > _Type "help", "copyright", "credits" or "license" for more information._ > _Welcome to_ > _[Spark ASCII-art banner] version 3.5.0_ > > _Using Python version 3.10.0 (default, Mar 3 2022 09:58:08)_ > _Client connected to the Spark Connect server at localhost_ > _SparkSession available as 'spark'._ > _>>> spark.range(100).filter("id > 3").collect()_ > _Traceback (most recent call last):_ > _File "", line 1, in _ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py", > line 1645, in collect_ > _table, schema = self._session.client.to_table(query)_ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 858, in to_table_ > _table, schema, _, _, _ = self._execute_and_fetch(req)_ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1282, in _execute_and_fetch_ > _for response in self._execute_and_fetch_as_iterator(req):_ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1263, in _execute_and_fetch_as_iterator_ > _self._handle_error(error)_ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1502, in _handle_error_ > _self._handle_rpc_error(error)_ > _File > "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", > line 1538, in _handle_rpc_error_ > _raise convert_exception(info, status.message) from None_ > _pyspark.errors.exceptions.connect.SparkConnectGrpcException: > (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in > stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 > (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot > assign instance of java.lang.invoke.SerializedLambda to field > org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance > of org.apache.spark.rdd.MapPartitionsRDD_ > _at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)_ > _at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)_ > _at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)_ > _at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)_ > _at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)_ > _at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)_ > _at
[jira] [Comment Edited] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f
[ https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17788380#comment-17788380 ] Hyukjin Kwon edited comment on SPARK-46032 at 11/21/23 11:34 AM: - What are your Scala and JDK versions, [~wbo4958]? was (Author: gurwls223): What's your Scala version [~wbo4958]?
> connect: cannot assign instance of java.lang.invoke.SerializedLambda to field
> org.apache.spark.rdd.MapPartitionsRDD.f
> -
>
> Key: SPARK-46032
> URL: https://issues.apache.org/jira/browse/SPARK-46032
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Bobby Wang
> Priority: Major
>
> I downloaded Spark 3.5 from the official Spark website, and then I started a
> Spark Standalone cluster in which both the master and the only worker are on
> the same node.
>
> Then I started the Connect server with:
> {code:java}
> start-connect-server.sh \
> --master spark://10.19.183.93:7077 \
> --packages org.apache.spark:spark-connect_2.12:3.5.0 \
> --conf spark.executor.cores=12 \
> --conf spark.task.cpus=1 \
> --executor-memory 30G \
> --conf spark.executor.resource.gpu.amount=1 \
> --conf spark.task.resource.gpu.amount=0.08 \
> --driver-memory 1G{code}
>
> The web UI confirms that the standalone cluster, the Connect server, and the
> Spark driver all started.
>
> Finally, I tried to run a very simple Spark job
> (spark.range(100).filter("id>2").collect()) from the Spark Connect client
> using pyspark, but I got the error below.
>
> {code}
> pyspark --remote sc://localhost
> Python 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.5.0
>       /_/
>
> Using Python version 3.10.0 (default, Mar 3 2022 09:58:08)
> Client connected to the Spark Connect server at localhost
> SparkSession available as 'spark'.
> >>> spark.range(100).filter("id > 3").collect()
> Traceback (most recent call last):
>   File "", line 1, in
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py", line 1645, in collect
>     table, schema = self._session.client.to_table(query)
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 858, in to_table
>     table, schema, _, _, _ = self._execute_and_fetch(req)
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 1282, in _execute_and_fetch
>     for response in self._execute_and_fetch_as_iterator(req):
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 1263, in _execute_and_fetch_as_iterator
>     self._handle_error(error)
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 1502, in _handle_error
>     self._handle_rpc_error(error)
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 1538, in _handle_rpc_error
>     raise convert_exception(info, status.message) from None
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
> at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)
> at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
> {code}
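For reference, the report above can also be reproduced from a plain Python session; a minimal sketch, assuming a Spark Connect server is already listening on localhost (the builder's remote() API is standard PySpark 3.4+):

{code:python}
# Minimal reproduction sketch for the report above; equivalent to
# `pyspark --remote sc://localhost` followed by the one-line job.
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost").getOrCreate()

# The trivial job that triggers the SerializedLambda ClassCastException
# on the reporter's standalone cluster.
spark.range(100).filter("id > 2").collect()
{code}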
[jira] [Updated] (SPARK-46032) connect: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f
[ https://issues.apache.org/jira/browse/SPARK-46032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46032: - Priority: Major (was: Blocker)
> connect: cannot assign instance of java.lang.invoke.SerializedLambda to field
> org.apache.spark.rdd.MapPartitionsRDD.f
> -
>
> Key: SPARK-46032
> URL: https://issues.apache.org/jira/browse/SPARK-46032
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Bobby Wang
> Priority: Major
>
> I downloaded Spark 3.5 from the official Spark website, and then I started a
> Spark Standalone cluster in which both the master and the only worker are on
> the same node.
>
> Then I started the Connect server with:
> {code:java}
> start-connect-server.sh \
> --master spark://10.19.183.93:7077 \
> --packages org.apache.spark:spark-connect_2.12:3.5.0 \
> --conf spark.executor.cores=12 \
> --conf spark.task.cpus=1 \
> --executor-memory 30G \
> --conf spark.executor.resource.gpu.amount=1 \
> --conf spark.task.resource.gpu.amount=0.08 \
> --driver-memory 1G{code}
>
> The web UI confirms that the standalone cluster, the Connect server, and the
> Spark driver all started.
>
> Finally, I tried to run a very simple Spark job
> (spark.range(100).filter("id>2").collect()) from the Spark Connect client
> using pyspark, but I got the error below.
>
> {code}
> pyspark --remote sc://localhost
> Python 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.5.0
>       /_/
>
> Using Python version 3.10.0 (default, Mar 3 2022 09:58:08)
> Client connected to the Spark Connect server at localhost
> SparkSession available as 'spark'.
> >>> spark.range(100).filter("id > 3").collect()
> Traceback (most recent call last):
>   File "", line 1, in
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/dataframe.py", line 1645, in collect
>     table, schema = self._session.client.to_table(query)
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 858, in to_table
>     table, schema, _, _, _ = self._execute_and_fetch(req)
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 1282, in _execute_and_fetch
>     for response in self._execute_and_fetch_as_iterator(req):
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 1263, in _execute_and_fetch_as_iterator
>     self._handle_error(error)
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 1502, in _handle_error
>     self._handle_rpc_error(error)
>   File "/home/xxx/github/mytools/spark.home/spark-3.5.0-bin-hadoop3/python/pyspark/sql/connect/client/core.py", line 1538, in _handle_rpc_error
>     raise convert_exception(info, status.message) from None
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 35) (10.19.183.93 executor 0): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
> at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)
> at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
> at
> {code}
[jira] [Assigned] (SPARK-46023) Annotate parameters at docstrings in pyspark.sql module
[ https://issues.apache.org/jira/browse/SPARK-46023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46023: Assignee: Hyukjin Kwon > Annotate parameters at docstrings in pyspark.sql module > --- > > Key: SPARK-46023 > URL: https://issues.apache.org/jira/browse/SPARK-46023 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > See PR -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46023) Annotate parameters at docstrings in pyspark.sql module
[ https://issues.apache.org/jira/browse/SPARK-46023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46023. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43925 [https://github.com/apache/spark/pull/43925] > Annotate parameters at docstrings in pyspark.sql module > --- > > Key: SPARK-46023 > URL: https://issues.apache.org/jira/browse/SPARK-46023 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > See PR -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
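As an illustration of what annotating parameters in docstrings means under PySpark's numpydoc convention, here is a sketch on a hypothetical function (the function and wording are invented, not taken from the actual pull request):

{code:python}
from pyspark.sql import Column

def repeat_string(col: Column, n: int) -> Column:
    """
    Repeat each value of a string column `n` times.

    Parameters
    ----------
    col : :class:`~pyspark.sql.Column` or str
        The input column, or the name of the column as a string.
    n : int
        Number of times to repeat each value.

    Returns
    -------
    :class:`~pyspark.sql.Column`
        A column of repeated strings.
    """
    ...
{code}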
[jira] [Assigned] (SPARK-46026) Refine docstring of UDTF
[ https://issues.apache.org/jira/browse/SPARK-46026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46026: Assignee: Hyukjin Kwon > Refine docstring of UDTF > > > Key: SPARK-46026 > URL: https://issues.apache.org/jira/browse/SPARK-46026 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46026) Refine docstring of UDTF
[ https://issues.apache.org/jira/browse/SPARK-46026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46026. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43928 [https://github.com/apache/spark/pull/43928] > Refine docstring of UDTF > > > Key: SPARK-46026 > URL: https://issues.apache.org/jira/browse/SPARK-46026 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
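For readers unfamiliar with the API whose docstring was refined, a small self-contained UDTF example (this mirrors the documented PySpark 3.5+ API; the class name is invented):

{code:python}
from pyspark.sql.functions import lit, udtf

# A user-defined table function: one input row can yield many output rows.
@udtf(returnType="word: string")
class SplitWords:
    def eval(self, text: str):
        # Emit one row per whitespace-separated word.
        for word in text.split(" "):
            yield (word,)

# Produces two rows: "hello" and "world".
SplitWords(lit("hello world")).show()
{code}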
[jira] [Assigned] (SPARK-46024) Document parameters and examples for RuntimeConf get, set and unset
[ https://issues.apache.org/jira/browse/SPARK-46024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46024: Assignee: Hyukjin Kwon > Document parameters and examples for RuntimeConf get, set and unset > --- > > Key: SPARK-46024 > URL: https://issues.apache.org/jira/browse/SPARK-46024 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46024) Document parameters and examples for RuntimeConf get, set and unset
[ https://issues.apache.org/jira/browse/SPARK-46024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46024. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43927 [https://github.com/apache/spark/pull/43927] > Document parameters and examples for RuntimeConf get, set and unset > --- > > Key: SPARK-46024 > URL: https://issues.apache.org/jira/browse/SPARK-46024 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
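The three RuntimeConf operations being documented look like this in practice (assuming an existing SparkSession named spark):

{code:python}
# RuntimeConf is exposed as spark.conf on a SparkSession.
spark.conf.set("spark.sql.shuffle.partitions", "50")  # set a runtime option
spark.conf.get("spark.sql.shuffle.partitions")        # returns '50'
spark.conf.get("some.unset.key", "fallback")          # default when missing
spark.conf.unset("spark.sql.shuffle.partitions")      # revert to the default
{code}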
[jira] [Resolved] (SPARK-46027) Add `Python 3.12` to the Daily Python Github Action job
[ https://issues.apache.org/jira/browse/SPARK-46027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46027. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43929 [https://github.com/apache/spark/pull/43929] > Add `Python 3.12` to the Daily Python Github Action job > --- > > Key: SPARK-46027 > URL: https://issues.apache.org/jira/browse/SPARK-46027 > Project: Spark > Issue Type: Sub-task > Components: Project Infra, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46027) Add `Python 3.12` to the Daily Python Github Action job
Hyukjin Kwon created SPARK-46027: Summary: Add `Python 3.12` to the Daily Python Github Action job Key: SPARK-46027 URL: https://issues.apache.org/jira/browse/SPARK-46027 Project: Spark Issue Type: Sub-task Components: Project Infra, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46004) Refine docstring of `DataFrame.dropna/fillna/replace`
[ https://issues.apache.org/jira/browse/SPARK-46004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46004: Assignee: BingKun Pan > Refine docstring of `DataFrame.dropna/fillna/replace` > - > > Key: SPARK-46004 > URL: https://issues.apache.org/jira/browse/SPARK-46004 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46004) Refine docstring of `DataFrame.dropna/fillna/replace`
[ https://issues.apache.org/jira/browse/SPARK-46004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46004. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43907 [https://github.com/apache/spark/pull/43907] > Refine docstring of `DataFrame.dropna/fillna/replace` > - > > Key: SPARK-46004 > URL: https://issues.apache.org/jira/browse/SPARK-46004 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
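For context, typical usage of the three methods whose docstrings were refined (the sample data is invented):

{code:python}
df = spark.createDataFrame(
    [(10, "Alice"), (5, None), (None, "Bob")], ["age", "name"]
)

df.dropna(how="any")                 # drop rows containing any null
df.fillna(0)                         # fill null numeric values with 0
df.fillna({"name": "unknown"})       # per-column replacement values
df.replace(10, 20, subset=["age"])   # replace 10 with 20 in "age" only
{code}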
[jira] [Created] (SPARK-46026) Refine docstring of UDTF
Hyukjin Kwon created SPARK-46026: Summary: Refine docstring of UDTF Key: SPARK-46026 URL: https://issues.apache.org/jira/browse/SPARK-46026 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-46025) Support Python 3.12 in PySpark
[ https://issues.apache.org/jira/browse/SPARK-46025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon deleted SPARK-46025: - > Support Python 3.12 in PySpark > -- > > Key: SPARK-46025 > URL: https://issues.apache.org/jira/browse/SPARK-46025 > Project: Spark > Issue Type: Improvement >Reporter: Hyukjin Kwon >Priority: Major > > Python 3.12 has been released. We should make sure the tests pass and mark it > as supported in setup.py -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46025) Support Python 3.12 in PySpark
[ https://issues.apache.org/jira/browse/SPARK-46025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-46025: - Issue Type: Improvement (was: Bug) > Support Python 3.12 in PySpark > -- > > Key: SPARK-46025 > URL: https://issues.apache.org/jira/browse/SPARK-46025 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > Python 3.12 has been released. We should make sure the tests pass and mark it > as supported in setup.py -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46025) Support Python 3.12 in PySpark
Hyukjin Kwon created SPARK-46025: Summary: Support Python 3.12 in PySpark Key: SPARK-46025 URL: https://issues.apache.org/jira/browse/SPARK-46025 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Python 3.12 has been released. We should make sure the tests pass and mark it as supported in setup.py -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
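Marking a Python version as supported typically means extending the trove classifiers passed to setup(); a minimal sketch, assuming the list lives in python/setup.py (the surrounding arguments are elided and the exact contents of Spark's file may differ):

{code:python}
# Hypothetical fragment of python/setup.py: the real file has many more
# arguments; only the classifier list is sketched here.
from setuptools import setup

setup(
    name="pyspark",
    classifiers=[
        "Programming Language :: Python :: 3.10",
        "Programming Language :: Python :: 3.11",
        "Programming Language :: Python :: 3.12",  # the newly supported version
    ],
)
{code}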
[jira] [Created] (SPARK-46024) Document parameters and examples for RuntimeConf get, set and unset
Hyukjin Kwon created SPARK-46024: Summary: Document parameters and examples for RuntimeConf get, set and unset Key: SPARK-46024 URL: https://issues.apache.org/jira/browse/SPARK-46024 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46023) Annotate parameters at docstrings in pyspark.sql module
Hyukjin Kwon created SPARK-46023: Summary: Annotate parameters at docstrings in pyspark.sql module Key: SPARK-46023 URL: https://issues.apache.org/jira/browse/SPARK-46023 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon See PR -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46015) Fix broken link for Koalas issues
[ https://issues.apache.org/jira/browse/SPARK-46015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-46015: Assignee: Haejoon Lee > Fix broken link for Koalas issues > - > > Key: SPARK-46015 > URL: https://issues.apache.org/jira/browse/SPARK-46015 > Project: Spark > Issue Type: Bug > Components: Documentation, PS >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > > There is a broken link to the old Koalas repo. We should address it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46015) Fix broken link for Koalas issues
[ https://issues.apache.org/jira/browse/SPARK-46015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-46015. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43918 [https://github.com/apache/spark/pull/43918] > Fix broken link for Koalas issues > - > > Key: SPARK-46015 > URL: https://issues.apache.org/jira/browse/SPARK-46015 > Project: Spark > Issue Type: Bug > Components: Documentation, PS >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > There is a broken link to the old Koalas repo. We should address it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44867) Refactor Spark Connect Docs to incorporate Scala setup
[ https://issues.apache.org/jira/browse/SPARK-44867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44867. -- Fix Version/s: 3.5.0 Assignee: Venkata Sai Akhil Gudesa Resolution: Fixed Fixed in https://github.com/apache/spark/pull/42556 > Refactor Spark Connect Docs to incorporate Scala setup > -- > > Key: SPARK-44867 > URL: https://issues.apache.org/jira/browse/SPARK-44867 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Venkata Sai Akhil Gudesa >Assignee: Venkata Sai Akhil Gudesa >Priority: Major > Fix For: 3.5.0 > > > The current Spark Connect > [overview|https://spark.apache.org/docs/latest/spark-connect-overview.html] > does not include instructions to set up the Scala REPL or to use the Scala > client in applications. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45929) support grouping set operation in dataframe api
[ https://issues.apache.org/jira/browse/SPARK-45929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45929. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43813 [https://github.com/apache/spark/pull/43813] > support grouping set operation in dataframe api > --- > > Key: SPARK-45929 > URL: https://issues.apache.org/jira/browse/SPARK-45929 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.1 >Reporter: JacobZheng >Assignee: JacobZheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > I am using the Spark DataFrame API for complex calculations. When I need the > grouping sets operation, I can only convert the expression to SQL via > analyzedPlan and then splice those SQL fragments into one complex SQL > statement to execute. In some cases this generates an extremely complex SQL > statement, and while executing it ANTLR4 keeps consuming a large amount of > memory, similar to a memory leak. Supporting grouping sets directly in the > DataFrame API, like the existing rollup and cube functions, would make these > operations much simpler. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
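To make the motivation concrete: rollup and cube have long been available on DataFrames, while arbitrary grouping sets previously required splicing SQL text, as the reporter describes (the table and column names below are invented):

{code:python}
from pyspark.sql import functions as sf

# Long available directly on DataFrames:
df.rollup("city", "car_model").agg(sf.sum("quantity")).show()
df.cube("city", "car_model").agg(sf.sum("quantity")).show()

# Previously the only route to arbitrary grouping sets: generate SQL text.
df.createOrReplaceTempView("sales")
spark.sql("""
    SELECT city, car_model, sum(quantity)
    FROM sales
    GROUP BY GROUPING SETS ((city, car_model), (city), ())
""").show()
{code}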
[jira] [Assigned] (SPARK-45929) support grouping set operation in dataframe api
[ https://issues.apache.org/jira/browse/SPARK-45929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45929: Assignee: JacobZheng > support grouping set operation in dataframe api > --- > > Key: SPARK-45929 > URL: https://issues.apache.org/jira/browse/SPARK-45929 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.1 >Reporter: JacobZheng >Assignee: JacobZheng >Priority: Major > Labels: pull-request-available > > I am using the Spark DataFrame API for complex calculations. When I need the > grouping sets operation, I can only convert the expression to SQL via > analyzedPlan and then splice those SQL fragments into one complex SQL > statement to execute. In some cases this generates an extremely complex SQL > statement, and while executing it ANTLR4 keeps consuming a large amount of > memory, similar to a memory leak. Supporting grouping sets directly in the > DataFrame API, like the existing rollup and cube functions, would make these > operations much simpler. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45856) Move ArtifactManager from Spark Connect into SparkSession (sql/core)
[ https://issues.apache.org/jira/browse/SPARK-45856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45856. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43735 [https://github.com/apache/spark/pull/43735] > Move ArtifactManager from Spark Connect into SparkSession (sql/core) > > > Key: SPARK-45856 > URL: https://issues.apache.org/jira/browse/SPARK-45856 > Project: Spark > Issue Type: Improvement > Components: Connect, SQL >Affects Versions: 4.0.0 >Reporter: Venkata Sai Akhil Gudesa >Assignee: Venkata Sai Akhil Gudesa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The `ArtifactManager` that currently lies in the connect package can be moved > into the wider sql/core package (e.g. SparkSession) to expand the scope. This > is possible because the `ArtifactManager` is tied solely to the > `SparkSession#sessionUUID` and hence can be cleanly detached from Spark > Connect and be made generally available. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45996) Show proper dependency requirement messages for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-45996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45996: Assignee: Hyukjin Kwon > Show proper dependency requirement messages for Spark Connect > - > > Key: SPARK-45996 > URL: https://issues.apache.org/jira/browse/SPARK-45996 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > {code} > ./bin/pyspark --remote local > {code} > We should improve the error messages below. > {code} > /.../pyspark/shell.py:57: UserWarning: Failed to initialize Spark session. > warnings.warn("Failed to initialize Spark session.") > Traceback (most recent call last): > File "/.../pyspark/shell.py", line 52, in > spark = SparkSession.builder.getOrCreate() > File "/.../pyspark/sql/session.py", line 476, in getOrCreate > from pyspark.sql.connect.session import SparkSession as RemoteSparkSession > File "/.../pyspark/sql/connect/session.py", line 53, in > from pyspark.sql.connect.client import SparkConnectClient, ChannelBuilder > File "/.../pyspark/sql/connect/client/__init__.py", line 22, in > from pyspark.sql.connect.client.core import * # noqa: F401,F403 > File "/.../pyspark/sql/connect/client/core.py", line 51, in > import google.protobuf.message > ModuleNotFoundError: No module named 'google > {code} > {code} > /.../pyspark/shell.py:57: UserWarning: Failed to initialize Spark session. > warnings.warn("Failed to initialize Spark session.") > Traceback (most recent call last): > File "/.../pyspark/shell.py", line 52, in > spark = SparkSession.builder.getOrCreate() > File "/.../pyspark/sql/session.py", line 476, in getOrCreate > from pyspark.sql.connect.session import SparkSession as RemoteSparkSession > File "/.../pyspark/sql/connect/session.py", line 53, in > from pyspark.sql.connect.client import SparkConnectClient, ChannelBuilder > File "/.../pyspark/sql/connect/client/__init__.py", line 22, in > from pyspark.sql.connect.client.core import * # noqa: F401,F403 > File "/.../pyspark/sql/connect/client/core.py", line 52, in > from grpc_status import rpc_status > ModuleNotFoundError: No module named 'grpc_status' > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45996) Show proper dependency requirement messages for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-45996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45996. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43894 [https://github.com/apache/spark/pull/43894] > Show proper dependency requirement messages for Spark Connect > - > > Key: SPARK-45996 > URL: https://issues.apache.org/jira/browse/SPARK-45996 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {code} > ./bin/pyspark --remote local > {code} > We should improve the error messages below. > {code} > /.../pyspark/shell.py:57: UserWarning: Failed to initialize Spark session. > warnings.warn("Failed to initialize Spark session.") > Traceback (most recent call last): > File "/.../pyspark/shell.py", line 52, in > spark = SparkSession.builder.getOrCreate() > File "/.../pyspark/sql/session.py", line 476, in getOrCreate > from pyspark.sql.connect.session import SparkSession as RemoteSparkSession > File "/.../pyspark/sql/connect/session.py", line 53, in > from pyspark.sql.connect.client import SparkConnectClient, ChannelBuilder > File "/.../pyspark/sql/connect/client/__init__.py", line 22, in > from pyspark.sql.connect.client.core import * # noqa: F401,F403 > File "/.../pyspark/sql/connect/client/core.py", line 51, in > import google.protobuf.message > ModuleNotFoundError: No module named 'google > {code} > {code} > /.../pyspark/shell.py:57: UserWarning: Failed to initialize Spark session. > warnings.warn("Failed to initialize Spark session.") > Traceback (most recent call last): > File "/.../pyspark/shell.py", line 52, in > spark = SparkSession.builder.getOrCreate() > File "/.../pyspark/sql/session.py", line 476, in getOrCreate > from pyspark.sql.connect.session import SparkSession as RemoteSparkSession > File "/.../pyspark/sql/connect/session.py", line 53, in > from pyspark.sql.connect.client import SparkConnectClient, ChannelBuilder > File "/.../pyspark/sql/connect/client/__init__.py", line 22, in > from pyspark.sql.connect.client.core import * # noqa: F401,F403 > File "/.../pyspark/sql/connect/client/core.py", line 52, in > from grpc_status import rpc_status > ModuleNotFoundError: No module named 'grpc_status' > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
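The gist of the improvement is to check Spark Connect's optional dependencies up front and raise one actionable error instead of a bare ModuleNotFoundError from deep inside the import chain. A minimal illustrative sketch (the function name and message are assumptions, not the actual PySpark code):

{code:python}
# Illustrative only: pre-check the optional Connect dependencies and fail
# with one clear, actionable message.
def require_connect_dependencies() -> None:
    missing = []
    for module in ("grpc", "grpc_status", "google.protobuf"):
        try:
            __import__(module)
        except ImportError:
            missing.append(module)
    if missing:
        raise ImportError(
            f"Spark Connect requires {', '.join(missing)}; "
            "install the extras with: pip install 'pyspark[connect]'"
        )
{code}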
[jira] [Assigned] (SPARK-45942) Only do the thread interruption check for putIterator on executors
[ https://issues.apache.org/jira/browse/SPARK-45942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45942: Assignee: Huanli Wang > Only do the thread interruption check for putIterator on executors > -- > > Key: SPARK-45942 > URL: https://issues.apache.org/jira/browse/SPARK-45942 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Huanli Wang >Assignee: Huanli Wang >Priority: Major > Labels: pull-request-available > > https://issues.apache.org/jira/browse/SPARK-45025 > introduces graceful thread interruption handling. However, there is an edge > case: when a streaming query is stopped on the driver, it interrupts the > stream execution thread. If the streaming query is doing memory store > operations on the driver and performs {{doPutIterator}} at the same time, the > [unroll process will be > broken|https://github.com/apache/spark/blob/39fc6108bfaaa0ce471f6460880109f948ba5c62/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L224] > and [used memory is > returned|https://github.com/apache/spark/blob/39fc6108bfaaa0ce471f6460880109f948ba5c62/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L245-L247]. > This can result in {{closeChannelException}} as it falls into this [case > clause|https://github.com/apache/spark/blob/aa646d3050028272f7333deaef52f20e6975e0ed/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1614-L1622], > which opens an I/O channel and persists the data to disk. However, > because the thread is interrupted, the channel is closed at the beginning: > [https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/java/nio/channels/spi/AbstractInterruptibleChannel.java#L172] > and {{closeChannelException}} is thrown. > On executors, [the task will be killed if the thread is > interrupted|https://github.com/apache/spark/blob/39fc6108bfaaa0ce471f6460880109f948ba5c62/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L374]; > however, we don't do this on the driver. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45942) Only do the thread interruption check for putIterator on executors
[ https://issues.apache.org/jira/browse/SPARK-45942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45942. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43823 [https://github.com/apache/spark/pull/43823] > Only do the thread interruption check for putIterator on executors > -- > > Key: SPARK-45942 > URL: https://issues.apache.org/jira/browse/SPARK-45942 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Huanli Wang >Assignee: Huanli Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > https://issues.apache.org/jira/browse/SPARK-45025 > introduces graceful thread interruption handling. However, there is an edge > case: when a streaming query is stopped on the driver, it interrupts the > stream execution thread. If the streaming query is doing memory store > operations on the driver and performs {{doPutIterator}} at the same time, the > [unroll process will be > broken|https://github.com/apache/spark/blob/39fc6108bfaaa0ce471f6460880109f948ba5c62/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L224] > and [used memory is > returned|https://github.com/apache/spark/blob/39fc6108bfaaa0ce471f6460880109f948ba5c62/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L245-L247]. > This can result in {{closeChannelException}} as it falls into this [case > clause|https://github.com/apache/spark/blob/aa646d3050028272f7333deaef52f20e6975e0ed/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1614-L1622], > which opens an I/O channel and persists the data to disk. However, > because the thread is interrupted, the channel is closed at the beginning: > [https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/java/nio/channels/spi/AbstractInterruptibleChannel.java#L172] > and {{closeChannelException}} is thrown. > On executors, [the task will be killed if the thread is > interrupted|https://github.com/apache/spark/blob/39fc6108bfaaa0ce471f6460880109f948ba5c62/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L374]; > however, we don't do this on the driver. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45986) Fix `pyspark.ml.torch.tests.test_distributor` in Python 3.11
[ https://issues.apache.org/jira/browse/SPARK-45986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45986: Assignee: Hyukjin Kwon (was: Dongjoon Hyun) > Fix `pyspark.ml.torch.tests.test_distributor` in Python 3.11 > > > Key: SPARK-45986 > URL: https://issues.apache.org/jira/browse/SPARK-45986 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > https://github.com/apache/spark/actions/runs/6914662405/job/18812759511 > {code} > == > FAIL [0.000s]: test_local_training_succeeds > (pyspark.ml.torch.tests.test_distributor.TorchDistributorLocalUnitTests.test_local_training_succeeds) > [subtest: 1] > -- > Traceback (most recent call last): > File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", > line 384, in test_local_training_succeeds > self.assertEqual( > AssertionError: '1' != '0' > - 1 > + 0 > == > FAIL [0.142s]: test_local_training_succeeds > (pyspark.ml.torch.tests.test_distributor.TorchDistributorLocalUnitTests.test_local_training_succeeds) > [subtest: 2] > -- > Traceback (most recent call last): > File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", > line 384, in test_local_training_succeeds > self.assertEqual( > AssertionError: '1,2,0' != '0,1,2' > - 1,2,0 > + 0,1,2 > == > FAIL [0.000s]: test_local_training_succeeds > (pyspark.ml.torch.tests.test_distributor.TorchDistributorLocalUnitTestsII.test_local_training_succeeds) > [subtest: 1] > -- > Traceback (most recent call last): > File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", > line 384, in test_local_training_succeeds > self.assertEqual( > AssertionError: '1' != '0' > - 1 > + 0 > == > FAIL [0.139s]: test_local_training_succeeds > (pyspark.ml.torch.tests.test_distributor.TorchDistributorLocalUnitTestsII.test_local_training_succeeds) > [subtest: 2] > -- > Traceback (most recent call last): > File "/__w/spark/spark/python/pyspark/ml/torch/tests/test_distributor.py", > line 384, in test_local_training_succeeds > self.assertEqual( > AssertionError: '1,2,0' != '0,1,2' > - 1,2,0 > + 0,1,2 > -- > Ran 23 tests in 166.741s > FAILED (failures=4) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45996) Show proper dependency requirement messages for Spark Connect
Hyukjin Kwon created SPARK-45996: Summary: Show proper dependency requirement messages for Spark Connect Key: SPARK-45996 URL: https://issues.apache.org/jira/browse/SPARK-45996 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} ./bin/pyspark --remote local {code} We should improve the error messages below. {code} /.../pyspark/shell.py:57: UserWarning: Failed to initialize Spark session. warnings.warn("Failed to initialize Spark session.") Traceback (most recent call last): File "/.../pyspark/shell.py", line 52, in spark = SparkSession.builder.getOrCreate() File "/.../pyspark/sql/session.py", line 476, in getOrCreate from pyspark.sql.connect.session import SparkSession as RemoteSparkSession File "/.../pyspark/sql/connect/session.py", line 53, in from pyspark.sql.connect.client import SparkConnectClient, ChannelBuilder File "/.../pyspark/sql/connect/client/__init__.py", line 22, in from pyspark.sql.connect.client.core import * # noqa: F401,F403 File "/.../pyspark/sql/connect/client/core.py", line 51, in import google.protobuf.message ModuleNotFoundError: No module named 'google {code} {code} /.../pyspark/shell.py:57: UserWarning: Failed to initialize Spark session. warnings.warn("Failed to initialize Spark session.") Traceback (most recent call last): File "/.../pyspark/shell.py", line 52, in spark = SparkSession.builder.getOrCreate() File "/.../pyspark/sql/session.py", line 476, in getOrCreate from pyspark.sql.connect.session import SparkSession as RemoteSparkSession File "/.../pyspark/sql/connect/session.py", line 53, in from pyspark.sql.connect.client import SparkConnectClient, ChannelBuilder File "/.../pyspark/sql/connect/client/__init__.py", line 22, in from pyspark.sql.connect.client.core import * # noqa: F401,F403 File "/.../pyspark/sql/connect/client/core.py", line 52, in from grpc_status import rpc_status ModuleNotFoundError: No module named 'grpc_status' {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45994) Change description-file to description_file
[ https://issues.apache.org/jira/browse/SPARK-45994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45994: Assignee: Bjørn Jørgensen > Change description-file to description_file > --- > > Key: SPARK-45994 > URL: https://issues.apache.org/jira/browse/SPARK-45994 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Labels: pull-request-available > > + cp -r /home/bjorn/spark/data /home/bjorn/spark/dist > + '[' true == true ']' > + echo 'Building python distribution package' > Building python distribution package > + pushd /home/bjorn/spark/python > + rm -rf pyspark.egg-info > + python3 setup.py sdist > /usr/lib/python3.11/site-packages/setuptools/dist.py:745: > SetuptoolsDeprecationWarning: Invalid dash-separated options > !! > > > Usage of dash-separated 'description-file' will not be supported in > future > versions. Please use the underscore name 'description_file' instead. > This deprecation is overdue, please update your project and remove > deprecated > calls to avoid build errors in the future. > See > https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for > details. > > > !! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45994) Change description-file to description_file
[ https://issues.apache.org/jira/browse/SPARK-45994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45994. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43891 [https://github.com/apache/spark/pull/43891] > Change description-file to description_file > --- > > Key: SPARK-45994 > URL: https://issues.apache.org/jira/browse/SPARK-45994 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > + cp -r /home/bjorn/spark/data /home/bjorn/spark/dist > + '[' true == true ']' > + echo 'Building python distribution package' > Building python distribution package > + pushd /home/bjorn/spark/python > + rm -rf pyspark.egg-info > + python3 setup.py sdist > /usr/lib/python3.11/site-packages/setuptools/dist.py:745: > SetuptoolsDeprecationWarning: Invalid dash-separated options > !! > > > Usage of dash-separated 'description-file' will not be supported in > future > versions. Please use the underscore name 'description_file' instead. > This deprecation is overdue, please update your project and remove > deprecated > calls to avoid build errors in the future. > See > https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for > details. > > > !! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
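The rename itself is a one-line metadata change; assuming the option lives in python/setup.cfg (where setuptools reads such dash-separated options from), the fix looks like:

{code}
[metadata]
# old, deprecated spelling:
#   description-file = README.md
# new underscore spelling accepted by current setuptools:
description_file = README.md
{code}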
[jira] [Created] (SPARK-45995) Upgrade R version from 4.3.1 to 4.3.2 in AppVeyor
Hyukjin Kwon created SPARK-45995: Summary: Upgrade R version from 4.3.1 to 4.3.2 in AppVeyor Key: SPARK-45995 URL: https://issues.apache.org/jira/browse/SPARK-45995 Project: Spark Issue Type: Improvement Components: R Affects Versions: 4.0.0 Reporter: Hyukjin Kwon https://cran.r-project.org/doc/manuals/r-release/NEWS.html -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45988) Fix `pyspark.pandas.tests.computation.test_apply_func` in Python 3.11
[ https://issues.apache.org/jira/browse/SPARK-45988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45988. -- Fix Version/s: 4.0.0 Resolution: Fixed Fixed in https://github.com/apache/spark/pull/43888 > Fix `pyspark.pandas.tests.computation.test_apply_func` in Python 3.11 > - > > Key: SPARK-45988 > URL: https://issues.apache.org/jira/browse/SPARK-45988 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > https://github.com/apache/spark/actions/runs/6914662405/job/18812759697 > {code} > == > ERROR [0.686s]: test_apply_batch_with_type > (pyspark.pandas.tests.computation.test_apply_func.FrameApplyFunctionTests.test_apply_batch_with_type) > -- > Traceback (most recent call last): > File > "/__w/spark/spark/python/pyspark/pandas/tests/computation/test_apply_func.py", > line 248, in test_apply_batch_with_type > def identify3(x) -> ps.DataFrame[float, [int, List[int]]]: > ^ > File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13540, in > __class_getitem__ > return create_tuple_for_frame_type(params) >^^^ > File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line > 721, in create_tuple_for_frame_type > return Tuple[_to_type_holders(params)] > > File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line > 766, in _to_type_holders > data_types = _new_type_holders(data_types, NameTypeHolder) > ^ > File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line > 832, in _new_type_holders > raise TypeError( > TypeError: Type hints should be specified as one of: > - DataFrame[type, type, ...] > - DataFrame[name: type, name: type, ...] > - DataFrame[dtypes instance] > - DataFrame[zip(names, types)] > - DataFrame[index_type, [type, ...]] > - DataFrame[(index_name, index_type), [(name, type), ...]] > - DataFrame[dtype instance, dtypes instance] > - DataFrame[(index_name, index_type), zip(names, types)] > - DataFrame[[index_type, ...], [type, ...]] > - DataFrame[[(index_name, index_type), ...], [(name, type), ...]] > - DataFrame[dtypes instance, dtypes instance] > - DataFrame[zip(index_names, index_types), zip(names, types)] > However, got (, typing.List[int]). > -- > Ran 10 tests in 34.327s > FAILED (errors=1) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45989) Fix `pyspark.pandas.tests.connect.computation.test_parity_apply_func` in Python 3.11
[ https://issues.apache.org/jira/browse/SPARK-45989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45989. -- Fix Version/s: 4.0.0 Resolution: Fixed Fixed in https://github.com/apache/spark/pull/43888 > Fix `pyspark.pandas.tests.connect.computation.test_parity_apply_func` in > Python 3.11 > > > Key: SPARK-45989 > URL: https://issues.apache.org/jira/browse/SPARK-45989 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 4.0.0 > > > https://github.com/apache/spark/actions/runs/6914662405/job/18816505612 > {code} > == > ERROR [1.237s]: test_apply_batch_with_type > (pyspark.pandas.tests.connect.computation.test_parity_apply_func.FrameParityApplyFunctionTests.test_apply_batch_with_type) > -- > Traceback (most recent call last): > File > "/__w/spark/spark/python/pyspark/pandas/tests/computation/test_apply_func.py", > line 248, in test_apply_batch_with_type > def identify3(x) -> ps.DataFrame[float, [int, List[int]]]: > ^ > File "/__w/spark/spark/python/pyspark/pandas/frame.py", line 13540, in > __class_getitem__ > return create_tuple_for_frame_type(params) >^^^ > File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line > 721, in create_tuple_for_frame_type > return Tuple[_to_type_holders(params)] > > File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line > 766, in _to_type_holders > data_types = _new_type_holders(data_types, NameTypeHolder) > ^ > File "/__w/spark/spark/python/pyspark/pandas/typedef/typehints.py", line > 832, in _new_type_holders > raise TypeError( > TypeError: Type hints should be specified as one of: > - DataFrame[type, type, ...] > - DataFrame[name: type, name: type, ...] > - DataFrame[dtypes instance] > - DataFrame[zip(names, types)] > - DataFrame[index_type, [type, ...]] > - DataFrame[(index_name, index_type), [(name, type), ...]] > - DataFrame[dtype instance, dtypes instance] > - DataFrame[(index_name, index_type), zip(names, types)] > - DataFrame[[index_type, ...], [type, ...]] > - DataFrame[[(index_name, index_type), ...], [(name, type), ...]] > - DataFrame[dtypes instance, dtypes instance] > - DataFrame[zip(index_names, index_types), zip(names, types)] > However, got (, typing.List[int]). > -- > Ran 10 tests in 78.247s > FAILED (errors=1) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
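For reference, the accepted pandas-on-Spark return-type hints are the shapes listed in the error message above; the failing tests nested a typing.List[int] where a plain data type was expected. A small sketch of a valid hint:

{code:python}
import pyspark.pandas as ps

# DataFrame[index_type, [type, ...]]: a float index and two long columns.
def identify(pdf) -> ps.DataFrame[float, [int, int]]:
    return pdf

# The failing hint was ps.DataFrame[float, [int, List[int]]]: the nested
# List[int] is not one of the supported shapes, so the type-holder parser
# raises the TypeError shown above.
{code}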
[jira] [Resolved] (SPARK-45965) Move DSv2 partitioning expressions into functions.partitioning
[ https://issues.apache.org/jira/browse/SPARK-45965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45965. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43858 [https://github.com/apache/spark/pull/43858] > Move DSv2 partitioning expressions into functions.partitioning > -- > > Key: SPARK-45965 > URL: https://issues.apache.org/jira/browse/SPARK-45965 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We weren't able to move those partitioning expressions into a nested object > because of a Scala 2.12 limitation. Now we can do it with Scala 2.13. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
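On the Python side, equivalent partitioning expressions have long been exposed as top-level functions in pyspark.sql.functions and are used with the DataFrameWriterV2 API; whether PySpark mirrors the new functions.partitioning namespace is not stated here. A sketch (the table name is invented):

{code:python}
from pyspark.sql.functions import bucket, years

# DSv2 partitioned write using partitioning expressions.
df.writeTo("catalog.db.events") \
    .partitionedBy(years("event_ts"), bucket(16, "user_id")) \
    .createOrReplace()
{code}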
[jira] [Created] (SPARK-45985) Refine docstring of `DataFrame.intersect`
Hyukjin Kwon created SPARK-45985: Summary: Refine docstring of `DataFrame.intersect` Key: SPARK-45985 URL: https://issues.apache.org/jira/browse/SPARK-45985 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45984) Refine docstring of `DataFrame.intersectAll`
Hyukjin Kwon created SPARK-45984: Summary: Refine docstring of `DataFrame.intersectAll` Key: SPARK-45984 URL: https://issues.apache.org/jira/browse/SPARK-45984 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45983) Refine docstring of `DataFrame.subtract`
Hyukjin Kwon created SPARK-45983: Summary: Refine docstring of `DataFrame.subtract` Key: SPARK-45983 URL: https://issues.apache.org/jira/browse/SPARK-45983 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
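Taken together, the three sibling tickets above cover PySpark's set operations; their behavior in brief (sample data invented):

{code:python}
df1 = spark.createDataFrame([(1,), (2,), (2,), (3,)], ["id"])
df2 = spark.createDataFrame([(2,), (2,), (4,)], ["id"])

df1.intersect(df2).show()     # distinct rows present in both: 2
df1.intersectAll(df2).show()  # keeps duplicates: 2, 2
df1.subtract(df2).show()      # distinct rows of df1 absent from df2: 1, 3
{code}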
[jira] [Created] (SPARK-45970) Provide partitioning expressions in Java the same as in Scala
Hyukjin Kwon created SPARK-45970: Summary: Provide partitioning expressions in Java the same as in Scala Key: SPARK-45970 URL: https://issues.apache.org/jira/browse/SPARK-45970 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon See https://github.com/apache/spark/pull/43858. Once Scala 3 is out, we can support partitioning expressions in the same way, for example: {code} import static org.apache.spark.sql.functions.partitioning.*; {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45952) Use built-in math constant in math functions
[ https://issues.apache.org/jira/browse/SPARK-45952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45952: Assignee: Ruifeng Zheng > Use built-in math constant in math functions > - > > Key: SPARK-45952 > URL: https://issues.apache.org/jira/browse/SPARK-45952 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45952) Use built-in math constant in math functions
[ https://issues.apache.org/jira/browse/SPARK-45952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45952. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43837 [https://github.com/apache/spark/pull/43837] > Use built-in math constant in math functions > - > > Key: SPARK-45952 > URL: https://issues.apache.org/jira/browse/SPARK-45952 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
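The change is small in spirit: prefer Python's standard-library constants over hand-typed literals in the math-function examples and implementations. For instance:

{code:python}
import math

math.pi  # 3.141592653589793, instead of a hard-coded 3.14159... literal
math.e   # 2.718281828459045
{code}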
[jira] [Assigned] (SPARK-45912) Enhancement of XSDToSchema API: Change to HDFS API for cloud storage accessibility
[ https://issues.apache.org/jira/browse/SPARK-45912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45912: Assignee: Shujing Yang > Enhancement of XSDToSchema API: Change to HDFS API for cloud storage > accessibility > --- > > Key: SPARK-45912 > URL: https://issues.apache.org/jira/browse/SPARK-45912 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Shujing Yang >Assignee: Shujing Yang >Priority: Major > Labels: pull-request-available > > Previously, it utilized `java.nio.path`, which limited file reading to local > file systems only. By changing this to an HDFS-compatible API, we now enable > the XSDToSchema function to access files in cloud storage. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45912) Enhancement of XSDToSchema API: Change to HDFS API for cloud storage accessibility
[ https://issues.apache.org/jira/browse/SPARK-45912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45912. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43789 [https://github.com/apache/spark/pull/43789] > Enhancement of XSDToSchema API: Change to HDFS API for cloud storage > accessibility > --- > > Key: SPARK-45912 > URL: https://issues.apache.org/jira/browse/SPARK-45912 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Shujing Yang >Assignee: Shujing Yang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Previously, it utilized `java.nio.path`, which limited file reading to local > file systems only. By changing this to an HDFS-compatible API, we now enable > the XSDToSchema function to access files in cloud storage. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45964) Remove private[sql] in XML and JSON package under catalyst package
Hyukjin Kwon created SPARK-45964: Summary: Remove private[sql] in XML and JSON package under catalyst package Key: SPARK-45964 URL: https://issues.apache.org/jira/browse/SPARK-45964 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon catalyst is internal, so we don't need to annotate them as private[sql] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45963) Restore documentation for DSv2 API
Hyukjin Kwon created SPARK-45963: Summary: Restore documentation for DSv2 API Key: SPARK-45963 URL: https://issues.apache.org/jira/browse/SPARK-45963 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0, 3.4.1, 4.0.0 Reporter: Hyukjin Kwon The DSv2 documentation was mistakenly removed by https://github.com/apache/spark/pull/38392. It used to exist in 3.3.0: https://spark.apache.org/docs/3.3.0/api/scala/org/apache/spark/sql/connector/catalog/index.html -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45950) Fix IvyTestUtils#createIvyDescriptor function and enable the common-utils module to run tests on GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-45950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45950: Assignee: Yang Jie > Fix IvyTestUtils#createIvyDescriptor function and enable the common-utils > module to run tests on GitHub Actions > - > > Key: SPARK-45950 > URL: https://issues.apache.org/jira/browse/SPARK-45950 > Project: Spark > Issue Type: Bug > Components: Project Infra, Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45950) Fix IvyTestUtils#createIvyDescriptor function and enable the common-utils module to run tests on GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-45950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45950. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43834 [https://github.com/apache/spark/pull/43834] > Fix IvyTestUtils#createIvyDescriptor function and enable the common-utils > module to run tests on GitHub Actions > - > > Key: SPARK-45950 > URL: https://issues.apache.org/jira/browse/SPARK-45950 > Project: Spark > Issue Type: Bug > Components: Project Infra, Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45960) Add Python 3.10 to the Daily Python GitHub Actions job
[ https://issues.apache.org/jira/browse/SPARK-45960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45960. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43847 [https://github.com/apache/spark/pull/43847] > Add Python 3.10 to the Daily Python GitHub Actions job > - > > Key: SPARK-45960 > URL: https://issues.apache.org/jira/browse/SPARK-45960 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45851) (Scala) Support different retry policies for connect client
[ https://issues.apache.org/jira/browse/SPARK-45851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45851. -- Fix Version/s: 4.0.0 Assignee: Alice Sayutina Resolution: Fixed Fixed in https://github.com/apache/spark/pull/43757 > (Scala) Support different retry policies for connect client > --- > > Key: SPARK-45851 > URL: https://issues.apache.org/jira/browse/SPARK-45851 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Alice Sayutina >Assignee: Alice Sayutina >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Support multiple retry policies defined at the same time. Each policy > determines which error types it can retry and how exactly. > For instance, networking errors should generally be retried differently than > a remote resource being unavailable. > Relevant Python ticket: SPARK-45733 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
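A minimal sketch of the idea, using hypothetical names rather than the connect client's real classes: each policy declares which errors it handles and carries its own retry parameters, and the first policy that accepts an error determines how it is retried:

import scala.concurrent.duration._

object RetryPolicySketch {
  trait RetryPolicy {
    def canRetry(t: Throwable): Boolean
    def maxRetries: Int
    def initialBackoff: FiniteDuration
  }

  // Transient network failures: retry quickly and often.
  object NetworkPolicy extends RetryPolicy {
    def canRetry(t: Throwable): Boolean = t.isInstanceOf[java.io.IOException]
    val maxRetries = 5
    val initialBackoff = 50.millis
  }

  // Remote resource not ready yet: retry with longer waits.
  object UnavailablePolicy extends RetryPolicy {
    def canRetry(t: Throwable): Boolean =
      Option(t.getMessage).exists(_.contains("UNAVAILABLE"))
    val maxRetries = 3
    val initialBackoff = 1.second
  }

  // Policies are consulted in order; the first that accepts the error
  // dictates the retry behavior for it.
  def policyFor(t: Throwable, policies: Seq[RetryPolicy]): Option[RetryPolicy] =
    policies.find(_.canRetry(t))

  def main(args: Array[String]): Unit = {
    val policies = Seq(NetworkPolicy, UnavailablePolicy)
    println(policyFor(new java.io.IOException("connection reset"), policies))
  }
}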
[jira] [Assigned] (SPARK-45935) Fix RST file link substitution errors
[ https://issues.apache.org/jira/browse/SPARK-45935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45935: Assignee: BingKun Pan > Fix RST file link substitution errors > -- > > Key: SPARK-45935 > URL: https://issues.apache.org/jira/browse/SPARK-45935 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 3.3.3, 3.4.1, 3.5.0, 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45935) Fix RST file link substitution errors
[ https://issues.apache.org/jira/browse/SPARK-45935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45935. -- Fix Version/s: 3.3.4 3.5.1 4.0.0 3.4.2 Resolution: Fixed Issue resolved by pull request 43815 [https://github.com/apache/spark/pull/43815] > Fix RST file link substitution errors > -- > > Key: SPARK-45935 > URL: https://issues.apache.org/jira/browse/SPARK-45935 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 3.3.3, 3.4.1, 3.5.0, 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.4, 3.5.1, 4.0.0, 3.4.2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45930) Allow non-deterministic Python UDFs in MapInPandas/MapInArrow
[ https://issues.apache.org/jira/browse/SPARK-45930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45930. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43810 [https://github.com/apache/spark/pull/43810] > Allow non-deterministic Python UDFs in MapInPandas/MapInArrow > - > > Key: SPARK-45930 > URL: https://issues.apache.org/jira/browse/SPARK-45930 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, if a Python UDF is non-deterministic, the analyzer will fail with > this error: [INVALID_NON_DETERMINISTIC_EXPRESSIONS] The operator expects a > deterministic expression, but the actual expression is "pyUDF()", "a". > SQLSTATE: 42K0E; -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45930) Allow non-deterministic Python UDFs in MapInPandas/MapInArrow
[ https://issues.apache.org/jira/browse/SPARK-45930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45930: Assignee: Allison Wang > Allow non-deterministic Python UDFs in MapInPandas/MapInArrow > - > > Key: SPARK-45930 > URL: https://issues.apache.org/jira/browse/SPARK-45930 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > Currently, if a Python UDF is non-deterministic, the analyzer will fail with > this error: [INVALID_NON_DETERMINISTIC_EXPRESSIONS] The operator expects a > deterministic expression, but the actual expression is "pyUDF()", "a". > SQLSTATE: 42K0E; -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
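Conceptually, the fix relaxes the analyzer's determinism check for the map-in-* operators, which merely pipe batches through a user function and so can tolerate non-deterministic expressions. The sketch below is a simplified model with made-up types, not Spark's actual CheckAnalysis code:

object DeterminismCheckSketch {
  // Simplified stand-ins for logical operators.
  sealed trait Operator
  case object Project extends Operator
  case object MapInPandas extends Operator
  case object MapInArrow extends Operator

  // Operators permitted to host non-deterministic expressions after the fix.
  private val allowsNonDeterministic: Set[Operator] = Set(MapInPandas, MapInArrow)

  def check(op: Operator, expressionsDeterministic: Boolean): Unit =
    require(
      expressionsDeterministic || allowsNonDeterministic(op),
      "[INVALID_NON_DETERMINISTIC_EXPRESSIONS] The operator expects a deterministic expression")

  def main(args: Array[String]): Unit = {
    check(MapInPandas, expressionsDeterministic = false) // passes after the fix
    try check(Project, expressionsDeterministic = false) // still rejected
    catch { case e: IllegalArgumentException => println(e.getMessage) }
  }
}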