[spark] branch branch-3.5 updated: [SPARK-44644][PYTHON][3.5] Improve error messages for Python UDTFs with pickling errors
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new b5ab247aa6e [SPARK-44644][PYTHON][3.5] Improve error messages for Python UDTFs with pickling errors b5ab247aa6e is described below commit b5ab247aa6e45180c2e826da74fcb615f3da3335 Author: allisonwang-db AuthorDate: Mon Aug 7 13:03:03 2023 +0800 [SPARK-44644][PYTHON][3.5] Improve error messages for Python UDTFs with pickling errors ### What changes were proposed in this pull request? Cherry-pick https://github.com/apache/spark/commit/62415dc59627e1f7b4e3449ae728e93c1fc0b74f This PR improves the error messages when a Python UDTF fails to pickle. ### Why are the changes needed? To make the error messages more user-friendly ### Does this PR introduce _any_ user-facing change? Yes, before this PR, when a UDTF fails to pickle, it throws this confusing exception: ``` _pickle.PicklingError: Cannot pickle files that are not opened for reading: w ``` After this PR, the error is clearer: `[UDTF_SERIALIZATION_ERROR] Cannot serialize the UDTF 'TestUDTF': Please check the stack trace and make sure that the function is serializable.` And for Spark session access inside a UDTF: `[UDTF_SERIALIZATION_ERROR] it appears that you are attempting to reference SparkSession inside a UDTF. SparkSession can only be used on the driver, not in code that runs on workers. Please remove the reference and try again.` ### How was this patch tested? New UTs. Closes #42349 from allisonwang-db/spark-44644-3.5.
Authored-by: allisonwang-db Signed-off-by: Ruifeng Zheng --- python/pyspark/cloudpickle/cloudpickle_fast.py | 2 +- python/pyspark/errors/error_classes.py | 5 + python/pyspark/sql/connect/plan.py | 15 +++-- python/pyspark/sql/tests/test_udtf.py | 30 +- python/pyspark/sql/udtf.py | 25 - 5 files changed, 72 insertions(+), 5 deletions(-) diff --git a/python/pyspark/cloudpickle/cloudpickle_fast.py b/python/pyspark/cloudpickle/cloudpickle_fast.py index 63aaffa096b..ee1f4b8ee96 100644 --- a/python/pyspark/cloudpickle/cloudpickle_fast.py +++ b/python/pyspark/cloudpickle/cloudpickle_fast.py @@ -631,7 +631,7 @@ class CloudPickler(Pickler): try: return Pickler.dump(self, obj) except RuntimeError as e: -if "recursion" in e.args[0]: +if len(e.args) > 0 and "recursion" in e.args[0]: msg = ( "Could not pickle object as excessively deep recursion " "required." diff --git a/python/pyspark/errors/error_classes.py b/python/pyspark/errors/error_classes.py index 4ea3e678810..971dc59bbb2 100644 --- a/python/pyspark/errors/error_classes.py +++ b/python/pyspark/errors/error_classes.py @@ -743,6 +743,11 @@ ERROR_CLASSES_JSON = """ "Mismatch in return type for the UDTF ''. Expected a 'StructType', but got ''. Please ensure the return type is a correctly formatted StructType." ] }, + "UDTF_SERIALIZATION_ERROR" : { +"message" : [ + "Cannot serialize the UDTF '': " +] + }, "UNEXPECTED_RESPONSE_FROM_SERVER" : { "message" : [ "Unexpected response from iterator server." 
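The cloudpickle change in the hunk above guards the exception-message check so that a `RuntimeError` raised with no arguments no longer triggers an `IndexError` on `e.args[0]`. A minimal standalone sketch of the guarded check (the helper name is illustrative, not part of Spark's code):

```python
def looks_like_recursion_error(e: RuntimeError) -> bool:
    # RuntimeError() may be raised with no arguments; e.args is then an
    # empty tuple, and e.args[0] would itself raise IndexError. Checking
    # the length first makes the message inspection safe.
    return len(e.args) > 0 and "recursion" in e.args[0]

# No arguments: the unguarded `"recursion" in e.args[0]` would crash here.
assert not looks_like_recursion_error(RuntimeError())
# The guarded check still recognizes the message it is looking for.
assert looks_like_recursion_error(RuntimeError("maximum recursion depth exceeded"))
```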
diff --git a/python/pyspark/sql/connect/plan.py b/python/pyspark/sql/connect/plan.py index 3390faa04de..2e918700848 100644 --- a/python/pyspark/sql/connect/plan.py +++ b/python/pyspark/sql/connect/plan.py @@ -21,6 +21,7 @@ check_dependencies(__name__) from typing import Any, List, Optional, Type, Sequence, Union, cast, TYPE_CHECKING, Mapping, Dict import functools import json +import pickle from threading import Lock from inspect import signature, isclass @@ -40,7 +41,7 @@ from pyspark.sql.connect.expressions import ( LiteralExpression, ) from pyspark.sql.connect.types import pyspark_types_to_proto_types, UnparsedDataType -from pyspark.errors import PySparkTypeError, PySparkNotImplementedError +from pyspark.errors import PySparkTypeError, PySparkNotImplementedError, PySparkRuntimeError if TYPE_CHECKING: from pyspark.sql.connect._typing import ColumnOrName @@ -2200,7 +2201,17 @@ class PythonUDTF: assert self._return_type is not None udtf.return_type.CopyFrom(pyspark_types_to_proto_types(self._return_type)) udtf.eval_type = self._eval_type -udtf.command = CloudPickleSerializer().dumps(self._func) +try: +udtf.command = CloudPickleSerializer().dumps(self._func) +except pickle.PicklingError: +raise PySparkRuntimeError( +error_class="UDTF_SERIALIZATION_ERROR", +message_parameters={ +
[spark] branch branch-3.5 updated: [SPARK-44682][PS] Make pandas error class message_parameters strings
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 200440e1845 [SPARK-44682][PS] Make pandas error class message_parameters strings 200440e1845 is described below commit 200440e18458e84d178d9eda5a2c54bcedc634ee Author: Amanda Liu AuthorDate: Mon Aug 7 12:01:45 2023 +0900 [SPARK-44682][PS] Make pandas error class message_parameters strings This PR converts the types for message_parameters for pandas error classes to string, to ensure ability to compare error class messages in tests. The change ensures the ability to compare error class messages in tests. No, the PR does not affect the user-facing view of the error messages. Updated `python/pyspark/pandas/tests/test_utils.py` and existing tests Closes #42348 from asl3/string-pandas-error-types. Authored-by: Amanda Liu Signed-off-by: Hyukjin Kwon (cherry picked from commit df8e52d84d1eabf48f68d09491f66a0835f41693) Signed-off-by: Hyukjin Kwon --- python/pyspark/pandas/tests/test_utils.py | 16 - python/pyspark/testing/pandasutils.py | 56 +++ 2 files changed, 36 insertions(+), 36 deletions(-) diff --git a/python/pyspark/pandas/tests/test_utils.py b/python/pyspark/pandas/tests/test_utils.py index 3d658446f27..0bb03dd8749 100644 --- a/python/pyspark/pandas/tests/test_utils.py +++ b/python/pyspark/pandas/tests/test_utils.py @@ -208,10 +208,10 @@ class UtilsTestsMixin: exception=pe.exception, error_class="DIFFERENT_PANDAS_SERIES", message_parameters={ -"left": series1, -"left_dtype": series1.dtype, -"right": series2, -"right_dtype": series2.dtype, +"left": series1.to_string(), +"left_dtype": str(series1.dtype), +"right": series2.to_string(), +"right_dtype": str(series2.dtype), }, ) @@ -227,9 +227,9 @@ class UtilsTestsMixin: error_class="DIFFERENT_PANDAS_INDEX", message_parameters={ "left": index1, -"left_dtype": index1.dtype, 
+"left_dtype": str(index1.dtype), "right": index2, -"right_dtype": index2.dtype, +"right_dtype": str(index2.dtype), }, ) @@ -247,9 +247,9 @@ class UtilsTestsMixin: error_class="DIFFERENT_PANDAS_MULTIINDEX", message_parameters={ "left": multiindex1, -"left_dtype": multiindex1.dtype, +"left_dtype": str(multiindex1.dtype), "right": multiindex2, -"right_dtype": multiindex2.dtype, +"right_dtype": str(multiindex1.dtype), }, ) diff --git a/python/pyspark/testing/pandasutils.py b/python/pyspark/testing/pandasutils.py index 58999253521..39196873482 100644 --- a/python/pyspark/testing/pandasutils.py +++ b/python/pyspark/testing/pandasutils.py @@ -124,10 +124,10 @@ def _assert_pandas_equal( raise PySparkAssertionError( error_class="DIFFERENT_PANDAS_SERIES", message_parameters={ -"left": left, -"left_dtype": left.dtype, -"right": right, -"right_dtype": right.dtype, +"left": left.to_string(), +"left_dtype": str(left.dtype), +"right": right.to_string(), +"right_dtype": str(right.dtype), }, ) elif isinstance(left, pd.Index) and isinstance(right, pd.Index): @@ -143,9 +143,9 @@ def _assert_pandas_equal( error_class="DIFFERENT_PANDAS_INDEX", message_parameters={ "left": left, -"left_dtype": left.dtype, +"left_dtype": str(left.dtype), "right": right, -"right_dtype": right.dtype, +"right_dtype": str(right.dtype), }, ) else: @@ -228,10 +228,10 @@ def _assert_pandas_almost_equal( raise PySparkAssertionError( error_class="DIFFERENT_PANDAS_SERIES", message_parameters={ -"left": left, -"left_dtype": left.dtype, -"right": right, -"right_dtype": right.dtype, +"left": left.to_string(), +"left_dtype": str(left.dtype), +"right": right.to_string(), +"right_dtype": str(right.dtype), }, ) for lnull, rnull
[spark] branch master updated: [SPARK-44682][PS] Make pandas error class message_parameters strings
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new df8e52d84d1 [SPARK-44682][PS] Make pandas error class message_parameters strings df8e52d84d1 is described below commit df8e52d84d1eabf48f68d09491f66a0835f41693 Author: Amanda Liu AuthorDate: Mon Aug 7 12:01:45 2023 +0900 [SPARK-44682][PS] Make pandas error class message_parameters strings ### What changes were proposed in this pull request? This PR converts the types for message_parameters for pandas error classes to string, to ensure ability to compare error class messages in tests. ### Why are the changes needed? The change ensures the ability to compare error class messages in tests. ### Does this PR introduce _any_ user-facing change? No, the PR does not affect the user-facing view of the error messages. ### How was this patch tested? Updated `python/pyspark/pandas/tests/test_utils.py` and existing tests Closes #42348 from asl3/string-pandas-error-types. 
Authored-by: Amanda Liu Signed-off-by: Hyukjin Kwon --- python/pyspark/pandas/tests/test_utils.py | 16 - python/pyspark/testing/pandasutils.py | 56 +++ 2 files changed, 36 insertions(+), 36 deletions(-) diff --git a/python/pyspark/pandas/tests/test_utils.py b/python/pyspark/pandas/tests/test_utils.py index 3d658446f27..0bb03dd8749 100644 --- a/python/pyspark/pandas/tests/test_utils.py +++ b/python/pyspark/pandas/tests/test_utils.py @@ -208,10 +208,10 @@ class UtilsTestsMixin: exception=pe.exception, error_class="DIFFERENT_PANDAS_SERIES", message_parameters={ -"left": series1, -"left_dtype": series1.dtype, -"right": series2, -"right_dtype": series2.dtype, +"left": series1.to_string(), +"left_dtype": str(series1.dtype), +"right": series2.to_string(), +"right_dtype": str(series2.dtype), }, ) @@ -227,9 +227,9 @@ class UtilsTestsMixin: error_class="DIFFERENT_PANDAS_INDEX", message_parameters={ "left": index1, -"left_dtype": index1.dtype, +"left_dtype": str(index1.dtype), "right": index2, -"right_dtype": index2.dtype, +"right_dtype": str(index2.dtype), }, ) @@ -247,9 +247,9 @@ class UtilsTestsMixin: error_class="DIFFERENT_PANDAS_MULTIINDEX", message_parameters={ "left": multiindex1, -"left_dtype": multiindex1.dtype, +"left_dtype": str(multiindex1.dtype), "right": multiindex2, -"right_dtype": multiindex2.dtype, +"right_dtype": str(multiindex1.dtype), }, ) diff --git a/python/pyspark/testing/pandasutils.py b/python/pyspark/testing/pandasutils.py index 58999253521..39196873482 100644 --- a/python/pyspark/testing/pandasutils.py +++ b/python/pyspark/testing/pandasutils.py @@ -124,10 +124,10 @@ def _assert_pandas_equal( raise PySparkAssertionError( error_class="DIFFERENT_PANDAS_SERIES", message_parameters={ -"left": left, -"left_dtype": left.dtype, -"right": right, -"right_dtype": right.dtype, +"left": left.to_string(), +"left_dtype": str(left.dtype), +"right": right.to_string(), +"right_dtype": str(right.dtype), }, ) elif isinstance(left, pd.Index) and isinstance(right, 
pd.Index): @@ -143,9 +143,9 @@ def _assert_pandas_equal( error_class="DIFFERENT_PANDAS_INDEX", message_parameters={ "left": left, -"left_dtype": left.dtype, +"left_dtype": str(left.dtype), "right": right, -"right_dtype": right.dtype, +"right_dtype": str(right.dtype), }, ) else: @@ -228,10 +228,10 @@ def _assert_pandas_almost_equal( raise PySparkAssertionError( error_class="DIFFERENT_PANDAS_SERIES", message_parameters={ -"left": left, -"left_dtype": left.dtype, -"right": right, -"right_dtype": right.dtype, +"left": left.to_string(), +"left_dtype": str(left.dtype), +"right": right.to_string(), +"right_dtype":
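The pattern in this commit is simple but easy to get wrong: `message_parameters` must hold strings so that test helpers can compare the dict against an expected all-string dict. A small illustrative sketch without pandas (`FakeDtype` is a hypothetical stand-in for a pandas/NumPy dtype object, used only to show the before/after comparison behavior):

```python
class FakeDtype:
    """Hypothetical stand-in for a pandas/NumPy dtype object."""

    def __init__(self, name: str):
        self.name = name

    def __str__(self) -> str:
        return self.name
    # No __eq__ against plain strings is defined here, which is why raw
    # objects in message_parameters break string-based comparisons.


left_dtype, right_dtype = FakeDtype("int64"), FakeDtype("float64")

# Before the fix: raw objects in the dict, so comparing against an
# expected all-string parameter dict in a test fails.
raw_params = {"left_dtype": left_dtype, "right_dtype": right_dtype}
expected = {"left_dtype": "int64", "right_dtype": "float64"}
assert raw_params != expected

# After the fix: every parameter is passed through str() (or
# to_string() for pandas objects) before building the dict.
string_params = {key: str(value) for key, value in raw_params.items()}
assert string_params == expected
```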
[spark] branch master updated: [SPARK-44264][PYTHON][ML][TESTS][FOLLOWUP] Adding Deepspeed To The Test Dockerfile
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7515061ec23 [SPARK-44264][PYTHON][ML][TESTS][FOLLOWUP] Adding Deepspeed To The Test Dockerfile 7515061ec23 is described below commit 7515061ec237b0393e2fcc064a309fae29502dff Author: Mathew Jacob AuthorDate: Mon Aug 7 10:57:02 2023 +0800 [SPARK-44264][PYTHON][ML][TESTS][FOLLOWUP] Adding Deepspeed To The Test Dockerfile ### What changes were proposed in this pull request? Added tests to the Dockerfile for tests in OSS Spark CI. ### Why are the changes needed? They'll skip the deepspeed tests otherwise. ### Does this PR introduce _any_ user-facing change? Nope, testing infra. ### How was this patch tested? Running the tests on machine. Closes #42347 from mathewjacob1002/testing_infra. Authored-by: Mathew Jacob Signed-off-by: Ruifeng Zheng --- dev/infra/Dockerfile | 2 ++ 1 file changed, 2 insertions(+) diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index 9d7b29e25b4..b69e682f239 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -73,3 +73,5 @@ RUN python3.9 -m pip install grpcio protobuf googleapis-common-protos grpcio-sta # Add torch as a testing dependency for TorchDistributor RUN python3.9 -m pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu RUN python3.9 -m pip install torcheval +# Add Deepspeed as a testing dependency for DeepspeedTorchDistributor +RUN python3.9 -m pip install deepspeed - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.5 updated: [SPARK-44693][BUILD] Rename the `object Catalyst` in SparkBuild to `object SqlApi`
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 816ee7e2716 [SPARK-44693][BUILD] Rename the `object Catalyst` in SparkBuild to `object SqlApi` 816ee7e2716 is described below commit 816ee7e2716f2652e7cebe85fe949b6d9b44ffab Author: yangjie01 AuthorDate: Mon Aug 7 10:27:53 2023 +0800 [SPARK-44693][BUILD] Rename the `object Catalyst` in SparkBuild to `object SqlApi` ### What changes were proposed in this pull request? This PR renames the Setting object used by the `SqlApi` module in `SparkBuild/scala` from `object Catalyst` to `object SqlApi`. ### Why are the changes needed? The `SqlApi` module should use a more appropriate Setting object name. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Pass GitHub Actions Closes #42361 from LuciferYang/rename-catalyst-2-sqlapi. Authored-by: yangjie01 Signed-off-by: yangjie01 (cherry picked from commit a98b11274d95f7c9f6e550ef6394e803bc0c17ca) Signed-off-by: yangjie01 --- project/SparkBuild.scala | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index 79007626026..bd65d3c4bd4 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -449,7 +449,7 @@ object SparkBuild extends PomBuild { enable(Unidoc.settings)(spark) /* Sql-api ANTLR generation settings */ - enable(Catalyst.settings)(sqlApi) + enable(SqlApi.settings)(sqlApi) /* Spark SQL Core console settings */ enable(SQL.settings)(sql) @@ -1169,7 +1169,7 @@ object OldDeps { ) } -object Catalyst { +object SqlApi { import com.simplytyped.Antlr4Plugin import com.simplytyped.Antlr4Plugin.autoImport._
[spark] branch master updated (656bf36363c -> a98b11274d9)
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 656bf36363c [SPARK-44690][SPARK-44376][BUILD] Downgrade Scala to 2.13.8 add a98b11274d9 [SPARK-44693][BUILD] Rename the `object Catalyst` in SparkBuild to `object SqlApi` No new revisions were added by this update. Summary of changes: project/SparkBuild.scala | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
[spark] branch branch-3.5 updated: [SPARK-44690][SPARK-44376][BUILD] Downgrade Scala to 2.13.8
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 1cfa2624c93 [SPARK-44690][SPARK-44376][BUILD] Downgrade Scala to 2.13.8 1cfa2624c93 is described below commit 1cfa2624c933088deef4281b207631c806230440 Author: yangjie01 AuthorDate: Mon Aug 7 10:13:59 2023 +0800 [SPARK-44690][SPARK-44376][BUILD] Downgrade Scala to 2.13.8 ### What changes were proposed in this pull request? The aim of this PR is to downgrade the Scala 2.13 dependency to 2.13.8 to ensure that Spark can be build with `-target:jvm-1.8`, and tested with Java 11/17. ### Why are the changes needed? As reported in SPARK-44376, there are issues when maven build and test using Java 11/17 with `-target:jvm-1.8`: - run `build/mvn clean install -Pscala-2.13` with Java 17 ``` [INFO] --- scala-maven-plugin:4.8.0:compile (scala-compile-first) spark-core_2.13 --- [INFO] Compiler bridge file: /Users/yangjie01/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.13-1.8.0-bin_2.13.11__61.0-1.8.0_20221110T195421.jar [INFO] compiling 602 Scala sources and 77 Java sources to /Users/yangjie01/SourceCode/git/spark-mine-13/core/target/scala-2.13/classes ... [WARNING] [Warn] : [deprecation | origin= | version=] -target is deprecated: Use -release instead to compile against the correct platform API. 
[ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala:71: not found: value sun [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:26: not found: object sun [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:27: not found: object sun [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:206: not found: type DirectBuffer [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:210: not found: type Unsafe [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:212: not found: type Unsafe [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:213: not found: type DirectBuffer [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:216: not found: type DirectBuffer [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:236: not found: type DirectBuffer [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:26: Unused import [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:27: Unused import [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala:452: not found: value sun [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/util/SignalUtils.scala:26: not found: object 
sun [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/util/SignalUtils.scala:99: not found: type SignalHandler [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/util/SignalUtils.scala:99: not found: type Signal [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/util/SignalUtils.scala:83: not found: type Signal [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/util/SignalUtils.scala:108: not found: type SignalHandler [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/util/SignalUtils.scala:108: not found: value Signal [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/util/SignalUtils.scala:114: not found: type Signal [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/util/SignalUtils.scala:116: not found: value Signal [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/util/SignalUtils.scala:128: not found: value Signal [ERROR] [Error]
[spark] branch master updated: [SPARK-44690][SPARK-44376][BUILD] Downgrade Scala to 2.13.8
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 656bf36363c [SPARK-44690][SPARK-44376][BUILD] Downgrade Scala to 2.13.8 656bf36363c is described below commit 656bf36363c466b60d0045234ccaaa654ed8 Author: yangjie01 AuthorDate: Mon Aug 7 10:13:59 2023 +0800 [SPARK-44690][SPARK-44376][BUILD] Downgrade Scala to 2.13.8 ### What changes were proposed in this pull request? The aim of this PR is to downgrade the Scala 2.13 dependency to 2.13.8 to ensure that Spark can be build with `-target:jvm-1.8`, and tested with Java 11/17. ### Why are the changes needed? As reported in SPARK-44376, there are issues when maven build and test using Java 11/17 with `-target:jvm-1.8`: - run `build/mvn clean install -Pscala-2.13` with Java 17 ``` [INFO] --- scala-maven-plugin:4.8.0:compile (scala-compile-first) spark-core_2.13 --- [INFO] Compiler bridge file: /Users/yangjie01/.sbt/1.0/zinc/org.scala-sbt/org.scala-sbt-compiler-bridge_2.13-1.8.0-bin_2.13.11__61.0-1.8.0_20221110T195421.jar [INFO] compiling 602 Scala sources and 77 Java sources to /Users/yangjie01/SourceCode/git/spark-mine-13/core/target/scala-2.13/classes ... [WARNING] [Warn] : [deprecation | origin= | version=] -target is deprecated: Use -release instead to compile against the correct platform API. 
[ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala:71: not found: value sun [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:26: not found: object sun [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:27: not found: object sun [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:206: not found: type DirectBuffer [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:210: not found: type Unsafe [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:212: not found: type Unsafe [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:213: not found: type DirectBuffer [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:216: not found: type DirectBuffer [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:236: not found: type DirectBuffer [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:26: Unused import [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/storage/StorageUtils.scala:27: Unused import [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala:452: not found: value sun [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/util/SignalUtils.scala:26: not found: object 
sun [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/util/SignalUtils.scala:99: not found: type SignalHandler [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/util/SignalUtils.scala:99: not found: type Signal [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/util/SignalUtils.scala:83: not found: type Signal [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/util/SignalUtils.scala:108: not found: type SignalHandler [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/util/SignalUtils.scala:108: not found: value Signal [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/util/SignalUtils.scala:114: not found: type Signal [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/util/SignalUtils.scala:116: not found: value Signal [ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine-13/core/src/main/scala/org/apache/spark/util/SignalUtils.scala:128: not found: value Signal [ERROR] [Error]
[spark] branch branch-3.5 updated: [SPARK-41636][SQL] Make sure `selectFilters` returns predicates in deterministic order
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new c07eb181609 [SPARK-41636][SQL] Make sure `selectFilters` returns predicates in deterministic order c07eb181609 is described below commit c07eb18160952fab18a6d118ce2bb56e18d56741 Author: Jia Fan AuthorDate: Mon Aug 7 09:07:48 2023 +0900 [SPARK-41636][SQL] Make sure `selectFilters` returns predicates in deterministic order ### What changes were proposed in this pull request? Method `DataSourceStrategy#selectFilters`, which is used to determine "pushdown-able" filters, does not preserve the order of the input Seq[Expression] nor does it return the same order across the same plans. This is resulting in CodeGenerator cache misses even when the exact same LogicalPlan is executed. This PR to make sure `selectFilters` returns predicates in deterministic order. ### Why are the changes needed? Make sure `selectFilters` returns predicates in deterministic order, to reduce the probability of codegen cache misses. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? add new test. Closes #42265 from Hisoka-X/SPARK-41636_selectfilters_order. 
Authored-by: Jia Fan Signed-off-by: Hyukjin Kwon (cherry picked from commit 9462dcd0e996dd940d4970dc75482f7d088ac2ae) Signed-off-by: Hyukjin Kwon --- .../sql/execution/datasources/DataSourceStrategy.scala | 6 -- .../execution/datasources/DataSourceStrategySuite.scala| 14 ++ 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala index 5e6e0ad0392..94c2d2ffaca 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala @@ -19,6 +19,7 @@ package org.apache.spark.sql.execution.datasources import java.util.Locale +import scala.collection.immutable.ListMap import scala.collection.mutable import org.apache.hadoop.fs.Path @@ -670,9 +671,10 @@ object DataSourceStrategy // A map from original Catalyst expressions to corresponding translated data source filters. // If a predicate is not in this map, it means it cannot be pushed down. 
val supportNestedPredicatePushdown = DataSourceUtils.supportNestedPredicatePushdown(relation) -val translatedMap: Map[Expression, Filter] = predicates.flatMap { p => +// SPARK-41636: we keep the order of the predicates to avoid CodeGenerator cache misses +val translatedMap: Map[Expression, Filter] = ListMap(predicates.flatMap { p => translateFilter(p, supportNestedPredicatePushdown).map(f => p -> f) -}.toMap +}: _*) val pushedFilters: Seq[Filter] = translatedMap.values.toSeq diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala index a35fb5f6271..2b9ec97bace 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategySuite.scala @@ -324,4 +324,18 @@ class DataSourceStrategySuite extends PlanTest with SharedSparkSession { DataSourceStrategy.translateFilter(catalystFilter, true) } } + + test("SPARK-41636: selectFilters returns predicates in deterministic order") { + +val predicates = Seq(EqualTo($"id", 1), EqualTo($"id", 2), + EqualTo($"id", 3), EqualTo($"id", 4), EqualTo($"id", 5), EqualTo($"id", 6)) + +val (unhandledPredicates, pushedFilters, handledFilters) = + DataSourceStrategy.selectFilters(FakeRelation(), predicates) +assert(unhandledPredicates.equals(predicates)) +assert(pushedFilters.zipWithIndex.forall { case (f, i) => + f.equals(sources.EqualTo("id", i + 1)) +}) +assert(handledFilters.isEmpty) + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
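The fix above swaps an unordered `Map` for `scala.collection.immutable.ListMap`, which iterates in insertion order, so `pushedFilters` always comes out in the order the predicates were supplied and identical plans generate identical code. A rough Python analogue (the `translate` helper and the `(operator, value)` predicate encoding are hypothetical; a plain Python dict, insertion-ordered since 3.7, plays the role of `ListMap`):

```python
# Hypothetical predicate encoding: (operator, value) pairs, where only
# equality predicates on "id" are translatable into data source filters.
def translate(predicate):
    op, value = predicate
    return f"EqualTo(id, {value})" if op == "=" else None


predicates = [("=", 1), ("=", 2), ("<", 3), ("=", 4)]

# Analogue of the ListMap fix: a plain dict preserves insertion order,
# so the pushed filters come out in the same order as the input
# predicates on every run, instead of depending on hash ordering.
translated = {}
for p in predicates:
    f = translate(p)
    if f is not None:
        translated[p] = f

pushed_filters = list(translated.values())
assert pushed_filters == ["EqualTo(id, 1)", "EqualTo(id, 2)", "EqualTo(id, 4)"]
```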
[spark] branch master updated (a640373fff3 -> 9462dcd0e99)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from a640373fff3 [SPARK-44629][PYTHON][DOCS] Publish PySpark Test Guidelines webpage add 9462dcd0e99 [SPARK-41636][SQL] Make sure `selectFilters` returns predicates in deterministic order No new revisions were added by this update. Summary of changes: .../sql/execution/datasources/DataSourceStrategy.scala | 6 -- .../execution/datasources/DataSourceStrategySuite.scala| 14 ++ 2 files changed, 18 insertions(+), 2 deletions(-)
[spark] branch branch-3.5 updated: [SPARK-44629][PYTHON][DOCS] Publish PySpark Test Guidelines webpage
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 36e7fe2e49c [SPARK-44629][PYTHON][DOCS] Publish PySpark Test Guidelines webpage 36e7fe2e49c is described below commit 36e7fe2e49cbde10b0faedcf701a04543a5c156f Author: Amanda Liu AuthorDate: Mon Aug 7 08:54:52 2023 +0900 [SPARK-44629][PYTHON][DOCS] Publish PySpark Test Guidelines webpage ### What changes were proposed in this pull request? This PR adds a webpage to the Spark docs website, https://spark.apache.org/docs, to outline PySpark testing best practices. ### Why are the changes needed? The changes are needed to provide PySpark end users with a guideline for how to use PySpark utils (introduced in SPARK-44629) to test PySpark code. ### Does this PR introduce _any_ user-facing change? Yes, the PR publishes a webpage on the Spark website. ### How was this patch tested? Existing tests Closes #42284 from asl3/testing-guidelines. 
Authored-by: Amanda Liu Signed-off-by: Hyukjin Kwon (cherry picked from commit a640373fff38f5c594e4e5c30587bcfe823dee1d) Signed-off-by: Hyukjin Kwon --- python/docs/source/getting_started/index.rst | 1 + .../source/getting_started/testing_pyspark.ipynb | 485 + 2 files changed, 486 insertions(+) diff --git a/python/docs/source/getting_started/index.rst b/python/docs/source/getting_started/index.rst index 3c1c7d80863..5f6d306651b 100644 --- a/python/docs/source/getting_started/index.rst +++ b/python/docs/source/getting_started/index.rst @@ -40,3 +40,4 @@ The list below is the contents of this quickstart page: quickstart_df quickstart_connect quickstart_ps + testing_pyspark diff --git a/python/docs/source/getting_started/testing_pyspark.ipynb b/python/docs/source/getting_started/testing_pyspark.ipynb new file mode 100644 index 000..268ace04376 --- /dev/null +++ b/python/docs/source/getting_started/testing_pyspark.ipynb @@ -0,0 +1,485 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "4ee2125b-f889-47e6-9c3d-8bd63a253683", + "metadata": {}, + "source": [ +"# Testing PySpark\n", +"\n", +"This guide is a reference for writing robust tests for PySpark code.\n", +"\n", +"To view the docs for PySpark test utils, see here. To see the code for PySpark built-in test utils, check out the Spark repository here. To see the JIRA board tickets for the PySpark test framework, see here." + ] + }, + { + "cell_type": "markdown", + "id": "0e8ee4b6-9544-45e1-8a91-e71ed8ef8b9d", + "metadata": {}, + "source": [ +"## Build a PySpark Application\n", +"Here is an example for how to start a PySpark application. Feel free to skip to the next section, “Testing your PySpark Application,” if you already have an application you’re ready to test.\n", +"\n", +"First, start your Spark Session." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "9af4a35b-17e8-4e45-816b-34c14c5902f7", + "metadata": {}, + "outputs": [], + "source": [ +"from pyspark.sql import SparkSession \n", +"from pyspark.sql.functions import col \n", +"\n", +"# Create a SparkSession \n", +"spark = SparkSession.builder.appName(\"Testing PySpark Example\").getOrCreate() " + ] + }, + { + "cell_type": "markdown", + "id": "4a4c6efe-91f5-4e18-b4b2-b0401c2368e4", + "metadata": {}, + "source": [ +"Next, create a DataFrame." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "3b483dd8-3a76-41c6-9206-301d7ef314d6", + "metadata": {}, + "outputs": [], + "source": [ +"sample_data = [{\"name\": \"JohnD.\", \"age\": 30}, \n", +" {\"name\": \"Alice G.\", \"age\": 25}, \n", +" {\"name\": \"Bob T.\", \"age\": 35}, \n", +" {\"name\": \"Eve A.\", \"age\": 28}] \n", +"\n", +"df = spark.createDataFrame(sample_data)" + ] + }, + { + "cell_type": "markdown", + "id": "e0f44333-0e08-470b-9fa2-38f59e3dbd63", + "metadata": {}, + "source": [ +"Now, let’s define and apply a transformation function to our DataFrame." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "a6c0b766-af5f-4e1d-acf8-887d7cf0b0b2", + "metadata": {}, + "outputs": [ +{ + "name": "stdout", + "output_type": "stream", + "text": [ + "+---++\n", + "|age|name|\n", + "+---++\n", + "| 30| John D.|\n", + "| 25|Alice G.|\n", + "| 35| Bob T.|\n", + "| 28| Eve A.|\n", + "+---++\n", + "\n" + ] +} + ], + "source": [ +"from pyspark.sql.functions import col, regexp_replace\n", +"\n", +"# Remove additional spaces in name\n", +"def
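The notebook's transformation (truncated above) removes extra spaces from the name column using PySpark's `regexp_replace`. A plain-Python sketch of the equivalent string logic — an illustrative analogue, not the notebook's actual DataFrame code:

```python
import re

def remove_extra_spaces(s: str) -> str:
    # Collapse runs of whitespace into a single space and trim the ends,
    # mirroring what a regexp_replace-based column transformation does.
    return re.sub(r"\s+", " ", s).strip()

assert remove_extra_spaces("John   D. ") == "John D."
```

On a real DataFrame the same effect comes from `df.withColumn("name", regexp_replace(col("name"), r"\s+", " "))`, which the notebook then verifies against an expected DataFrame.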
[spark] branch master updated: [SPARK-44629][PYTHON][DOCS] Publish PySpark Test Guidelines webpage
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a640373fff3 [SPARK-44629][PYTHON][DOCS] Publish PySpark Test Guidelines webpage a640373fff3 is described below commit a640373fff38f5c594e4e5c30587bcfe823dee1d Author: Amanda Liu AuthorDate: Mon Aug 7 08:54:52 2023 +0900 [SPARK-44629][PYTHON][DOCS] Publish PySpark Test Guidelines webpage ### What changes were proposed in this pull request? This PR adds a webpage to the Spark docs website, https://spark.apache.org/docs, to outline PySpark testing best practices. ### Why are the changes needed? The changes are needed to provide PySpark end users with a guideline for how to use PySpark utils (introduced in SPARK-44629) to test PySpark code. ### Does this PR introduce _any_ user-facing change? Yes, the PR publishes a webpage on the Spark website. ### How was this patch tested? Existing tests Closes #42284 from asl3/testing-guidelines. 
Authored-by: Amanda Liu Signed-off-by: Hyukjin Kwon --- python/docs/source/getting_started/index.rst | 1 + .../source/getting_started/testing_pyspark.ipynb | 485 + 2 files changed, 486 insertions(+) diff --git a/python/docs/source/getting_started/index.rst b/python/docs/source/getting_started/index.rst index 3c1c7d80863..5f6d306651b 100644 --- a/python/docs/source/getting_started/index.rst +++ b/python/docs/source/getting_started/index.rst @@ -40,3 +40,4 @@ The list below is the contents of this quickstart page: quickstart_df quickstart_connect quickstart_ps + testing_pyspark diff --git a/python/docs/source/getting_started/testing_pyspark.ipynb b/python/docs/source/getting_started/testing_pyspark.ipynb new file mode 100644 index 000..268ace04376 --- /dev/null +++ b/python/docs/source/getting_started/testing_pyspark.ipynb @@ -0,0 +1,485 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "4ee2125b-f889-47e6-9c3d-8bd63a253683", + "metadata": {}, + "source": [ +"# Testing PySpark\n", +"\n", +"This guide is a reference for writing robust tests for PySpark code.\n", +"\n", +"To view the docs for PySpark test utils, see here. To see the code for PySpark built-in test utils, check out the Spark repository here. To see the JIRA board tickets for the PySpark test framework, see here." + ] + }, + { + "cell_type": "markdown", + "id": "0e8ee4b6-9544-45e1-8a91-e71ed8ef8b9d", + "metadata": {}, + "source": [ +"## Build a PySpark Application\n", +"Here is an example for how to start a PySpark application. Feel free to skip to the next section, “Testing your PySpark Application,” if you already have an application you’re ready to test.\n", +"\n", +"First, start your Spark Session." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "9af4a35b-17e8-4e45-816b-34c14c5902f7", + "metadata": {}, + "outputs": [], + "source": [ +"from pyspark.sql import SparkSession \n", +"from pyspark.sql.functions import col \n", +"\n", +"# Create a SparkSession \n", +"spark = SparkSession.builder.appName(\"Testing PySpark Example\").getOrCreate() " + ] + }, + { + "cell_type": "markdown", + "id": "4a4c6efe-91f5-4e18-b4b2-b0401c2368e4", + "metadata": {}, + "source": [ +"Next, create a DataFrame." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "3b483dd8-3a76-41c6-9206-301d7ef314d6", + "metadata": {}, + "outputs": [], + "source": [ +"sample_data = [{\"name\": \"JohnD.\", \"age\": 30}, \n", +" {\"name\": \"Alice G.\", \"age\": 25}, \n", +" {\"name\": \"Bob T.\", \"age\": 35}, \n", +" {\"name\": \"Eve A.\", \"age\": 28}] \n", +"\n", +"df = spark.createDataFrame(sample_data)" + ] + }, + { + "cell_type": "markdown", + "id": "e0f44333-0e08-470b-9fa2-38f59e3dbd63", + "metadata": {}, + "source": [ +"Now, let’s define and apply a transformation function to our DataFrame." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "a6c0b766-af5f-4e1d-acf8-887d7cf0b0b2", + "metadata": {}, + "outputs": [ +{ + "name": "stdout", + "output_type": "stream", + "text": [ + "+---++\n", + "|age|name|\n", + "+---++\n", + "| 30| John D.|\n", + "| 25|Alice G.|\n", + "| 35| Bob T.|\n", + "| 28| Eve A.|\n", + "+---++\n", + "\n" + ] +} + ], + "source": [ +"from pyspark.sql.functions import col, regexp_replace\n", +"\n", +"# Remove additional spaces in name\n", +"def remove_extra_spaces(df, column_name):\n", +"# Remove extra spaces from the specified column\n", +"df_transformed =
[spark] branch branch-3.5 updated: [SPARK-44634][SQL] Encoders.bean does no longer support nested beans with type arguments
This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new e3b031276e4 [SPARK-44634][SQL] Encoders.bean does no longer support nested beans with type arguments e3b031276e4 is described below commit e3b031276e4ab626f7db7b8d95f01a598e25a6b1 Author: Giambattista Bloisi AuthorDate: Sun Aug 6 21:47:57 2023 +0200 [SPARK-44634][SQL] Encoders.bean does no longer support nested beans with type arguments ### What changes were proposed in this pull request? This PR fixes a regression introduced in Spark 3.4.x where Encoders.bean is no longer able to process nested beans having type arguments. For example: ``` class A { T value; // value getter and setter } class B { A stringHolder; // stringHolder getter and setter } Encoders.bean(B.class); // throws "SparkUnsupportedOperationException: [ENCODER_NOT_FOUND]..." ``` ### Why are the changes needed? JavaTypeInference.encoderFor main match does not manage ParameterizedType and TypeVariable cases. I think this is a regression introduced after getting rid of usage of guava TypeToken: [SPARK-42093 SQL Move JavaTypeInference to AgnosticEncoders](https://github.com/apache/spark/commit/18672003513d5a4aa610b6b94dbbc15c33185d3#diff-1191737b908340a2f4c22b71b1c40ebaa0da9d8b40c958089c346a3bda26943b) hvanhovell cloud-fan In this PR I'm leveraging commons lang3 TypeUtils functionalities to solve ParameterizedType type arguments for classes ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests have been extended to check correct encoding of a nested bean having type arguments. Closes #42327 from gbloisi-openaire/spark-44634. 
Authored-by: Giambattista Bloisi Signed-off-by: Herman van Hovell (cherry picked from commit d6998979427b6ad3a0f16d6966b3927d40440a60) Signed-off-by: Herman van Hovell --- .../spark/sql/catalyst/JavaTypeInference.scala | 84 +- .../spark/sql/catalyst/JavaBeanWithGenerics.java | 41 +++ .../sql/catalyst/JavaTypeInferenceSuite.scala | 4 ++ 3 files changed, 64 insertions(+), 65 deletions(-) diff --git a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala index f352d28a7b5..3d536b735db 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala @@ -18,12 +18,14 @@ package org.apache.spark.sql.catalyst import java.beans.{Introspector, PropertyDescriptor} import java.lang.reflect.{ParameterizedType, Type, TypeVariable} -import java.util.{ArrayDeque, List => JList, Map => JMap} +import java.util.{List => JList, Map => JMap} import javax.annotation.Nonnull -import scala.annotation.tailrec +import scala.collection.JavaConverters._ import scala.reflect.ClassTag +import org.apache.commons.lang3.reflect.{TypeUtils => JavaTypeUtils} + import org.apache.spark.sql.catalyst.encoders.AgnosticEncoder import org.apache.spark.sql.catalyst.encoders.AgnosticEncoders.{ArrayEncoder, BinaryEncoder, BoxedBooleanEncoder, BoxedByteEncoder, BoxedDoubleEncoder, BoxedFloatEncoder, BoxedIntEncoder, BoxedLongEncoder, BoxedShortEncoder, DayTimeIntervalEncoder, DEFAULT_JAVA_DECIMAL_ENCODER, EncoderField, IterableEncoder, JavaBeanEncoder, JavaBigIntEncoder, JavaEnumEncoder, LocalDateTimeEncoder, MapEncoder, PrimitiveBooleanEncoder, PrimitiveByteEncoder, PrimitiveDoubleEncoder, PrimitiveFloatEncoder, P [...] 
import org.apache.spark.sql.errors.ExecutionErrors @@ -57,7 +59,8 @@ object JavaTypeInference { encoderFor(beanType, Set.empty).asInstanceOf[AgnosticEncoder[T]] } - private def encoderFor(t: Type, seenTypeSet: Set[Class[_]]): AgnosticEncoder[_] = t match { + private def encoderFor(t: Type, seenTypeSet: Set[Class[_]], +typeVariables: Map[TypeVariable[_], Type] = Map.empty): AgnosticEncoder[_] = t match { case c: Class[_] if c == java.lang.Boolean.TYPE => PrimitiveBooleanEncoder case c: Class[_] if c == java.lang.Byte.TYPE => PrimitiveByteEncoder @@ -101,18 +104,24 @@ object JavaTypeInference { UDTEncoder(udt, udt.getClass) case c: Class[_] if c.isArray => - val elementEncoder = encoderFor(c.getComponentType, seenTypeSet) + val elementEncoder = encoderFor(c.getComponentType, seenTypeSet, typeVariables) ArrayEncoder(elementEncoder, elementEncoder.nullable) -case ImplementsList(c, Array(elementCls)) => - val element = encoderFor(elementCls, seenTypeSet) +case c: Class[_] if classOf[JList[_]].isAssignableFrom(c) => +
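The regression fixed above comes down to resolving a nested field's type arguments: given `B.stringHolder` of type `A<String>`, the encoder must discover that `T` is bound to `String`. A hypothetical Python analogue using the `typing` module (illustrative only; the actual fix uses commons-lang3 `TypeUtils` on the JVM):

```python
from typing import Generic, TypeVar, get_args, get_origin, get_type_hints

T = TypeVar("T")

class A(Generic[T]):
    value: T

class B:
    string_holder: A[str]

# Resolve the type argument of the nested parameterized field, roughly
# what TypeUtils.determineTypeArguments does for ParameterizedType on
# the JVM side.
hints = get_type_hints(B)
field_type = hints["string_holder"]      # A[str]
assert get_origin(field_type) is A       # the raw generic class
assert get_args(field_type) == (str,)    # T is bound to str
```

Without this resolution step, the encoder sees only the unbound `TypeVariable` `T` and fails with `ENCODER_NOT_FOUND`, which is the regression the PR describes.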
[spark] branch master updated (74ae1e3434c -> d6998979427)
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 74ae1e3434c [SPARK-42500][SQL] ConstantPropagation support more case
     add d6998979427 [SPARK-44634][SQL] Encoders.bean does no longer support nested beans with type arguments

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/JavaTypeInference.scala    | 84 +-
 .../spark/sql/catalyst/JavaBeanWithGenerics.java} | 29 
 .../sql/catalyst/JavaTypeInferenceSuite.scala     |  4 ++
 3 files changed, 39 insertions(+), 78 deletions(-)
 copy sql/{hive/src/test/java/org/apache/spark/sql/hive/execution/UDFListString.java => catalyst/src/test/java/org/apache/spark/sql/catalyst/JavaBeanWithGenerics.java} (67%)
[spark] branch master updated (41a2a7daeee -> 74ae1e3434c)
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 41a2a7daeee [SPARK-44650][CORE] `spark.executor.defaultJavaOptions` Check illegal java options
     add 74ae1e3434c [SPARK-42500][SQL] ConstantPropagation support more case

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/expressions.scala | 37 ++
 .../optimizer/ConstantPropagationSuite.scala       | 32 +--
 2 files changed, 47 insertions(+), 22 deletions(-)
[spark] branch master updated: [SPARK-44650][CORE] `spark.executor.defaultJavaOptions` Check illegal java options
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 41a2a7daeee [SPARK-44650][CORE] `spark.executor.defaultJavaOptions` Check illegal java options 41a2a7daeee is described below commit 41a2a7daeee0a25d39f30364a694becf54ab37e7 Author: sychen AuthorDate: Sun Aug 6 08:24:40 2023 -0500 [SPARK-44650][CORE] `spark.executor.defaultJavaOptions` Check illegal java options ### What changes were proposed in this pull request? ### Why are the changes needed? Command ```bash ./bin/spark-shell --conf spark.executor.extraJavaOptions='-Dspark.foo=bar' ``` Error ``` spark.executor.extraJavaOptions is not allowed to set Spark options (was '-Dspark.foo=bar'). Set them directly on a SparkConf or in a properties file when using ./bin/spark-submit. ``` Command ```bash ./bin/spark-shell --conf spark.executor.defaultJavaOptions='-Dspark.foo=bar' ``` Start up normally. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? local test & add UT ``` ./bin/spark-shell --conf spark.executor.defaultJavaOptions='-Dspark.foo=bar' ``` ``` spark.executor.defaultJavaOptions is not allowed to set Spark options (was '-Dspark.foo=bar'). Set them directly on a SparkConf or in a properties file when using ./bin/spark-submit. ``` Closes #42313 from cxzl25/SPARK-44650. 
Authored-by: sychen Signed-off-by: Sean Owen --- .../main/scala/org/apache/spark/SparkConf.scala| 25 +++--- .../scala/org/apache/spark/SparkConfSuite.scala| 14 2 files changed, 27 insertions(+), 12 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/SparkConf.scala b/core/src/main/scala/org/apache/spark/SparkConf.scala index 813a14acd19..8c054d24b10 100644 --- a/core/src/main/scala/org/apache/spark/SparkConf.scala +++ b/core/src/main/scala/org/apache/spark/SparkConf.scala @@ -503,8 +503,6 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria logWarning(msg) } -val executorOptsKey = EXECUTOR_JAVA_OPTIONS.key - // Used by Yarn in 1.1 and before sys.props.get("spark.driver.libraryPath").foreach { value => val warning = @@ -518,16 +516,19 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria } // Validate spark.executor.extraJavaOptions -getOption(executorOptsKey).foreach { javaOpts => - if (javaOpts.contains("-Dspark")) { -val msg = s"$executorOptsKey is not allowed to set Spark options (was '$javaOpts'). " + - "Set them directly on a SparkConf or in a properties file when using ./bin/spark-submit." -throw new Exception(msg) - } - if (javaOpts.contains("-Xmx")) { -val msg = s"$executorOptsKey is not allowed to specify max heap memory settings " + - s"(was '$javaOpts'). Use spark.executor.memory instead." -throw new Exception(msg) +Seq(EXECUTOR_JAVA_OPTIONS.key, "spark.executor.defaultJavaOptions").foreach { executorOptsKey => + getOption(executorOptsKey).foreach { javaOpts => +if (javaOpts.contains("-Dspark")) { + val msg = s"$executorOptsKey is not allowed to set Spark options (was '$javaOpts'). " + +"Set them directly on a SparkConf or in a properties file " + +"when using ./bin/spark-submit." + throw new Exception(msg) +} +if (javaOpts.contains("-Xmx")) { + val msg = s"$executorOptsKey is not allowed to specify max heap memory settings " + +s"(was '$javaOpts'). 
Use spark.executor.memory instead." + throw new Exception(msg) +} } } diff --git a/core/src/test/scala/org/apache/spark/SparkConfSuite.scala b/core/src/test/scala/org/apache/spark/SparkConfSuite.scala index 74fd7816221..75e22e1418b 100644 --- a/core/src/test/scala/org/apache/spark/SparkConfSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkConfSuite.scala @@ -498,6 +498,20 @@ class SparkConfSuite extends SparkFunSuite with LocalSparkContext with ResetSyst } } } + + test("SPARK-44650: spark.executor.defaultJavaOptions Check illegal java options") { +val conf = new SparkConf() +conf.validateSettings() +conf.set(EXECUTOR_JAVA_OPTIONS.key, "-Dspark.foo=bar") +intercept[Exception] { + conf.validateSettings() +} +conf.remove(EXECUTOR_JAVA_OPTIONS.key) +conf.set("spark.executor.defaultJavaOptions", "-Dspark.foo=bar") +intercept[Exception] { + conf.validateSettings() +} + } } class Class1 {} - To unsubscribe, e-mail:
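The refactor above extends the same validation from `spark.executor.extraJavaOptions` to `spark.executor.defaultJavaOptions` by looping over both keys. A hypothetical Python sketch of the check (illustrative only; the real code lives in `SparkConf.validateSettings`):

```python
def validate_java_opts(conf: dict) -> None:
    # Mirror the SparkConf check: neither extraJavaOptions nor
    # defaultJavaOptions may set Spark options or max heap size.
    for key in ("spark.executor.extraJavaOptions",
                "spark.executor.defaultJavaOptions"):
        opts = conf.get(key)
        if opts is None:
            continue
        if "-Dspark" in opts:
            raise ValueError(
                f"{key} is not allowed to set Spark options (was '{opts}'). "
                "Set them directly on a SparkConf or in a properties file.")
        if "-Xmx" in opts:
            raise ValueError(
                f"{key} is not allowed to specify max heap memory settings "
                f"(was '{opts}'). Use spark.executor.memory instead.")

# '-Dspark.foo=bar' is rejected for either key; other settings pass.
validate_java_opts({"spark.executor.memory": "4g"})  # no error
```

Iterating over the two keys keeps the error messages identical for both options, which is what the new unit test in `SparkConfSuite` asserts.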
[spark] branch master updated: [SPARK-44688][INFRA] Add a file existence check before executing `free_disk_space` and `free_disk_space_container`
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d264ee37f31 [SPARK-44688][INFRA] Add a file existence check before executing `free_disk_space` and `free_disk_space_container` d264ee37f31 is described below commit d264ee37f316b32bf37fff770b72b5841b84f7cc Author: yangjie01 AuthorDate: Sun Aug 6 17:41:11 2023 +0800 [SPARK-44688][INFRA] Add a file existence check before executing `free_disk_space` and `free_disk_space_container` ### What changes were proposed in this pull request? This pr add a file existence check before executing `dev/free_disk_space` and `dev/free_disk_space_container` ### Why are the changes needed? We added `free_disk_space` and `free_disk_space_container` to clean up the disk, but because the daily tests of other branches and the master branch share the yml file, we should check if the file exists before execution, otherwise it will affect the daily tests of other branches. - branch-3.5: https://github.com/apache/spark/actions/runs/5761479443 - branch-3.4: https://github.com/apache/spark/actions/runs/5760423900 - branch-3.3: https://github.com/apache/spark/actions/runs/5759384052 https://github.com/apache/spark/assets/1475305/6e46b34b-645a-4da5-b9c3-8a89bfacabcb;> ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Pass GitHub Action Closes #42359 from LuciferYang/test-free_disk_space-exist. 
Authored-by: yangjie01 Signed-off-by: yangjie01 --- .github/workflows/build_and_test.yml | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 04585481a9c..cd68c0904d9 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -241,7 +241,10 @@ jobs: restore-keys: | ${{ matrix.java }}-${{ matrix.hadoop }}-coursier- - name: Free up disk space - run: ./dev/free_disk_space + run: | +if [ -f ./dev/free_disk_space ]; then + ./dev/free_disk_space +fi - name: Install Java ${{ matrix.java }} uses: actions/setup-java@v3 with: @@ -419,7 +422,9 @@ jobs: # uninstall libraries dedicated for ML testing python3.9 -m pip uninstall -y torch torchvision torcheval torchtnt tensorboard mlflow fi -./dev/free_disk_space_container +if [ -f ./dev/free_disk_space_container ]; then + ./dev/free_disk_space_container +fi - name: Install Java ${{ matrix.java }} uses: actions/setup-java@v3 with: @@ -519,7 +524,10 @@ jobs: restore-keys: | sparkr-coursier- - name: Free up disk space - run: ./dev/free_disk_space_container + run: | +if [ -f ./dev/free_disk_space_container ]; then + ./dev/free_disk_space_container +fi - name: Install Java ${{ inputs.java }} uses: actions/setup-java@v3 with: @@ -629,7 +637,10 @@ jobs: restore-keys: | docs-maven- - name: Free up disk space - run: ./dev/free_disk_space_container + run: | +if [ -f ./dev/free_disk_space_container ]; then + ./dev/free_disk_space_container +fi - name: Install Java 8 uses: actions/setup-java@v3 with: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
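The workflow change above wraps each cleanup step in an `if [ -f ... ]` guard so branches whose checkouts lack the script are unaffected. The same guard pattern, sketched in Python (illustrative; the actual change is shell in the GitHub Actions YAML):

```python
import os
import subprocess

def run_if_exists(script: str) -> bool:
    # Only invoke the cleanup script when the file actually exists,
    # like the workflow's `if [ -f ./dev/free_disk_space ]` check.
    if os.path.isfile(script):
        subprocess.run([script], check=True)
        return True
    return False

# On a branch that predates the script, the step becomes a no-op
# instead of failing the job.
```

Since the master and release branches share the workflow file, skipping gracefully is essential: an unconditional invocation would fail every daily run of branch-3.3/3.4/3.5.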