(spark) branch master updated: [SPARK-46216][CORE] Improve `FileSystemPersistenceEngine` to support compressions
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 3da2e5c6324 [SPARK-46216][CORE] Improve `FileSystemPersistenceEngine` to support compressions
3da2e5c6324 is described below

commit 3da2e5c632468ec7cf7001255c1a44197b46ce30
Author: Dongjoon Hyun
AuthorDate: Sun Dec 3 00:26:16 2023 -0800

    [SPARK-46216][CORE] Improve `FileSystemPersistenceEngine` to support compressions

    ### What changes were proposed in this pull request?

    This PR aims to improve `FileSystemPersistenceEngine` to support compressions via a new configuration, `spark.deploy.recoveryCompressionCodec`.

    ### Why are the changes needed?

    To allow the users to choose a proper compression codec for their workloads. For the `JavaSerializer` case, `LZ4` compression is **2x** faster than the baseline (no compression).

    ```
    OpenJDK 64-Bit Server VM 17.0.9+9-LTS on Linux 5.15.0-1051-azure
    AMD EPYC 7763 64-Core Processor
    2000 Workers:                                             Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
    --------------------------------------------------------------------------------------------------------------------------------
    ZooKeeperPersistenceEngine with JavaSerializer                     2276          2360        115        0.0    1137909.6      1.0X
    ZooKeeperPersistenceEngine with KryoSerializer                     1883          1906         34        0.0     941364.2      1.2X
    FileSystemPersistenceEngine with JavaSerializer                     431           436          7        0.0     215436.9      5.3X
    FileSystemPersistenceEngine with JavaSerializer (lz4)               209           216          9        0.0     104404.1     10.9X
    FileSystemPersistenceEngine with JavaSerializer (lzf)               199           202          2        0.0      99489.5     11.4X
    FileSystemPersistenceEngine with JavaSerializer (snappy)            192           199          9        0.0      95872.9     11.9X
    FileSystemPersistenceEngine with JavaSerializer (zstd)              258           264          6        0.0     129249.4      8.8X
    FileSystemPersistenceEngine with KryoSerializer                     139           151         13        0.0      69374.5     16.4X
    FileSystemPersistenceEngine with KryoSerializer (lz4)               159           165          8        0.0      79588.9     14.3X
    FileSystemPersistenceEngine with KryoSerializer (lzf)               180           195         18        0.0      89844.0     12.7X
    FileSystemPersistenceEngine with KryoSerializer (snappy)            164           183         18        0.0      82016.0     13.9X
    FileSystemPersistenceEngine with KryoSerializer (zstd)              206           218         11        0.0     102838.9     11.1X
    BlackHolePersistenceEngine                                            0             0          0       35.1         28.5   39908.5X
    ```

    ### Does this PR introduce _any_ user-facing change?

    No, this is a new feature.

    ### How was this patch tested?

    Pass the CIs with the newly added test case.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #44129 from dongjoon-hyun/SPARK-46216.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 .../PersistenceEngineBenchmark-jdk21-results.txt   | 22
 .../PersistenceEngineBenchmark-results.txt         | 22
 .../master/FileSystemPersistenceEngine.scala       | 10 --
 .../spark/deploy/master/RecoveryModeFactory.scala  |  6 ++--
 .../org/apache/spark/internal/config/Deploy.scala  |  7
 .../apache/spark/deploy/master/MasterSuite.scala   | 40 ++
 .../deploy/master/PersistenceEngineBenchmark.scala | 19 --
 .../deploy/master/PersistenceEngineSuite.scala     | 13 +++
 8 files changed, 118 insertions(+), 21 deletions(-)

diff --git a/core/benchmarks/PersistenceEngineBenchmark-jdk21-results.txt b/core/benchmarks/PersistenceEngineBenchmark-jdk21-results.txt
index 65dbfd0990d..38e74ed6b53 100644
--- a/core/benchmarks/PersistenceEngineBenchmark-jdk21-results.txt
+++ b/core/benchmarks/PersistenceEngineBenchmark-jdk21-results.txt
@@ -4,12 +4,20 @@ PersistenceEngineBenchmark
 OpenJDK 64-Bit Server VM 21.0.1+12-LTS on Linux 5.15.0-1051-azure
 AMD EPYC 7763 64-Core Processor
-1000 Workers:                                             Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
---
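The benchmark above shows why compressing the persisted state pays off: the engine serializes highly repetitive worker/application metadata to disk, which compresses well. A minimal stand-alone sketch of the effect, using stdlib `zlib` purely as a stand-in for the lz4/lzf/snappy/zstd codecs selectable via the new `spark.deploy.recoveryCompressionCodec` option (a real deployment would go through Spark's codec classes, not `zlib`):

```python
import pickle
import zlib

# Mock metadata for 2000 workers, mirroring the benchmark's workload shape.
workers = [
    {"id": f"worker-{i}", "host": "10.0.0.1", "port": 7078, "cores": 64}
    for i in range(2000)
]

# Serialize (the JavaSerializer/KryoSerializer step), then compress
# (the new recoveryCompressionCodec step).
raw = pickle.dumps(workers)
compressed = zlib.compress(raw)

# Repetitive metadata shrinks dramatically, which is what makes the
# compressed persistence variants faster on disk-bound workloads.
print(f"raw={len(raw)} bytes, compressed={len(compressed)} bytes")
```

The round trip on recovery is the reverse: decompress, then deserialize.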
(spark) branch master updated: [SPARK-45093][CONNECT][PYTHON] Properly support error handling and conversion for AddArtifactHandler
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 400573d33af [SPARK-45093][CONNECT][PYTHON] Properly support error handling and conversion for AddArtifactHandler
400573d33af is described below

commit 400573d33af39d61748dd0b2960c47161277f85e
Author: Martin Grund
AuthorDate: Mon Dec 4 08:22:13 2023 +0900

    [SPARK-45093][CONNECT][PYTHON] Properly support error handling and conversion for AddArtifactHandler

    ### What changes were proposed in this pull request?

    This patch improves the error handling when errors happen in the `AddArtifact` path. In particular, the `AddArtifactHandler` did not properly return exceptions; all exceptions ended up yielding `UNKNOWN` errors. This patch makes sure we wrap the errors in the add-artifact path the same way as we wrap the errors in the normal query execution path. In addition, it adds tests and verification for this behavior in Scala and in Python.

    ### Why are the changes needed?

    Stability

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Added UT for Scala and Python

    ### Was this patch authored or co-authored using generative AI tooling?

    No

    Closes #44092 from grundprinzip/SPARK-ADD_ARTIFACT_CRAP.

    Authored-by: Martin Grund
    Signed-off-by: Hyukjin Kwon
---
 .../service/SparkConnectAddArtifactsHandler.scala  | 26 +++---
 .../spark/sql/connect/utils/ErrorUtils.scala       | 25 +++--
 .../connect/service/AddArtifactsHandlerSuite.scala | 20 +++--
 python/pyspark/sql/connect/client/core.py          | 15 -
 .../sql/tests/connect/client/test_artifact.py      | 20 +
 5 files changed, 93 insertions(+), 13 deletions(-)

diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectAddArtifactsHandler.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectAddArtifactsHandler.scala
index e664e07dce1..ea3b578be3b 100644
--- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectAddArtifactsHandler.scala
+++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectAddArtifactsHandler.scala
@@ -24,7 +24,6 @@ import scala.collection.mutable
 import scala.util.control.NonFatal
 
 import com.google.common.io.CountingOutputStream
-import io.grpc.StatusRuntimeException
 import io.grpc.stub.StreamObserver
 
 import org.apache.spark.connect.proto
@@ -32,6 +31,7 @@ import org.apache.spark.connect.proto.{AddArtifactsRequest, AddArtifactsResponse
 import org.apache.spark.connect.proto.AddArtifactsResponse.ArtifactSummary
 import org.apache.spark.sql.artifact.ArtifactManager
 import org.apache.spark.sql.artifact.util.ArtifactUtils
+import org.apache.spark.sql.connect.utils.ErrorUtils
 import org.apache.spark.util.Utils
 
 /**
@@ -51,7 +51,7 @@ class SparkConnectAddArtifactsHandler(val responseObserver: StreamObserver[AddAr
   private var chunkedArtifact: StagedChunkedArtifact = _
   private var holder: SessionHolder = _
 
-  override def onNext(req: AddArtifactsRequest): Unit = {
+  override def onNext(req: AddArtifactsRequest): Unit = try {
     if (this.holder == null) {
       this.holder = SparkConnectService.getOrCreateIsolatedSession(
         req.getUserContext.getUserId,
@@ -78,6 +78,17 @@ class SparkConnectAddArtifactsHandler(val responseObserver: StreamObserver[AddAr
     } else {
       throw new UnsupportedOperationException(s"Unsupported data transfer request: $req")
     }
+  } catch {
+    ErrorUtils.handleError(
+      "addArtifacts.onNext",
+      responseObserver,
+      holder.userId,
+      holder.sessionId,
+      None,
+      false,
+      Some(() => {
+        cleanUpStagedArtifacts()
+      }))
   }
 
   override def onError(throwable: Throwable): Unit = {
@@ -128,7 +139,16 @@ class SparkConnectAddArtifactsHandler(val responseObserver: StreamObserver[AddAr
       responseObserver.onNext(builder.build())
       responseObserver.onCompleted()
     } catch {
-      case e: StatusRuntimeException => onError(e)
+      ErrorUtils.handleError(
+        "addArtifacts.onComplete",
+        responseObserver,
+        holder.userId,
+        holder.sessionId,
+        None,
+        false,
+        Some(() => {
+          cleanUpStagedArtifacts()
+        }))
     }
   }
 
diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/utils/ErrorUtils.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/utils/Er
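The pattern the patch applies can be sketched in a few lines: every handler callback runs inside a try/except that converts arbitrary exceptions into a structured, typed error and triggers cleanup, instead of letting them surface as opaque `UNKNOWN` failures. This is a simplified Python stand-in for the gRPC `StreamObserver` and `ErrorUtils.handleError` machinery (the class and method names below are illustrative, not Spark's):

```python
class ArtifactHandler:
    """Toy handler showing the wrap-every-callback error-handling pattern."""

    def __init__(self):
        self.errors = []        # stand-in for errors sent to the response observer
        self.cleaned_up = False

    def _handle_error(self, op_name, exc):
        # Mirrors the role of ErrorUtils.handleError: record a typed,
        # named error and run the cleanup callback.
        self.errors.append((op_name, type(exc).__name__, str(exc)))
        self.clean_up_staged_artifacts()

    def clean_up_staged_artifacts(self):
        self.cleaned_up = True

    def on_next(self, request):
        try:
            if request is None:
                raise ValueError("Unsupported data transfer request")
            # ... normal artifact-staging logic would go here ...
        except Exception as e:
            self._handle_error("addArtifacts.onNext", e)


handler = ArtifactHandler()
handler.on_next(None)
print(handler.errors)  # structured error instead of an opaque UNKNOWN failure
```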
(spark) branch master updated: [SPARK-46221][PS][DOCS] Change `to_spark_io` to `spark.to_spark_io` in `quickstart_ps.ipynb`
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 349037567a6 [SPARK-46221][PS][DOCS] Change `to_spark_io` to `spark.to_spark_io` in `quickstart_ps.ipynb`
349037567a6 is described below

commit 349037567a67dd5acb71f6e22e6e26fa304ba035
Author: Bjørn Jørgensen
AuthorDate: Mon Dec 4 08:28:18 2023 +0900

    [SPARK-46221][PS][DOCS] Change `to_spark_io` to `spark.to_spark_io` in `quickstart_ps.ipynb`

    ### What changes were proposed in this pull request?

    Change `to_spark_io` to `spark.to_spark_io` in `quickstart_ps.ipynb`

    ### Why are the changes needed?

    `to_spark_io` is changed to `spark.to_spark_io` in Spark 4.0. To have user guides that work :)

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    ![image](https://github.com/apache/spark/assets/47577197/76f7067a-eeb4-4acc-88f1-3af94f5e78c7)

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #44135 from bjornjorgensen/spark.to_spark_io.

    Authored-by: Bjørn Jørgensen
    Signed-off-by: Hyukjin Kwon
---
 python/docs/source/getting_started/quickstart_ps.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/docs/source/getting_started/quickstart_ps.ipynb b/python/docs/source/getting_started/quickstart_ps.ipynb
index 2b6b3f8142c..efd2753dcdc 100644
--- a/python/docs/source/getting_started/quickstart_ps.ipynb
+++ b/python/docs/source/getting_started/quickstart_ps.ipynb
@@ -14457,7 +14457,7 @@
     }
    ],
    "source": [
-    "psdf.to_spark_io('zoo.orc', format=\"orc\")\n",
+    "psdf.spark.to_spark_io('zoo.orc', format=\"orc\")\n",
     "ps.read_spark_io('zoo.orc', format=\"orc\").head(10)"
    ]
   },

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-46217][CORE][TESTS] Include `Driver/App` data in `PersistenceEngineBenchmark`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 5031e52f9e0 [SPARK-46217][CORE][TESTS] Include `Driver/App` data in `PersistenceEngineBenchmark`
5031e52f9e0 is described below

commit 5031e52f9e032e8e450af9fcd294f5b53e2c4cfd
Author: Dongjoon Hyun
AuthorDate: Sun Dec 3 15:50:09 2023 -0800

    [SPARK-46217][CORE][TESTS] Include `Driver/App` data in `PersistenceEngineBenchmark`

    ### What changes were proposed in this pull request?

    This PR aims to include `DriverInfo` and `ApplicationInfo` data in `PersistenceEngineBenchmark`.

    ### Why are the changes needed?

    `PersistenceEngine` recovers three kinds of information, but `PersistenceEngineBenchmark` previously focused on `WorkerInfo` only. This PR adds the two other kinds to be more complete.

    https://github.com/apache/spark/blob/3da2e5c632468ec7cf7001255c1a44197b46ce30/core/src/main/scala/org/apache/spark/deploy/master/PersistenceEngine.scala#L56-L78

    ### Does this PR introduce _any_ user-facing change?

    No. This is a test improvement.

    ### How was this patch tested?

    Manual tests.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #44130 from dongjoon-hyun/SPARK-46217.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 .../PersistenceEngineBenchmark-jdk21-results.txt   | 28 +-
 .../PersistenceEngineBenchmark-results.txt         | 28 +-
 .../deploy/master/PersistenceEngineBenchmark.scala | 65 +-
 3 files changed, 80 insertions(+), 41 deletions(-)

diff --git a/core/benchmarks/PersistenceEngineBenchmark-jdk21-results.txt b/core/benchmarks/PersistenceEngineBenchmark-jdk21-results.txt
index 38e74ed6b53..314fb6958b6 100644
--- a/core/benchmarks/PersistenceEngineBenchmark-jdk21-results.txt
+++ b/core/benchmarks/PersistenceEngineBenchmark-jdk21-results.txt
@@ -4,20 +4,20 @@ PersistenceEngineBenchmark
 OpenJDK 64-Bit Server VM 21.0.1+12-LTS on Linux 5.15.0-1051-azure
 AMD EPYC 7763 64-Core Processor
-2000 Workers:                                             Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
+1000 Workers:                                             Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-ZooKeeperPersistenceEngine with JavaSerializer                     2254          2329        119        0.0    1126867.1      1.0X
-ZooKeeperPersistenceEngine with KryoSerializer                     1911          1912          1        0.0     955667.1      1.2X
-FileSystemPersistenceEngine with JavaSerializer                     438           448         15        0.0     218868.1      5.1X
-FileSystemPersistenceEngine with JavaSerializer (lz4)               187           195          8        0.0      93337.8     12.1X
-FileSystemPersistenceEngine with JavaSerializer (lzf)               193           216         20        0.0      96678.8     11.7X
-FileSystemPersistenceEngine with JavaSerializer (snappy)            175           183         10        0.0      87652.3     12.9X
-FileSystemPersistenceEngine with JavaSerializer (zstd)              243           255         14        0.0     121695.2      9.3X
-FileSystemPersistenceEngine with KryoSerializer                     150           160         15        0.0      75089.7     15.0X
-FileSystemPersistenceEngine with KryoSerializer (lz4)               170           177         10        0.0      84996.7     13.3X
-FileSystemPersistenceEngine with KryoSerializer (lzf)               192           203         12        0.0      96019.1     11.7X
-FileSystemPersistenceEngine with KryoSerializer (snappy)            184           202         16        0.0      92241.3     12.2X
-FileSystemPersistenceEngine with KryoSerializer (zstd)              232           238          5        0.0     116075.2      9.7X
-BlackHolePersistenceEngine                                            0             0          0       27.3         36.6   30761.0X
+ZooKeeperPersistenceEngine with JavaSerializer                     5402          5546        233        0.0    5402030.8      1.0X
+ZooKeeperPersistenceEngine with KryoSerializer                     4185          4220         32        0.0    4184623.1      1.3X
+FileSystemPersistenceEngine with JavaSerializer                    1591          1634         37        0.0    1590836.4      3.4X
+FileSystemPersistenceEngine with JavaSerializer (lz4)
(spark) branch master updated: [SPARK-40559][PYTHON] Add applyInArrow to groupBy and cogroup
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 393c69a515b [SPARK-40559][PYTHON] Add applyInArrow to groupBy and cogroup
393c69a515b is described below

commit 393c69a515b4acb6ea0659c0a7f09ae487801c40
Author: Enrico Minack
AuthorDate: Mon Dec 4 10:21:26 2023 +0900

    [SPARK-40559][PYTHON] Add applyInArrow to groupBy and cogroup

    ### What changes were proposed in this pull request?

    Add `applyInArrow` method to PySpark `groupBy` and `groupBy.cogroup` to allow for user functions that work on Arrow. Similar to existing `mapInArrow`.

    ### Why are the changes needed?

    PySpark allows to transform a `DataFrame` via Pandas and Arrow API:

    ```
    df.mapInArrow(map_arrow, schema="...")
    df.mapInPandas(map_pandas, schema="...")
    ```

    For `df.groupBy(...)` and `df.groupBy(...).cogroup(...)`, there is only a Pandas interface, no Arrow interface:

    ```
    df.groupBy("id").applyInPandas(apply_pandas, schema="...")
    ```

    Providing a pure Arrow interface allows user code to use **any** Arrow-based data framework, not only Pandas, e.g. Polars:

    ```
    def apply_polars(df: polars.DataFrame) -> polars.DataFrame:
        return df

    def apply_arrow(table: pyarrow.Table) -> pyarrow.Table:
        df = polars.from_arrow(table)
        return apply_polars(df).to_arrow()

    df.groupBy("id").applyInArrow(apply_arrow, schema="...")
    ```

    ### Does this PR introduce _any_ user-facing change?

    This adds method `applyInArrow` to PySpark `groupBy` and `groupBy.cogroup`.

    ### How was this patch tested?

    Tested with unit tests.

    Closes #38624 from EnricoMi/branch-pyspark-grouped-apply-in-arrow.

    Authored-by: Enrico Minack
    Signed-off-by: Hyukjin Kwon
---
 .../org/apache/spark/api/python/PythonRunner.scala |   4 +
 dev/sparktestsupport/modules.py                    |   2 +
 python/pyspark/errors/error_classes.py             |  15 ++
 python/pyspark/rdd.py                              |   4 +
 python/pyspark/sql/pandas/_typing/__init__.pyi     |  11 +
 python/pyspark/sql/pandas/functions.py             |  22 ++
 python/pyspark/sql/pandas/group_ops.py             | 230 +++-
 python/pyspark/sql/pandas/serializers.py           |  81 +-
 python/pyspark/sql/tests/arrow/__init__.py         |  16 ++
 .../sql/tests/arrow/test_arrow_cogrouped_map.py    | 300 +
 .../sql/tests/arrow/test_arrow_grouped_map.py      | 291
 python/pyspark/sql/udf.py                          |  40 +++
 python/pyspark/worker.py                           | 207 +-
 .../plans/logical/pythonLogicalOperators.scala     |  48
 .../spark/sql/RelationalGroupedDataset.scala       |  82 +-
 .../spark/sql/execution/SparkStrategies.scala      |   6 +
 .../python/FlatMapCoGroupsInArrowExec.scala        |  58
 .../python/FlatMapCoGroupsInPandasExec.scala       |  62 +
 ...xec.scala => FlatMapCoGroupsInPythonExec.scala} |  43 +--
 .../python/FlatMapGroupsInArrowExec.scala          |  64 +
 .../python/FlatMapGroupsInPandasExec.scala         |  60 +
 ...sExec.scala => FlatMapGroupsInPythonExec.scala} |  54 ++--
 22 files changed, 1513 insertions(+), 187 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
index e6d5a750ea3..148f80540d9 100644
--- a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
+++ b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
@@ -57,6 +57,8 @@ private[spark] object PythonEvalType {
   val SQL_COGROUPED_MAP_PANDAS_UDF = 206
   val SQL_MAP_ARROW_ITER_UDF = 207
   val SQL_GROUPED_MAP_PANDAS_UDF_WITH_STATE = 208
+  val SQL_GROUPED_MAP_ARROW_UDF = 209
+  val SQL_COGROUPED_MAP_ARROW_UDF = 210
   val SQL_TABLE_UDF = 300
   val SQL_ARROW_TABLE_UDF = 301
 
@@ -74,6 +76,8 @@ private[spark] object PythonEvalType {
     case SQL_COGROUPED_MAP_PANDAS_UDF => "SQL_COGROUPED_MAP_PANDAS_UDF"
     case SQL_MAP_ARROW_ITER_UDF => "SQL_MAP_ARROW_ITER_UDF"
     case SQL_GROUPED_MAP_PANDAS_UDF_WITH_STATE => "SQL_GROUPED_MAP_PANDAS_UDF_WITH_STATE"
+    case SQL_GROUPED_MAP_ARROW_UDF => "SQL_GROUPED_MAP_ARROW_UDF"
+    case SQL_COGROUPED_MAP_ARROW_UDF => "SQL_COGROUPED_MAP_ARROW_UDF"
     case SQL_TABLE_UDF => "SQL_TABLE_UDF"
     case SQL_ARROW_TABLE_UDF => "SQL_ARROW_TABLE_UDF"
   }
diff --git a/dev/sparktestsupport/modules.py b/dev/sparktestsupport/modules.py
index feb49062316..718a2509741 100644
--- a/dev/sparktestsupport/modules.py
+++ b/dev/sparktestsupport/modules.py
@@ -491,6 +491,8 @@ pyspark_sql = Module(
         "pyspark.sql.pandas.utils",
         "pyspark.sql.observation",
         # unittests
        "pyspark.sql.tests.
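The `PythonEvalType` diff above registers two new numeric eval-type codes that the Scala side and the Python worker must agree on. A minimal stand-alone Python mirror of that code-to-name mapping (restricted to the subset of constants visible in the diff above):

```python
# Numeric eval-type codes shared between the JVM and the Python worker.
# Only the constants appearing in the PythonRunner.scala diff are listed here.
EVAL_TYPE_NAMES = {
    206: "SQL_COGROUPED_MAP_PANDAS_UDF",
    207: "SQL_MAP_ARROW_ITER_UDF",
    208: "SQL_GROUPED_MAP_PANDAS_UDF_WITH_STATE",
    209: "SQL_GROUPED_MAP_ARROW_UDF",      # new in this commit
    210: "SQL_COGROUPED_MAP_ARROW_UDF",    # new in this commit
}


def eval_type_name(code: int) -> str:
    """Resolve an eval-type code to its name, as the Scala match does."""
    try:
        return EVAL_TYPE_NAMES[code]
    except KeyError:
        raise ValueError(f"Unknown eval type: {code}")


print(eval_type_name(209))  # SQL_GROUPED_MAP_ARROW_UDF
```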
(spark) branch master updated: [SPARK-46222][PYTHON][TESTS] Test invalid error class (pyspark.errors.utils)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 6ddd0081989 [SPARK-46222][PYTHON][TESTS] Test invalid error class (pyspark.errors.utils)
6ddd0081989 is described below

commit 6ddd0081989eab9e27f84f006f5d7d76359f17bc
Author: Hyukjin Kwon
AuthorDate: Mon Dec 4 10:31:11 2023 +0900

    [SPARK-46222][PYTHON][TESTS] Test invalid error class (pyspark.errors.utils)

    ### What changes were proposed in this pull request?

    This PR adds a test for invalid error classes.

    ### Why are the changes needed?

    ![Screenshot 2023-12-04 at 8 50 37 AM](https://github.com/apache/spark/assets/6477701/47e80d92-3f28-4fb3-a2aa-8745f7fb40ec)

    https://app.codecov.io/gh/apache/spark/blob/master/python%2Fpyspark%2Ferrors%2Futils.py

    This is not being tested.

    ### Does this PR introduce _any_ user-facing change?

    No, test-only.

    ### How was this patch tested?

    Manually ran the new unittest.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #44136 from HyukjinKwon/SPARK-46222.

    Authored-by: Hyukjin Kwon
    Signed-off-by: Hyukjin Kwon
---
 python/pyspark/errors/tests/test_errors.py | 5 +
 1 file changed, 5 insertions(+)

diff --git a/python/pyspark/errors/tests/test_errors.py b/python/pyspark/errors/tests/test_errors.py
index 4e743bfb9a0..e191d7eff38 100644
--- a/python/pyspark/errors/tests/test_errors.py
+++ b/python/pyspark/errors/tests/test_errors.py
@@ -19,6 +19,7 @@
 import json
 import unittest
 
+from pyspark.errors import PySparkValueError
 from pyspark.errors.error_classes import ERROR_CLASSES_JSON
 from pyspark.errors.utils import ErrorClassesReader
 
@@ -46,6 +47,10 @@ class ErrorsTest(unittest.TestCase):
 
         json.loads(ERROR_CLASSES_JSON, object_pairs_hook=detect_duplication)
 
+    def test_invalid_error_class(self):
+        with self.assertRaisesRegex(ValueError, "Cannot find main error class"):
+            PySparkValueError(error_class="invalid", message_parameters={})
+
 
 if __name__ == "__main__":
     import unittest
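The path the new test exercises can be sketched without Spark: an error-classes reader resolves a message template by error-class name and raises `ValueError` with "Cannot find main error class" when the class is unknown. This is a simplified stand-in for pyspark's `ErrorClassesReader`, not its actual implementation (the single sample entry below is illustrative):

```python
# Illustrative subset of an error-classes registry, keyed by main error class.
ERROR_CLASSES = {
    "CANNOT_BE_EMPTY": "At least one {item} must be specified.",
}


def get_message_template(error_class: str) -> str:
    """Resolve a message template; fail loudly for unknown main classes."""
    template = ERROR_CLASSES.get(error_class)
    if template is None:
        raise ValueError(f"Cannot find main error class '{error_class}'")
    return template


print(get_message_template("CANNOT_BE_EMPTY"))
```

Constructing an exception like `PySparkValueError(error_class="invalid", ...)` goes through exactly this kind of lookup, which is why the new test asserts on the "Cannot find main error class" message.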
(spark) branch master updated: [SPARK-46220][SQL] Restrict charsets in `decode()`
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 52b94a6f6a9 [SPARK-46220][SQL] Restrict charsets in `decode()`
52b94a6f6a9 is described below

commit 52b94a6f6a9335ebfaee1456f78d1943c24694d7
Author: Max Gekk
AuthorDate: Mon Dec 4 10:42:40 2023 +0900

    [SPARK-46220][SQL] Restrict charsets in `decode()`

    ### What changes were proposed in this pull request?

    In the PR, I propose to restrict the supported charsets in the `decode()` functions by the list from [the doc](https://spark.apache.org/docs/latest/api/sql/#decode):

    ```
    'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'
    ```

    and use the existing SQL config `spark.sql.legacy.javaCharsets` for restoring the previous behaviour.

    ### Why are the changes needed?

    Currently the list of supported charsets in `decode()` is not stable and fully depends on the used JDK version. So, sometimes user code might not work because a devop changed the Java version in a Spark cluster.

    ### Does this PR introduce _any_ user-facing change?

    Yes.

    ### How was this patch tested?

    By running new checks:

    ```
    $ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z string-functions.sql"
    ```

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #44131 from MaxGekk/restrict-charsets-in-stringdecode.

    Authored-by: Max Gekk
    Signed-off-by: Hyukjin Kwon
---
 .../explain-results/function_decode.explain        |  2 +-
 docs/sql-migration-guide.md                        |  2 +-
 .../catalyst/expressions/stringExpressions.scala   | 25 +++-
 .../analyzer-results/ansi/string-functions.sql.out | 42 ++
 .../analyzer-results/string-functions.sql.out      | 42 ++
 .../sql-tests/inputs/string-functions.sql          |  6 ++
 .../results/ansi/string-functions.sql.out          | 66 ++
 .../sql-tests/results/string-functions.sql.out     | 66 ++
 8 files changed, 246 insertions(+), 5 deletions(-)

diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_decode.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_decode.explain
index 3b8e1eea576..165be9b9e12 100644
--- a/connector/connect/common/src/test/resources/query-tests/explain-results/function_decode.explain
+++ b/connector/connect/common/src/test/resources/query-tests/explain-results/function_decode.explain
@@ -1,2 +1,2 @@
-Project [decode(cast(g#0 as binary), UTF-8) AS decode(g, UTF-8)#0]
+Project [decode(cast(g#0 as binary), UTF-8, false) AS decode(g, UTF-8)#0]
 +- LocalRelation , [id#0L, a#0, b#0, d#0, e#0, f#0, g#0]
diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index cb4f59323c3..9f9c15521c6 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -29,7 +29,7 @@ license: |
 - Since Spark 4.0, `spark.sql.hive.metastore` drops the support of Hive prior to 2.0.0 as they require JDK 8 that Spark does not support anymore. Users should migrate to higher versions.
 - Since Spark 4.0, `spark.sql.parquet.compression.codec` drops the support of codec name `lz4raw`, please use `lz4_raw` instead.
 - Since Spark 4.0, when overflowing during casting timestamp to byte/short/int under non-ansi mode, Spark will return null instead a wrapping value.
-- Since Spark 4.0, the `encode()` function supports only the following charsets 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'. To restore the previous behavior when the function accepts charsets of the current JDK used by Spark, set `spark.sql.legacy.javaCharsets` to `true`.
+- Since Spark 4.0, the `encode()` and `decode()` functions support only the following charsets 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'. To restore the previous behavior when the function accepts charsets of the current JDK used by Spark, set `spark.sql.legacy.javaCharsets` to `true`.
 
 ## Upgrading from Spark SQL 3.4 to 3.5
 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
index 84a5eebd70e..7c5d65d2b95 100755
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
@@ -2638,18 +2638,26 @@ case class Decode(params: Seq[Expression], replacement: Expression)
   since = "1.5.0",
   group = "string_funcs")
 // scalastyle:on line.size.limit
-case class StringDecode(bin: Expression, charset: Expression)
+
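The restriction described above amounts to a small allow-list check in front of the JDK's charset lookup, gated by the legacy flag. A minimal Python sketch of that logic (a simplified stand-in for the `StringDecode` expression, not Spark's implementation; Python's codec registry stands in for the JDK's):

```python
# The fixed allow-list from the Spark 4.0 migration guide.
SUPPORTED_CHARSETS = {"US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"}


def decode(payload: bytes, charset: str, legacy_java_charsets: bool = False) -> str:
    """Decode bytes, rejecting charsets outside the allow-list unless the
    legacy flag (spark.sql.legacy.javaCharsets analogue) is set."""
    if not legacy_java_charsets and charset.upper() not in SUPPORTED_CHARSETS:
        raise ValueError(f"Unsupported charset: {charset}")
    # Python's codec lookup is case-insensitive and accepts these names.
    return payload.decode(charset)


print(decode(b"abc", "UTF-8"))  # abc
```

With the allow-list, whether `decode('...', 'CP1252')` works no longer depends on which JDK the cluster happens to run; it fails consistently unless the legacy flag is set.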
(spark) branch master updated: [SPARK-46048][PYTHON][CONNECT][FOLLOWUP] Correct the string representation
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 0f3c58aa9e7d [SPARK-46048][PYTHON][CONNECT][FOLLOWUP] Correct the string representation
0f3c58aa9e7d is described below

commit 0f3c58aa9e7dd7966e07a391767bdba294a97c9e
Author: Ruifeng Zheng
AuthorDate: Mon Dec 4 11:02:28 2023 +0900

    [SPARK-46048][PYTHON][CONNECT][FOLLOWUP] Correct the string representation

    ### What changes were proposed in this pull request?

    Correct the string representation.

    ### Why are the changes needed?

    To fix a minor issue.

    ### Does this PR introduce _any_ user-facing change?

    Yes. Before:

    ```
    In [1]: spark.range(10).groupingSets([])
    Out[1]: GroupedData[grouping expressions: [], value: [id: bigint], type: Pivot]
    ```

    After:

    ```
    In [2]: spark.range(10).groupingSets([])
    Out[2]: GroupedData[grouping expressions: [], value: [id: bigint], type: GroupingSets]
    ```

    ### How was this patch tested?

    Manual test; it is trivial, so I don't add a doctest.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #44140 from zhengruifeng/connect_grouping_str.

    Authored-by: Ruifeng Zheng
    Signed-off-by: Hyukjin Kwon
---
 python/pyspark/sql/connect/group.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/python/pyspark/sql/connect/group.py b/python/pyspark/sql/connect/group.py
index 610ef036bc5e..bb963c910e2f 100644
--- a/python/pyspark/sql/connect/group.py
+++ b/python/pyspark/sql/connect/group.py
@@ -109,6 +109,8 @@ class GroupedData:
             type_str = "RollUp"
         elif self._group_type == "cube":
             type_str = "Cube"
+        elif self._group_type == "grouping_sets":
+            type_str = "GroupingSets"
         else:
             type_str = "Pivot"
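The bug was a fall-through: without its own branch, `grouping_sets` landed in the `else` arm and was labeled `Pivot`. The fixed dispatch, extracted as a stand-alone sketch (a simplified stand-in for the `GroupedData.__repr__` logic, with `Pivot` remaining the fallback as in the diff above):

```python
def group_type_label(group_type: str) -> str:
    """Map a GroupedData group type to its repr label; Pivot is the fallback."""
    labels = {
        "groupby": "GroupBy",
        "rollup": "RollUp",
        "cube": "Cube",
        "grouping_sets": "GroupingSets",  # the branch this commit adds
    }
    return labels.get(group_type, "Pivot")


print(group_type_label("grouping_sets"))  # GroupingSets, no longer Pivot
```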
(spark) branch master updated: [SPARK-46223][PS] Test SparkPandasNotImplementedError with cleaning up unused code
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 91a43c6db185 [SPARK-46223][PS] Test SparkPandasNotImplementedError with cleaning up unused code
91a43c6db185 is described below

commit 91a43c6db18595f481958c3e1c67c9724a561aca
Author: Hyukjin Kwon
AuthorDate: Mon Dec 4 11:03:08 2023 +0900

    [SPARK-46223][PS] Test SparkPandasNotImplementedError with cleaning up unused code

    ### What changes were proposed in this pull request?

    This PR adds a test for `SparkPandasNotImplementedError`, and cleans up some unused code.

    ### Why are the changes needed?

    ![Screenshot 2023-12-04 at 9 07 36 AM](https://github.com/apache/spark/assets/6477701/87f7dcb5-b448-4841-ab6a-dd92eb07a795)

    This is not being tested (https://app.codecov.io/gh/apache/spark/blob/master/python%2Fpyspark%2Fpandas%2Fexceptions.py).

    ### Does this PR introduce _any_ user-facing change?

    No, test-only.

    ### How was this patch tested?

    Manually ran the new unittest.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #44137 from HyukjinKwon/SPARK-46223.

    Authored-by: Hyukjin Kwon
    Signed-off-by: Hyukjin Kwon
---
 python/pyspark/pandas/exceptions.py          | 24 +++-
 python/pyspark/pandas/tests/test_indexing.py |  8 +++-
 2 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/python/pyspark/pandas/exceptions.py b/python/pyspark/pandas/exceptions.py
index d93f0bf0b68e..8644d0306bf0 100644
--- a/python/pyspark/pandas/exceptions.py
+++ b/python/pyspark/pandas/exceptions.py
@@ -29,28 +29,18 @@ class SparkPandasIndexingError(Exception):
     pass
 
 
-def code_change_hint(pandas_function: Optional[str], spark_target_function: Optional[str]) -> str:
-    if pandas_function is not None and spark_target_function is not None:
-        return "You are trying to use pandas function {}, use spark function {}".format(
-            pandas_function, spark_target_function
-        )
-    elif pandas_function is not None and spark_target_function is None:
-        return (
-            "You are trying to use pandas function {}, checkout the spark "
-            "user guide to find a relevant function"
-        ).format(pandas_function)
-    elif pandas_function is None and spark_target_function is not None:
-        return "Use spark function {}".format(spark_target_function)
-    else:  # both none
-        return "Checkout the spark user guide to find a relevant function"
+def code_change_hint(pandas_function: str, spark_target_function: str) -> str:
+    return "You are trying to use pandas function {}, use spark function {}".format(
+        pandas_function, spark_target_function
+    )
 
 
 class SparkPandasNotImplementedError(NotImplementedError):
     def __init__(
         self,
-        pandas_function: Optional[str] = None,
-        spark_target_function: Optional[str] = None,
-        description: str = "",
+        pandas_function: str,
+        spark_target_function: str,
+        description: str,
     ):
         self.pandas_source = pandas_function
         self.spark_target = spark_target_function
diff --git a/python/pyspark/pandas/tests/test_indexing.py b/python/pyspark/pandas/tests/test_indexing.py
index 590bef16bb22..a4ca03005b33 100644
--- a/python/pyspark/pandas/tests/test_indexing.py
+++ b/python/pyspark/pandas/tests/test_indexing.py
@@ -22,7 +22,7 @@ import numpy as np
 import pandas as pd
 
 from pyspark import pandas as ps
-from pyspark.pandas.exceptions import SparkPandasIndexingError
+from pyspark.pandas.exceptions import SparkPandasIndexingError, SparkPandasNotImplementedError
 from pyspark.testing.pandasutils import ComparisonTestBase, compare_both
 
@@ -902,6 +902,12 @@ class IndexingTest(ComparisonTestBase):
         self.assert_eq(psdf.iloc[:-1, indexer], pdf.iloc[:-1, indexer])
         # self.assert_eq(psdf.iloc[psdf.index == 2, indexer], pdf.iloc[pdf.index == 2, indexer])
 
+        self.assertRaisesRegex(
+            SparkPandasNotImplementedError,
+            ".iloc requires numeric slice, conditional boolean",
+            lambda: ps.range(10).iloc["a", :],
+        )
+
     def test_iloc_multiindex_columns(self):
         arrays = [np.array(["bar", "bar", "baz", "baz"]), np.array(["one", "two", "one", "two"])]
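The simplified `code_change_hint` above can be exercised stand-alone to see the message the new test matches against. The helper is reproduced from the diff; the `super().__init__` wiring of the exception class is a simplification of the pyspark original, shown only to make the resulting message visible:

```python
def code_change_hint(pandas_function: str, spark_target_function: str) -> str:
    # Reproduced from the cleaned-up pyspark.pandas.exceptions helper.
    return "You are trying to use pandas function {}, use spark function {}".format(
        pandas_function, spark_target_function
    )


class SparkPandasNotImplementedError(NotImplementedError):
    # Simplified sketch: all three arguments are now required, as in the diff.
    def __init__(self, pandas_function: str, spark_target_function: str, description: str):
        self.pandas_source = pandas_function
        self.spark_target = spark_target_function
        hint = code_change_hint(pandas_function, spark_target_function)
        super().__init__(description + " " + hint if description else hint)


err = SparkPandasNotImplementedError(".iloc[..., ...]", ".filter()", "Fancy indexing")
print(err)
```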
(spark) branch master updated: [SPARK-40559][PYTHON][FOLLOW-UP] Fix linter for `getfullargspec`
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c6ad0b0825d7 [SPARK-40559][PYTHON][FOLLOW-UP] Fix linter for `getfullargspec` c6ad0b0825d7 is described below commit c6ad0b0825d7174a9b24b13156a0fa7c59056158 Author: Hyukjin Kwon AuthorDate: Mon Dec 4 11:15:39 2023 +0900 [SPARK-40559][PYTHON][FOLLOW-UP] Fix linter for `getfullargspec` ### What changes were proposed in this pull request? This PR proposes to use `inspect.getfullargspec` instead of unimported `getfullargspec`. This PR is a followup of https://github.com/apache/spark/pull/38624. ### Why are the changes needed? To recover the CI. It fails as below: ``` ./python/pyspark/worker.py:749:19: F821 undefined name 'getfullargspec' argspec = getfullargspec(chained_func) # signature was lost when wrapping it ^ ./python/pyspark/worker.py:757:19: F8[21](https://github.com/apache/spark/actions/runs/7080907452/job/19269484124#step:21:22) undefined name 'getfullargspec' argspec = getfullargspec(chained_func) # signature was lost when wrapping it ``` https://github.com/apache/spark/actions/runs/7080907452/job/19269484124 It was caused by the logical conflict w/ https://github.com/apache/spark/commit/f5e4e84ce3a7f65407d07cfc3eed2f51837527c1 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually tested via `linter-python`. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #44141 from HyukjinKwon/SPARK-40559-followup2. 
Authored-by: Hyukjin Kwon
Signed-off-by: Hyukjin Kwon
---
 python/pyspark/worker.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/worker.py b/python/pyspark/worker.py
index 2534238b43cb..158e3ae62bb7 100644
--- a/python/pyspark/worker.py
+++ b/python/pyspark/worker.py
@@ -746,7 +746,7 @@ def read_single_udf(pickleSer, infile, eval_type, runner_conf, udf_index):
         argspec = inspect.getfullargspec(chained_func)  # signature was lost when wrapping it
         return args_offsets, wrap_grouped_map_pandas_udf(func, return_type, argspec, runner_conf)
     elif eval_type == PythonEvalType.SQL_GROUPED_MAP_ARROW_UDF:
-        argspec = getfullargspec(chained_func)  # signature was lost when wrapping it
+        argspec = inspect.getfullargspec(chained_func)  # signature was lost when wrapping it
         return args_offsets, wrap_grouped_map_arrow_udf(func, return_type, argspec, runner_conf)
     elif eval_type == PythonEvalType.SQL_GROUPED_MAP_PANDAS_UDF_WITH_STATE:
         return args_offsets, wrap_grouped_map_pandas_udf_with_state(func, return_type)
@@ -754,7 +754,7 @@ def read_single_udf(pickleSer, infile, eval_type, runner_conf, udf_index):
         argspec = inspect.getfullargspec(chained_func)  # signature was lost when wrapping it
         return args_offsets, wrap_cogrouped_map_pandas_udf(func, return_type, argspec, runner_conf)
     elif eval_type == PythonEvalType.SQL_COGROUPED_MAP_ARROW_UDF:
-        argspec = getfullargspec(chained_func)  # signature was lost when wrapping it
+        argspec = inspect.getfullargspec(chained_func)  # signature was lost when wrapping it
         return args_offsets, wrap_cogrouped_map_arrow_udf(func, return_type, argspec, runner_conf)
     elif eval_type == PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF:
         return wrap_grouped_agg_pandas_udf(func, args_offsets, kwargs_offsets, return_type)
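The fix matters because `getfullargspec` was never imported into `worker.py`'s namespace, so the bare call would raise `NameError` at runtime (flagged as F821 by the linter). A minimal illustration of the qualified call the patch switches to:

```python
import inspect

def grouped_map_udf(key, pdf, state=None):
    return pdf

# The qualified form always works; a bare `getfullargspec(...)` would be a
# NameError unless `from inspect import getfullargspec` appeared earlier.
spec = inspect.getfullargspec(grouped_map_udf)
print(spec.args)      # → ['key', 'pdf', 'state']
print(spec.defaults)  # → (None,)
```

This is also why the commit message notes the "signature was lost when wrapping it" comment: the argspec must be taken from `chained_func` explicitly rather than recovered from the wrapper.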
(spark) branch master updated: [SPARK-46224][PYTHON][MLLIB][TESTS] Test string representation of TestResult (pyspark.mllib.stat.test)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 108e25086e07 [SPARK-46224][PYTHON][MLLIB][TESTS] Test string representation of TestResult (pyspark.mllib.stat.test) 108e25086e07 is described below commit 108e25086e07e9bcc8290ec946aa421068f4cd59 Author: Hyukjin Kwon AuthorDate: Mon Dec 4 11:16:17 2023 +0900 [SPARK-46224][PYTHON][MLLIB][TESTS] Test string representation of TestResult (pyspark.mllib.stat.test) ### What changes were proposed in this pull request? This PR adds a test for `toString` on the model via `TestResult` within `pyspark.mllib.stat.test`. ### Why are the changes needed? ![Screenshot 2023-12-04 at 9 19 37 AM](https://github.com/apache/spark/assets/6477701/8a360172-0913-43ce-a5f0-a68cdefe252c) https://app.codecov.io/gh/apache/spark/blob/master/python%2Fpyspark%2Fmllib%2Fstat%2Ftest.py#L28 This is not being tested. ### Does this PR introduce _any_ user-facing change? No, test-only ### How was this patch tested? Manually ran the new unittest. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #44138 from HyukjinKwon/SPARK-46041. 
Authored-by: Hyukjin Kwon
Signed-off-by: Hyukjin Kwon
---
 python/pyspark/mllib/tests/test_stat.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/python/pyspark/mllib/tests/test_stat.py b/python/pyspark/mllib/tests/test_stat.py
index 4fcd49f11bdc..c851768a1fb5 100644
--- a/python/pyspark/mllib/tests/test_stat.py
+++ b/python/pyspark/mllib/tests/test_stat.py
@@ -65,6 +65,7 @@ class ChiSqTestTests(MLlibTestCase):
         observed = Vectors.dense([4, 6, 5])
         pearson = Statistics.chiSqTest(observed)
+        self.assertIn("Chi squared test summary", str(pearson))

         # Validated against the R command `chisq.test(c(4, 6, 5), p=c(1/3, 1/3, 1/3))`
         self.assertEqual(pearson.statistic, 0.4)
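The statistic of 0.4 asserted by the surrounding test is easy to recompute by hand: 15 observations spread over 3 equally likely categories give an expected count of 5 per cell. A quick pure-Python check, independent of MLlib:

```python
def chi_squared_statistic(observed, expected):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [4, 6, 5]
expected = [5.0, 5.0, 5.0]  # 15 samples, uniform over 3 categories
stat = chi_squared_statistic(observed, expected)
print(stat)  # → 0.4, matching R's chisq.test(c(4, 6, 5), p=c(1/3, 1/3, 1/3))
```

The cells contribute 1/5 + 1/5 + 0 = 0.4, which is the value the test validates against R.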
(spark) branch master updated: [SPARK-46227][SQL] Move `withSQLConf` from `SQLHelper` to `SQLConfHelper`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new aee6b1582775 [SPARK-46227][SQL] Move `withSQLConf` from `SQLHelper` to `SQLConfHelper` aee6b1582775 is described below commit aee6b158277537709a717223b518923431bca0a6 Author: ulysses-you AuthorDate: Sun Dec 3 21:23:34 2023 -0800 [SPARK-46227][SQL] Move `withSQLConf` from `SQLHelper` to `SQLConfHelper` ### What changes were proposed in this pull request? This pr moves method `withSQLConf` from `SQLHelper` in catalyst test module to `SQLConfHelper` trait in catalyst module. To make it easy to use such case: `val x = withSQLConf {}`, this pr also changes its return type. ### Why are the changes needed? A part of https://github.com/apache/spark/pull/44013 ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? Pass CI ### Was this patch authored or co-authored using generative AI tooling? no Closes #44142 from ulysses-you/withSQLConf. 
Authored-by: ulysses-you Signed-off-by: Dongjoon Hyun --- .../apache/spark/sql/catalyst/SQLConfHelper.scala | 29 .../spark/sql/catalyst/plans/SQLHelper.scala | 32 ++ .../sql/internal/ExecutorSideSQLConfSuite.scala| 2 +- .../org/apache/spark/sql/test/SQLTestUtils.scala | 2 +- .../spark/sql/hive/execution/HiveSerDeSuite.scala | 2 +- 5 files changed, 34 insertions(+), 33 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SQLConfHelper.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SQLConfHelper.scala index cee35cdb8d84..f4605b9218f0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SQLConfHelper.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SQLConfHelper.scala @@ -17,6 +17,7 @@ package org.apache.spark.sql.catalyst +import org.apache.spark.sql.AnalysisException import org.apache.spark.sql.internal.SQLConf /** @@ -29,4 +30,32 @@ trait SQLConfHelper { * See [[SQLConf.get]] for more information. */ def conf: SQLConf = SQLConf.get + + /** + * Sets all SQL configurations specified in `pairs`, calls `f`, and then restores all SQL + * configurations. 
+ */ + protected def withSQLConf[T](pairs: (String, String)*)(f: => T): T = { +val conf = SQLConf.get +val (keys, values) = pairs.unzip +val currentValues = keys.map { key => + if (conf.contains(key)) { +Some(conf.getConfString(key)) + } else { +None + } +} +keys.lazyZip(values).foreach { (k, v) => + if (SQLConf.isStaticConfigKey(k)) { +throw new AnalysisException(s"Cannot modify the value of a static config: $k") + } + conf.setConfString(k, v) +} +try f finally { + keys.zip(currentValues).foreach { +case (key, Some(value)) => conf.setConfString(key, value) +case (key, None) => conf.unsetConf(key) + } +} + } } diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/SQLHelper.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/SQLHelper.scala index eb844e6f057f..92681613bd83 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/SQLHelper.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/SQLHelper.scala @@ -23,41 +23,13 @@ import scala.util.control.NonFatal import org.scalatest.Assertions.fail -import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.SQLConfHelper import org.apache.spark.sql.catalyst.util.DateTimeTestUtils import org.apache.spark.sql.catalyst.util.DateTimeUtils.getZoneId import org.apache.spark.sql.internal.SQLConf import org.apache.spark.util.Utils -trait SQLHelper { - - /** - * Sets all SQL configurations specified in `pairs`, calls `f`, and then restores all SQL - * configurations. 
- */ - protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = { -val conf = SQLConf.get -val (keys, values) = pairs.unzip -val currentValues = keys.map { key => - if (conf.contains(key)) { -Some(conf.getConfString(key)) - } else { -None - } -} -keys.lazyZip(values).foreach { (k, v) => - if (SQLConf.isStaticConfigKey(k)) { -throw new AnalysisException(s"Cannot modify the value of a static config: $k") - } - conf.setConfString(k, v) -} -try f finally { - keys.zip(currentValues).foreach { -case (key, Some(value)) => conf.setConfString(key, value) -case (key, None) => conf.unsetConf(key) - } -} - } +trait SQLHelper extends SQLConfHelper { /** * Generates a temporary path w
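The set-call-restore pattern that `withSQLConf` implements, and the point of widening its return type from `Unit` to a generic `T`, can be sketched in a few lines of plain Python against a dict standing in for `SQLConf`. Names here are illustrative, not pyspark API:

```python
def with_conf(conf, pairs, f):
    """Set each (key, value) in pairs, run f, then restore the previous values.

    Mirrors Scala's `withSQLConf[T](pairs: (String, String)*)(f: => T): T`:
    the result of f is returned, which is what enables `val x = withSQLConf {...}`.
    """
    saved = {k: conf.get(k) for k, _ in pairs}  # None marks "was unset"
    for k, v in pairs:
        conf[k] = v
    try:
        return f()
    finally:
        for k, old in saved.items():
            if old is None:
                conf.pop(k, None)   # key did not exist before: unset it
            else:
                conf[k] = old       # key existed: restore the old value

conf = {"spark.sql.ansi.enabled": "false"}
result = with_conf(conf, [("spark.sql.ansi.enabled", "true")],
                   lambda: conf["spark.sql.ansi.enabled"])
print(result)  # → true
print(conf)    # → {'spark.sql.ansi.enabled': 'false'}  (restored)
```

The `try`/`finally` is the load-bearing part: the previous values are restored even if `f` throws, just as in the Scala helper above.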
(spark) branch master updated: [SPARK-46218][BUILD] Upgrade commons-cli to 1.6.0
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0c029e70706c [SPARK-46218][BUILD] Upgrade commons-cli to 1.6.0 0c029e70706c is described below commit 0c029e70706c7e1a4c3a7bb763dbbcb4fe1ccd9f Author: panbingkun AuthorDate: Sun Dec 3 21:27:21 2023 -0800 [SPARK-46218][BUILD] Upgrade commons-cli to 1.6.0 ### What changes were proposed in this pull request? The pr aims to upgrade `commons-cli` from `1.5.0` to `1.6.0`. ### Why are the changes needed? - The last upgrade occurred two years ago, https://github.com/apache/spark/pull/34707 - The full release notes: https://commons.apache.org/proper/commons-cli/changes-report.html#a1.6.0 - The version mainly focus on fixing bugs: Fix NPE in CommandLine.resolveOption(String). Fixes [CLI-283](https://issues.apache.org/jira/browse/CLI-283). CommandLine.addOption(Option) should not allow a null Option. Fixes [CLI-283](https://issues.apache.org/jira/browse/CLI-283). CommandLine.addArgs(String) should not allow a null String. Fixes [CLI-283](https://issues.apache.org/jira/browse/CLI-283). NullPointerException thrown by CommandLineParser.parse(). Fixes [CLI-317](https://issues.apache.org/jira/browse/CLI-317). StringIndexOutOfBoundsException thrown by CommandLineParser.parse(). Fixes [CLI-313](https://issues.apache.org/jira/browse/CLI-313). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #44132 from panbingkun/SPARK-46218. 
Authored-by: panbingkun
Signed-off-by: Dongjoon Hyun
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index f1d675d92b6d..ebfe6acad960 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -35,7 +35,7 @@ breeze_2.13/2.1.0//breeze_2.13-2.1.0.jar
 cats-kernel_2.13/2.8.0//cats-kernel_2.13-2.8.0.jar
 chill-java/0.10.0//chill-java-0.10.0.jar
 chill_2.13/0.10.0//chill_2.13-0.10.0.jar
-commons-cli/1.5.0//commons-cli-1.5.0.jar
+commons-cli/1.6.0//commons-cli-1.6.0.jar
 commons-codec/1.16.0//commons-codec-1.16.0.jar
 commons-collections/3.2.2//commons-collections-3.2.2.jar
 commons-collections4/4.4//commons-collections4-4.4.jar

diff --git a/pom.xml b/pom.xml
index 2a259cfd322b..27ee42f103dd 100644
--- a/pom.xml
+++ b/pom.xml
@@ -220,7 +220,7 @@
 2.70.0
 3.1.0
 1.1.0
-1.5.0
+1.6.0
 1.70
 1.9.0
 4.1.100.Final
(spark) branch master updated (0c029e70706c -> 712352e37ec5)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

from 0c029e70706c [SPARK-46218][BUILD] Upgrade commons-cli to 1.6.0
 add 712352e37ec5 [SPARK-40559][PYTHON][DOCS][FOLLOW-UP] Fix the docstring and document both applyInArrows

No new revisions were added by this update.

Summary of changes:
 python/docs/source/reference/pyspark.sql/grouping.rst | 2 ++
 python/pyspark/sql/pandas/group_ops.py | 15 +++
 2 files changed, 9 insertions(+), 8 deletions(-)
(spark) branch master updated: [SPARK-46182][CORE] Track `lastTaskFinishTime` using the exact task finished event
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6f112f7b1a50 [SPARK-46182][CORE] Track `lastTaskFinishTime` using the exact task finished event 6f112f7b1a50 is described below commit 6f112f7b1a50a2b8a59952c69f67dd5f80ab6633 Author: Xingbo Jiang AuthorDate: Sun Dec 3 22:08:20 2023 -0800 [SPARK-46182][CORE] Track `lastTaskFinishTime` using the exact task finished event ### What changes were proposed in this pull request? We found a race condition between lastTaskRunningTime and lastShuffleMigrationTime that could lead to a decommissioned executor exit before all the shuffle blocks have been discovered. The issue could lead to immediate task retry right after an executor exit, thus longer query execution time. To fix the issue, we choose to update the lastTaskRunningTime only when a task updates its status to finished through the StatusUpdate event. This is better than the current approach (which use a thread to check for number of running tasks every second), because in this way we clearly know whether the shuffle block refresh happened after all tasks finished running or not, thus resolved the race condition mentioned above. ### Why are the changes needed? To fix a race condition that could lead to shuffle data lost, thus longer query execution time. ### How was this patch tested? This is a very subtle race condition that is hard to write a unit test using current unit test framework. And we are confident the change is low risk. Thus only verify by passing all the existing tests. ### Was this patch authored or co-authored using generative AI tooling? No Closes #44090 from jiangxb1987/SPARK-46182. 
Authored-by: Xingbo Jiang Signed-off-by: Dongjoon Hyun --- .../spark/executor/CoarseGrainedExecutorBackend.scala| 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala index f1a9aa353e76..4bf4929c1339 100644 --- a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala +++ b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala @@ -21,7 +21,7 @@ import java.net.URL import java.nio.ByteBuffer import java.util.Locale import java.util.concurrent.ConcurrentHashMap -import java.util.concurrent.atomic.AtomicBoolean +import java.util.concurrent.atomic.{AtomicBoolean, AtomicLong} import scala.util.{Failure, Success} import scala.util.control.NonFatal @@ -77,6 +77,10 @@ private[spark] class CoarseGrainedExecutorBackend( private var decommissioned = false + // Track the last time in ns that at least one task is running. If no task is running and all + // shuffle/RDD data migration are done, the decommissioned executor should exit. 
+ private var lastTaskFinishTime = new AtomicLong(System.nanoTime()) + override def onStart(): Unit = { if (env.conf.get(DECOMMISSION_ENABLED)) { val signal = env.conf.get(EXECUTOR_DECOMMISSION_SIGNAL) @@ -273,6 +277,7 @@ private[spark] class CoarseGrainedExecutorBackend( val msg = StatusUpdate(executorId, taskId, state, data, cpus, resources) if (TaskState.isFinished(state)) { taskResources.remove(taskId) + lastTaskFinishTime.set(System.nanoTime()) } driver match { case Some(driverRef) => driverRef.send(msg) @@ -345,7 +350,6 @@ private[spark] class CoarseGrainedExecutorBackend( val shutdownThread = new Thread("wait-for-blocks-to-migrate") { override def run(): Unit = { - var lastTaskRunningTime = System.nanoTime() val sleep_time = 1000 // 1s // This config is internal and only used by unit tests to force an executor // to hang around for longer when decommissioned. @@ -362,7 +366,7 @@ private[spark] class CoarseGrainedExecutorBackend( val (migrationTime, allBlocksMigrated) = env.blockManager.lastMigrationInfo() // We can only trust allBlocksMigrated boolean value if there were no tasks running // since the start of computing it. -if (allBlocksMigrated && (migrationTime > lastTaskRunningTime)) { +if (allBlocksMigrated && (migrationTime > lastTaskFinishTime.get())) { logInfo("No running tasks, all blocks migrated, stopping.") exitExecutor(0, ExecutorLossMessage.decommissionFinished, notifyDriver = true) } else { @@ -374,12 +378,6 @@ private[spark] class CoarseGrainedExecutorBackend( } } else { logInfo(s"Blocked
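The shape of the SPARK-46182 fix — trusting the block-migration snapshot only if it was taken strictly after the last task finished — can be sketched outside Spark. Class and method names below are illustrative stand-ins, not Spark's API:

```python
import time

class DecommissionTracker:
    """Sketch of the approach above: stamp the finish time in the
    StatusUpdate handler itself instead of polling running-task counts."""

    def __init__(self):
        self.last_task_finish_ns = time.monotonic_ns()

    def on_status_update(self, task_finished):
        # Called from the event handler, so the timestamp is exact, not sampled.
        if task_finished:
            self.last_task_finish_ns = time.monotonic_ns()

    def safe_to_exit(self, migration_time_ns, all_blocks_migrated):
        # Only trust `all_blocks_migrated` if the snapshot postdates the last
        # finished task; otherwise a shuffle block written by that task may
        # not have been discovered yet.
        return all_blocks_migrated and migration_time_ns > self.last_task_finish_ns

tracker = DecommissionTracker()
tracker.on_status_update(task_finished=True)
snapshot = tracker.last_task_finish_ns + 1  # a migration check taken afterwards
print(tracker.safe_to_exit(snapshot, all_blocks_migrated=True))      # → True
print(tracker.safe_to_exit(snapshot - 2, all_blocks_migrated=True))  # → False
```

The old polling approach sampled `lastTaskRunningTime` once per second, leaving a window in which a task could finish, write shuffle data, and still be ordered "before" a migration snapshot; stamping the event itself closes that window.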
(spark) branch branch-3.5 updated: [SPARK-46182][CORE] Track `lastTaskFinishTime` using the exact task finished event
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 273ef5708fc3 [SPARK-46182][CORE] Track `lastTaskFinishTime` using the exact task finished event 273ef5708fc3 is described below commit 273ef5708fc33872cfe3091627617bbac8fdd56f Author: Xingbo Jiang AuthorDate: Sun Dec 3 22:08:20 2023 -0800 [SPARK-46182][CORE] Track `lastTaskFinishTime` using the exact task finished event ### What changes were proposed in this pull request? We found a race condition between lastTaskRunningTime and lastShuffleMigrationTime that could lead to a decommissioned executor exit before all the shuffle blocks have been discovered. The issue could lead to immediate task retry right after an executor exit, thus longer query execution time. To fix the issue, we choose to update the lastTaskRunningTime only when a task updates its status to finished through the StatusUpdate event. This is better than the current approach (which use a thread to check for number of running tasks every second), because in this way we clearly know whether the shuffle block refresh happened after all tasks finished running or not, thus resolved the race condition mentioned above. ### Why are the changes needed? To fix a race condition that could lead to shuffle data lost, thus longer query execution time. ### How was this patch tested? This is a very subtle race condition that is hard to write a unit test using current unit test framework. And we are confident the change is low risk. Thus only verify by passing all the existing tests. ### Was this patch authored or co-authored using generative AI tooling? No Closes #44090 from jiangxb1987/SPARK-46182. 
Authored-by: Xingbo Jiang Signed-off-by: Dongjoon Hyun (cherry picked from commit 6f112f7b1a50a2b8a59952c69f67dd5f80ab6633) Signed-off-by: Dongjoon Hyun --- .../spark/executor/CoarseGrainedExecutorBackend.scala| 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala index c695a9ec2851..537522326fc7 100644 --- a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala +++ b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala @@ -21,7 +21,7 @@ import java.net.URL import java.nio.ByteBuffer import java.util.Locale import java.util.concurrent.ConcurrentHashMap -import java.util.concurrent.atomic.AtomicBoolean +import java.util.concurrent.atomic.{AtomicBoolean, AtomicLong} import scala.util.{Failure, Success} import scala.util.control.NonFatal @@ -80,6 +80,10 @@ private[spark] class CoarseGrainedExecutorBackend( private var decommissioned = false + // Track the last time in ns that at least one task is running. If no task is running and all + // shuffle/RDD data migration are done, the decommissioned executor should exit. 
+ private var lastTaskFinishTime = new AtomicLong(System.nanoTime()) + override def onStart(): Unit = { if (env.conf.get(DECOMMISSION_ENABLED)) { val signal = env.conf.get(EXECUTOR_DECOMMISSION_SIGNAL) @@ -269,6 +273,7 @@ private[spark] class CoarseGrainedExecutorBackend( val msg = StatusUpdate(executorId, taskId, state, data, cpus, resources) if (TaskState.isFinished(state)) { taskResources.remove(taskId) + lastTaskFinishTime.set(System.nanoTime()) } driver match { case Some(driverRef) => driverRef.send(msg) @@ -341,7 +346,6 @@ private[spark] class CoarseGrainedExecutorBackend( val shutdownThread = new Thread("wait-for-blocks-to-migrate") { override def run(): Unit = { - var lastTaskRunningTime = System.nanoTime() val sleep_time = 1000 // 1s // This config is internal and only used by unit tests to force an executor // to hang around for longer when decommissioned. @@ -358,7 +362,7 @@ private[spark] class CoarseGrainedExecutorBackend( val (migrationTime, allBlocksMigrated) = env.blockManager.lastMigrationInfo() // We can only trust allBlocksMigrated boolean value if there were no tasks running // since the start of computing it. -if (allBlocksMigrated && (migrationTime > lastTaskRunningTime)) { +if (allBlocksMigrated && (migrationTime > lastTaskFinishTime.get())) { logInfo("No running tasks, all blocks migrated, stopping.") exitExecutor(0, ExecutorLossMessage.decommissionFinished, notifyDriver = true) } else { @@ -370,12 +374,6 @@ private[
(spark) branch branch-3.4 updated: [SPARK-46182][CORE] Track `lastTaskFinishTime` using the exact task finished event
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new b8750d5c0b41 [SPARK-46182][CORE] Track `lastTaskFinishTime` using the exact task finished event b8750d5c0b41 is described below commit b8750d5c0b416137ce802cf73dd92b0fc7ff5467 Author: Xingbo Jiang AuthorDate: Sun Dec 3 22:08:20 2023 -0800 [SPARK-46182][CORE] Track `lastTaskFinishTime` using the exact task finished event ### What changes were proposed in this pull request? We found a race condition between lastTaskRunningTime and lastShuffleMigrationTime that could lead to a decommissioned executor exit before all the shuffle blocks have been discovered. The issue could lead to immediate task retry right after an executor exit, thus longer query execution time. To fix the issue, we choose to update the lastTaskRunningTime only when a task updates its status to finished through the StatusUpdate event. This is better than the current approach (which use a thread to check for number of running tasks every second), because in this way we clearly know whether the shuffle block refresh happened after all tasks finished running or not, thus resolved the race condition mentioned above. ### Why are the changes needed? To fix a race condition that could lead to shuffle data lost, thus longer query execution time. ### How was this patch tested? This is a very subtle race condition that is hard to write a unit test using current unit test framework. And we are confident the change is low risk. Thus only verify by passing all the existing tests. ### Was this patch authored or co-authored using generative AI tooling? No Closes #44090 from jiangxb1987/SPARK-46182. 
Authored-by: Xingbo Jiang Signed-off-by: Dongjoon Hyun (cherry picked from commit 6f112f7b1a50a2b8a59952c69f67dd5f80ab6633) Signed-off-by: Dongjoon Hyun --- .../spark/executor/CoarseGrainedExecutorBackend.scala| 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala index c695a9ec2851..537522326fc7 100644 --- a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala +++ b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala @@ -21,7 +21,7 @@ import java.net.URL import java.nio.ByteBuffer import java.util.Locale import java.util.concurrent.ConcurrentHashMap -import java.util.concurrent.atomic.AtomicBoolean +import java.util.concurrent.atomic.{AtomicBoolean, AtomicLong} import scala.util.{Failure, Success} import scala.util.control.NonFatal @@ -80,6 +80,10 @@ private[spark] class CoarseGrainedExecutorBackend( private var decommissioned = false + // Track the last time in ns that at least one task is running. If no task is running and all + // shuffle/RDD data migration are done, the decommissioned executor should exit. 
+ private var lastTaskFinishTime = new AtomicLong(System.nanoTime()) + override def onStart(): Unit = { if (env.conf.get(DECOMMISSION_ENABLED)) { val signal = env.conf.get(EXECUTOR_DECOMMISSION_SIGNAL) @@ -269,6 +273,7 @@ private[spark] class CoarseGrainedExecutorBackend( val msg = StatusUpdate(executorId, taskId, state, data, cpus, resources) if (TaskState.isFinished(state)) { taskResources.remove(taskId) + lastTaskFinishTime.set(System.nanoTime()) } driver match { case Some(driverRef) => driverRef.send(msg) @@ -341,7 +346,6 @@ private[spark] class CoarseGrainedExecutorBackend( val shutdownThread = new Thread("wait-for-blocks-to-migrate") { override def run(): Unit = { - var lastTaskRunningTime = System.nanoTime() val sleep_time = 1000 // 1s // This config is internal and only used by unit tests to force an executor // to hang around for longer when decommissioned. @@ -358,7 +362,7 @@ private[spark] class CoarseGrainedExecutorBackend( val (migrationTime, allBlocksMigrated) = env.blockManager.lastMigrationInfo() // We can only trust allBlocksMigrated boolean value if there were no tasks running // since the start of computing it. -if (allBlocksMigrated && (migrationTime > lastTaskRunningTime)) { +if (allBlocksMigrated && (migrationTime > lastTaskFinishTime.get())) { logInfo("No running tasks, all blocks migrated, stopping.") exitExecutor(0, ExecutorLossMessage.decommissionFinished, notifyDriver = true) } else { @@ -370,12 +374,6 @@ private[
(spark) branch master updated: [SPARK-46232][PYTHON] Migrate all remaining ValueError into PySpark error framework
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new b23ae15da019 [SPARK-46232][PYTHON] Migrate all remaining ValueError into PySpark error framework
b23ae15da019 is described below

commit b23ae15da019082891d71853682329c2d24c2e9e
Author: Haejoon Lee
AuthorDate: Sun Dec 3 22:49:30 2023 -0800

    [SPARK-46232][PYTHON] Migrate all remaining ValueError into PySpark error framework

    ### What changes were proposed in this pull request?

    This PR proposes to migrate all remaining `ValueError` from `pyspark/sql/*` into the PySpark error framework, `PySparkValueError`, assigning dedicated error classes.

    ### Why are the changes needed?

    To improve error handling in PySpark.

    ### Does this PR introduce _any_ user-facing change?

    No API changes, but the user-facing error messages will be improved.

    ### How was this patch tested?

    The existing CI should pass.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #44149 from itholic/migrate_value_error.

    Authored-by: Haejoon Lee
    Signed-off-by: Dongjoon Hyun
---
 python/pyspark/errors/error_classes.py   | 19 +--
 python/pyspark/sql/pandas/serializers.py |  5 +++--
 python/pyspark/sql/pandas/typehints.py   | 12 +---
 python/pyspark/sql/pandas/types.py       |  7 +--
 python/pyspark/sql/sql_formatter.py      |  7 ---
 5 files changed, 38 insertions(+), 12 deletions(-)

diff --git a/python/pyspark/errors/error_classes.py b/python/pyspark/errors/error_classes.py
index c7199ac938be..d0c0d1c115b0 100644
--- a/python/pyspark/errors/error_classes.py
+++ b/python/pyspark/errors/error_classes.py
@@ -287,6 +287,11 @@ ERROR_CLASSES_JSON = """
       "NumPy array input should be of dimensions."
     ]
   },
+  "INVALID_NUMBER_OF_DATAFRAMES_IN_GROUP" : {
+    "message" : [
+      "Invalid number of dataframes in group <dataframes_in_group>."
+    ]
+  },
   "INVALID_PANDAS_UDF" : {
     "message" : [
       "Invalid function: "
@@ -803,9 +808,9 @@ ERROR_CLASSES_JSON = """
       "Expected values for ``, got ."
     ]
   },
-  "TYPE_HINT_REQUIRED" : {
+  "TYPE_HINT_SHOULD_BE_SPECIFIED" : {
     "message" : [
-      "A is required ."
+      "Type hints for <target> should be specified; however, got <sig>."
     ]
   },
   "UDF_RETURN_TYPE" : {
@@ -888,6 +893,11 @@ ERROR_CLASSES_JSON = """
       "Unknown response: ."
     ]
   },
+  "UNKNOWN_VALUE_FOR" : {
+    "message" : [
+      "Unknown value for ``."
+    ]
+  },
   "UNSUPPORTED_DATA_TYPE" : {
     "message" : [
       "Unsupported DataType ``."
@@ -983,6 +993,11 @@ ERROR_CLASSES_JSON = """
       "Value for `` only supports the 'pearson', got ''."
     ]
   },
+  "VALUE_NOT_PLAIN_COLUMN_REFERENCE" : {
+    "message" : [
+      "Value in should be a plain column reference such as `df.col` or `col('column')`."
+    ]
+  },
   "VALUE_NOT_POSITIVE" : {
     "message" : [
       "Value for `` must be positive, got ''."
diff --git a/python/pyspark/sql/pandas/serializers.py b/python/pyspark/sql/pandas/serializers.py
index 8ffb7407714b..6c5bd826a023 100644
--- a/python/pyspark/sql/pandas/serializers.py
+++ b/python/pyspark/sql/pandas/serializers.py
@@ -707,8 +707,9 @@ class CogroupArrowUDFSerializer(ArrowStreamGroupUDFSerializer):
                 yield batches1, batches2
             elif dataframes_in_group != 0:
-                raise ValueError(
-                    "Invalid number of dataframes in group {0}".format(dataframes_in_group)
+                raise PySparkValueError(
+                    error_class="INVALID_NUMBER_OF_DATAFRAMES_IN_GROUP",
+                    message_parameters={"dataframes_in_group": str(dataframes_in_group)},
                 )
diff --git a/python/pyspark/sql/pandas/typehints.py b/python/pyspark/sql/pandas/typehints.py
index f0c13e66a63d..37ba02a94d58 100644
--- a/python/pyspark/sql/pandas/typehints.py
+++ b/python/pyspark/sql/pandas/typehints.py
@@ -18,7 +18,7 @@ from inspect import Signature
 from typing import Any, Callable, Dict, Optional, Union, TYPE_CHECKING
 from pyspark.sql.pandas.utils import require_minimum_pandas_version
-from pyspark.errors import PySparkNotImplementedError
+from pyspark.errors import PySparkNotImplementedError, PySparkValueError

 if TYPE_CHECKING:
     from pyspark.sql.pandas._typing import (
@@ -51,12 +51,18 @@ def infer_eval_type(
         annotations[parameter] for parameter in sig.parameters if parameter in annotations
     ]
     if len(parameters_sig) != len(sig.parameters):
-        raise ValueError("Type hints for all parameters should be specified; however, got %s" % sig)
+        raise PySparkValueError(
+            error_class="TYPE_HINT_SHOULD_BE_SPECIFIED",
+            message_parameters={"target": "
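[Editor's note] The migration pattern in the diff above, end to end, looks roughly like the following self-contained sketch. `PySparkValueError` and the error-class registry here are minimal stand-ins written for illustration, not the real `pyspark.errors` implementation:

```python
# Minimal stand-in for the PySpark error framework; names imitate the real
# pyspark.errors module but this is NOT the actual implementation.

ERROR_CLASSES = {
    # Template uses <param> placeholders, as in error_classes.py.
    "INVALID_NUMBER_OF_DATAFRAMES_IN_GROUP": (
        "Invalid number of dataframes in group <dataframes_in_group>."
    ),
}


class PySparkValueError(ValueError):
    """Carries a machine-readable error class plus message parameters."""

    def __init__(self, error_class: str, message_parameters: dict) -> None:
        self.error_class = error_class
        self.message_parameters = message_parameters
        message = ERROR_CLASSES[error_class]
        for key, value in message_parameters.items():
            message = message.replace(f"<{key}>", value)
        super().__init__(f"[{error_class}] {message}")


def check_group(dataframes_in_group: int) -> None:
    # Mirrors the serializers.py change: a classed error replaces a bare
    # ValueError built with str.format().
    if dataframes_in_group not in (0, 2):
        raise PySparkValueError(
            error_class="INVALID_NUMBER_OF_DATAFRAMES_IN_GROUP",
            message_parameters={"dataframes_in_group": str(dataframes_in_group)},
        )
```

The benefit over a bare `ValueError` is that callers can branch on `error_class` programmatically instead of parsing free-form message text, and the message templates are centralized in one registry.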
(spark) branch master updated (b23ae15da019 -> fc56bb53910e)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from b23ae15da019 [SPARK-46232][PYTHON] Migrate all remaining ValueError into PySpark error framework
     add fc56bb53910e [SPARK-46229][PYTHON][CONNECT] Add applyInArrow to groupBy and cogroup in Spark Connect

No new revisions were added by this update.

Summary of changes:
 .../sql/connect/planner/SparkConnectPlanner.scala  | 29 +++--
 dev/sparktestsupport/modules.py                    |  6 +-
 python/pyspark/sql/connect/_typing.py              | 12 +++-
 python/pyspark/sql/connect/group.py                | 53 +++-
 python/pyspark/sql/tests/arrow/__init__.py         | 16 -
 .../connect/test_parity_arrow_cogrouped_map.py}    | 14 ++---
 .../connect/test_parity_arrow_grouped_map.py}      | 14 +++--
 .../tests/{arrow => }/test_arrow_cogrouped_map.py  | 71 --
 .../tests/{arrow => }/test_arrow_grouped_map.py    | 61 ++-
 9 files changed, 176 insertions(+), 100 deletions(-)
 delete mode 100644 python/pyspark/sql/tests/arrow/__init__.py
 copy python/pyspark/{pandas/tests/connect/test_parity_series_datetime.py => sql/tests/connect/test_parity_arrow_cogrouped_map.py} (75%)
 copy python/pyspark/{pandas/tests/connect/series/test_parity_series.py => sql/tests/connect/test_parity_arrow_grouped_map.py} (75%)
 rename python/pyspark/sql/tests/{arrow => }/test_arrow_cogrouped_map.py (92%)
 rename python/pyspark/sql/tests/{arrow => }/test_arrow_grouped_map.py (96%)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
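[Editor's note] For context, `applyInArrow` on a cogroup pairs up the groups of two datasets by key and applies a user function to each pair of co-groups. The real API operates on `pyarrow` data inside Spark; the stdlib-only sketch below illustrates only the cogroup-then-apply semantics, not the Spark Connect implementation:

```python
# Illustrative cogroup-then-apply, keyed on the first element of each row.
# This imitates the shape of cogrouped applyInArrow; it is not Spark code.
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, int]


def cogroup_apply(
    left: List[Row],
    right: List[Row],
    func: Callable[[List[Row], List[Row]], List[Row]],
) -> Dict[str, List[Row]]:
    """Group both inputs by key, then apply `func` to each pair of co-groups."""
    lgroups: Dict[str, List[Row]] = defaultdict(list)
    rgroups: Dict[str, List[Row]] = defaultdict(list)
    for row in left:
        lgroups[row[0]].append(row)
    for row in right:
        rgroups[row[0]].append(row)
    # Every key present on either side yields one co-group; a side with no
    # rows for that key contributes an empty group.
    keys = sorted(set(lgroups) | set(rgroups))
    return {k: func(lgroups.get(k, []), rgroups.get(k, [])) for k in keys}


# Example user function: concatenate the two sides of each co-group.
merged = cogroup_apply(
    [("a", 1), ("b", 2)],
    [("a", 10)],
    lambda l, r: l + r,
)
```

In the real API the user function receives the two co-groups as Arrow data and returns Arrow data, and Spark handles the grouping and shuffling.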
(spark) branch master updated: [SPARK-46236][PYTHON][DOCS] Using brighter color for docs h3 title for better visibility
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4398bb5d3732 [SPARK-46236][PYTHON][DOCS] Using brighter color for docs h3 title for better visibility
4398bb5d3732 is described below

commit 4398bb5d37328e2f3594302d98f98803a379a2e9
Author: panbingkun
AuthorDate: Mon Dec 4 16:26:39 2023 +0900

    [SPARK-46236][PYTHON][DOCS] Using brighter color for docs h3 title for better visibility

    ### What changes were proposed in this pull request?

    The PR aims to use a brighter color for the docs `h3` title for better visibility.

    ### Why are the changes needed?

    - Before:
    Dark theme: https://github.com/apache/spark/assets/15246973/ee024e67-d86b-4960-968f-329b32d3e370
    Light theme: https://github.com/apache/spark/assets/15246973/20a562ed-b9b5-44f6-bf52-30967fb7dc3a
    - After:
    Dark theme: https://github.com/apache/spark/assets/15246973/546139b5-a94c-4dec-a4fb-24299b3388ba
    Light theme: https://github.com/apache/spark/assets/15246973/4d912002-4c9e-46d0-a397-bcca0024f160

    ### Does this PR introduce _any_ user-facing change?

    No API changes, but the font color of the `h3` title has been updated to a brighter color, improving contrast and readability.

    ### How was this patch tested?

    Manually tested.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #44152 from panbingkun/SPARK-46236.

    Authored-by: panbingkun
    Signed-off-by: Hyukjin Kwon
---
 python/docs/source/_static/css/pyspark.css | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/python/docs/source/_static/css/pyspark.css b/python/docs/source/_static/css/pyspark.css
index 2743629ff61c..565eaea29935 100644
--- a/python/docs/source/_static/css/pyspark.css
+++ b/python/docs/source/_static/css/pyspark.css
@@ -27,10 +27,6 @@ h1,h2 {
   color:#17A2B8!important;
 }

-h3 {
-  color: #55
-}
-
 /* Top menu */
 #navbar-main {
   background: #1B5162!important;

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org