(spark-website) branch asf-site updated: Fix typo in downloads.md
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new e73fcd924f  Fix typo in downloads.md
e73fcd924f is described below

commit e73fcd924f6e30a292053c85d52b1eba2c074d90
Author: asikeero <60272147+asike...@users.noreply.github.com>
AuthorDate: Sat Sep 14 19:25:00 2024 -0500

    Fix typo in downloads.md

    There seems to have been a small typo in the Docker section of downloads.

    Author: asikeero <60272147+asike...@users.noreply.github.com>
    Author: Eero Asikainen <60272147+asike...@users.noreply.github.com>

    Closes #554 from asikeero/patch-1.
---
 downloads.md        | 2 +-
 site/downloads.html | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/downloads.md b/downloads.md
index cc89ec8382..0d1af9d5c8 100644
--- a/downloads.md
+++ b/downloads.md
@@ -45,7 +45,7 @@ Spark artifacts are [hosted in Maven Central](https://search.maven.org/search?q=

 Spark docker images are available from Dockerhub under the accounts of both [The Apache Software Foundation](https://hub.docker.com/r/apache/spark/) and [Official Images](https://hub.docker.com/_/spark).
-Note that, these images contain non-ASF software and may be subject to different license terms. Please check their [Dockerfiles](https://github.com/apache/spark-docker) to verify whether to verify whether they are compatible with your deployment.
+Note that, these images contain non-ASF software and may be subject to different license terms. Please check their [Dockerfiles](https://github.com/apache/spark-docker) to verify whether they are compatible with your deployment.

 ### Release notes for stable releases

diff --git a/site/downloads.html b/site/downloads.html
index ddb50cd9bd..5541878a5c 100644
--- a/site/downloads.html
+++ b/site/downloads.html
@@ -198,7 +198,7 @@ version: 3.5.2

 Spark docker images are available from Dockerhub under the accounts of both <a href="https://hub.docker.com/r/apache/spark/">The Apache Software Foundation</a> and <a href="https://hub.docker.com/_/spark">Official Images</a>.
-Note that, these images contain non-ASF software and may be subject to different license terms. Please check their <a href="https://github.com/apache/spark-docker">Dockerfiles</a> to verify whether to verify whether they are compatible with your deployment.
+Note that, these images contain non-ASF software and may be subject to different license terms. Please check their <a href="https://github.com/apache/spark-docker">Dockerfiles</a> to verify whether they are compatible with your deployment.

 Release notes for stable releases

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
(spark-website) branch asf-site updated: add dataflint to third party projects page
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 55f231b067  add dataflint to third party projects page
55f231b067 is described below

commit 55f231b067c1e5e44fc1ead737a0bfd37a7d0327
Author: menishmueli
AuthorDate: Sat Sep 14 19:11:52 2024 -0500

    add dataflint to third party projects page

    Added DataFlint (https://github.com/dataflint/spark) to the third party projects page.

    Generated site HTML with `bundle exec jekyll build` and tested it with `bundle exec jekyll serve`.

    Author: menishmueli

    Closes #538 from menishmueli/asf-site.
---
 site/third-party-projects.html | 1 +
 third-party-projects.md        | 1 +
 2 files changed, 2 insertions(+)

diff --git a/site/third-party-projects.html b/site/third-party-projects.html
index cbb07d2506..24f92c5639 100644
--- a/site/third-party-projects.html
+++ b/site/third-party-projects.html
@@ -227,6 +227,7 @@ transforming, and analyzing genomic data using Apache Spark

 <a href="https://www.datamechanics.co/delight">Data Mechanics Delight</a> - Delight is a free, hosted, cross-platform Spark UI alternative backed by an open-source Spark agent. It features new metrics and visualizations to simplify Spark monitoring and performance tuning.
+<a href="https://github.com/dataflint/spark">DataFlint</a> - DataFlint is A Spark UI replacement installed via an open-source library, which updates in real-time and alerts on performance issues

 Additional language bindings

diff --git a/third-party-projects.md b/third-party-projects.md
index e83ff1eadf..7d2f3feb26 100644
--- a/third-party-projects.md
+++ b/third-party-projects.md
@@ -70,6 +70,7 @@ transforming, and analyzing genomic data using Apache Spark

 Performance, monitoring, and debugging tools for Spark

 - <a href="https://www.datamechanics.co/delight">Data Mechanics Delight</a> - Delight is a free, hosted, cross-platform Spark UI alternative backed by an open-source Spark agent. It features new metrics and visualizations to simplify Spark monitoring and performance tuning.
+- <a href="https://github.com/dataflint/spark">DataFlint</a> - DataFlint is A Spark UI replacement installed via an open-source library, which updates in real-time and alerts on performance issues

 Additional language bindings
(spark-website) branch asf-site updated: Update rexml per Github security warning
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 6e602593f3  Update rexml per Github security warning
6e602593f3 is described below

commit 6e602593f3e6bd49151bb8eaa7da4faa427a751d
Author: Sean Owen
AuthorDate: Tue Jul 30 10:57:38 2024 -0500

    Update rexml per Github security warning

    Author: Sean Owen

    Closes #540 from srowen/rexml.
---
 Gemfile.lock | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Gemfile.lock b/Gemfile.lock
index f4dedba223..db9f15953f 100644
--- a/Gemfile.lock
+++ b/Gemfile.lock
@@ -48,11 +48,13 @@ GEM
     rb-fsevent (0.11.2)
     rb-inotify (0.10.1)
       ffi (~> 1.0)
-    rexml (3.2.6)
+    rexml (3.3.2)
+      strscan
     rouge (3.26.0)
     safe_yaml (1.0.5)
     sassc (2.4.0)
       ffi (~> 1.9)
+    strscan (3.1.0)
     terminal-table (2.0.0)
       unicode-display_width (~> 1.1, >= 1.1.1)
     unicode-display_width (1.8.0)
(spark-website) branch asf-site updated: Patch 1
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new a3693ee235  Patch 1
a3693ee235 is described below

commit a3693ee2358fde320fc9000be3a9fbc84e1df959
Author: Stefan Krawczyk
AuthorDate: Wed Jun 5 07:27:23 2024 -0500

    Patch 1

    This adds [Hamilton](https://github.com/DAGWorks-Inc/hamilton) to the list of libraries with integrations. Hamilton has PySpark support (e.g. [examples](https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/spark)) and this specific functionality is utilized by several enterprises in production.

    Author: Stefan Krawczyk

    Closes #520 from skrawcz/patch-1.
---
 site/third-party-projects.html | 1 +
 third-party-projects.md        | 1 +
 2 files changed, 2 insertions(+)

diff --git a/site/third-party-projects.html b/site/third-party-projects.html
index f5e2dd2873..2629d07d38 100644
--- a/site/third-party-projects.html
+++ b/site/third-party-projects.html
@@ -153,6 +153,7 @@
 <a href="https://github.com/awslabs/python-deequ">python-deequ</a> - Measures data quality in large datasets
 <a href="https://github.com/datahub-project/datahub">datahub</a> - Metadata platform for the modern data stack
 <a href="https://github.com/dbt-labs/dbt-spark">dbt-spark</a> - Enables dbt to work with Apache Spark
+<a href="https://github.com/DAGWorks-Inc/hamilton">Hamilton</a> - Enables one to declaratively describe PySpark transformations that helps keep code testable, modular, and logically visualizable.

 Connectors

diff --git a/third-party-projects.md b/third-party-projects.md
index e8b4b16c85..ed7e7b3353 100644
--- a/third-party-projects.md
+++ b/third-party-projects.md
@@ -18,6 +18,7 @@ This page tracks external software projects that supplement Apache Spark and add
 - [python-deequ](https://github.com/awslabs/python-deequ) - Measures data quality in large datasets
 - [datahub](https://github.com/datahub-project/datahub) - Metadata platform for the modern data stack
 - [dbt-spark](https://github.com/dbt-labs/dbt-spark) - Enables dbt to work with Apache Spark
+- [Hamilton](https://github.com/DAGWorks-Inc/hamilton) - Enables one to declaratively describe PySpark transformations that helps keep code testable, modular, and logically visualizable.

 ## Connectors
(spark) branch master updated: [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 9d4d41c43f1c  [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer
9d4d41c43f1c is described below

commit 9d4d41c43f1cb4cf724e0e27c1762df8bbdf2a54
Author: beliefer
AuthorDate: Sat Feb 3 09:06:38 2024 -0600

    [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer

    ### What changes were proposed in this pull request?
    This PR proposes to make the document of `spark.sql.adaptive.coalescePartitions.parallelismFirst` clearer.

    ### Why are the changes needed?
    The default value of `spark.sql.adaptive.coalescePartitions.parallelismFirst` is true, but the document contains the words `recommended to set this config to false and respect the configured target size`. This is very confusing.

    ### Does this PR introduce _any_ user-facing change?
    Yes. The document is clearer.

    ### How was this patch tested?
    N/A

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #44787 from beliefer/SPARK-46760.
Authored-by: beliefer
Signed-off-by: Sean Owen
---
 docs/sql-performance-tuning.md                                 | 2 +-
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/sql-performance-tuning.md b/docs/sql-performance-tuning.md
index 1dbe1bb7e1a2..25c22d660562 100644
--- a/docs/sql-performance-tuning.md
+++ b/docs/sql-performance-tuning.md
@@ -267,7 +267,7 @@ This feature coalesces the post shuffle partitions based on the map output stati
   spark.sql.adaptive.coalescePartitions.parallelismFirst
   true
-  When true, Spark ignores the target size specified by spark.sql.adaptive.advisoryPartitionSizeInBytes (default 64MB) when coalescing contiguous shuffle partitions, and only respect the minimum partition size specified by spark.sql.adaptive.coalescePartitions.minPartitionSize (default 1MB), to maximize the parallelism. This is to avoid performance regression when enabling adaptive query execution. It's recommended to set this config to false and respect th [...]
+  When true, Spark ignores the target size specified by spark.sql.adaptive.advisoryPartitionSizeInBytes (default 64MB) when coalescing contiguous shuffle partitions, and only respect the minimum partition size specified by spark.sql.adaptive.coalescePartitions.minPartitionSize (default 1MB), to maximize the parallelism. This is to avoid performance regressions when enabling adaptive query execution. It's recommended to set this config to true on a busy clus [...]
   3.2.0

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index d88cbed6b27d..1bff0ff1a350 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -713,8 +713,9 @@ object SQLConf {
         "shuffle partitions, but adaptively calculate the target size according to the default " +
         "parallelism of the Spark cluster. The calculated size is usually smaller than the " +
         "configured target size. This is to maximize the parallelism and avoid performance " +
-        "regression when enabling adaptive query execution. It's recommended to set this config " +
-        "to false and respect the configured target size.")
+        "regressions when enabling adaptive query execution. It's recommended to set this " +
+        "config to true on a busy cluster to make resource utilization more efficient (not many " +
+        "small tasks).")
       .version("3.2.0")
       .booleanConf
       .createWithDefault(true)
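The coalescing policy that the improved documentation describes can be sketched in a few lines: merge contiguous shuffle partitions greedily up to a threshold, where `parallelismFirst=true` means the threshold is the small `minPartitionSize` (more, smaller partitions) rather than the larger advisory target size. This is a deliberately simplified Python model of the documented behavior, not Spark's actual AQE implementation; all names are illustrative.

```python
def coalesce_partitions(sizes, target_size, min_size, parallelism_first):
    """Greedily merge contiguous shuffle partition sizes.

    Simplified model of spark.sql.adaptive.coalescePartitions.parallelismFirst:
    when True, the advisory target size is ignored and only min_size bounds
    each coalesced partition, maximizing parallelism.
    """
    threshold = min_size if parallelism_first else target_size
    merged, current = [], 0
    for size in sizes:
        current += size
        if current >= threshold:  # close the current coalesced partition
            merged.append(current)
            current = 0
    if current:  # leftover tail partition
        merged.append(current)
    return merged

sizes = [10, 10, 10, 10, 10, 10]  # shuffle partition sizes, e.g. in MB
# parallelism first: each 10MB chunk already meets min_size -> 6 partitions
print(coalesce_partitions(sizes, target_size=30, min_size=10, parallelism_first=True))
# target-size first: merge up to the 30MB advisory size -> 2 partitions
print(coalesce_partitions(sizes, target_size=30, min_size=10, parallelism_first=False))
```

On a busy cluster the first mode yields fewer idle cores at the cost of more, smaller tasks, which is the trade-off the reworded documentation now spells out.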
(spark) branch master updated: [SPARK-45110][BUILD] Upgrade rocksdbjni to 8.8.1
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 1870de0b329a  [SPARK-45110][BUILD] Upgrade rocksdbjni to 8.8.1
1870de0b329a is described below

commit 1870de0b329ac5ef35a331a653b4debd85eaa684
Author: panbingkun
AuthorDate: Thu Feb 1 06:37:00 2024 -0600

    [SPARK-45110][BUILD] Upgrade rocksdbjni to 8.8.1

    ### What changes were proposed in this pull request?
    The PR aims to upgrade rocksdbjni from `8.3.2` to `8.8.1`.
    Why version `8.8.1`? Because so far, `32` tests have been conducted based on version `8.6.7` or `8.8.1`, and no core issues have been found. Later versions have not been rigorously validated.

    ### Why are the changes needed?
    1. The full release notes:
    - https://github.com/facebook/rocksdb/releases/tag/v8.8.1
    - https://github.com/facebook/rocksdb/releases/tag/v8.7.3
    - https://github.com/facebook/rocksdb/releases/tag/v8.6.7
    - https://github.com/facebook/rocksdb/releases/tag/v8.5.4
    - https://github.com/facebook/rocksdb/releases/tag/v8.5.3
    - https://github.com/facebook/rocksdb/releases/tag/v8.4.4
    - https://github.com/facebook/rocksdb/releases/tag/v8.3.3

    2. Bug fixes, e.g.:
    - Fixed a bug where compaction reads under non-direct IO still fall back to RocksDB internal prefetching after the file system's prefetching returns a non-OK status other than Status::NotSupported()
    - Fixed a bug with atomic_flush=true that can cause the DB to get stuck after a flush fails
    - Fixed a bug where, if there is an error reading from offset 0 of a file from L1+ and the file is not the first file in the sorted run, data can be lost in compaction and read/scan can return incorrect results
    - Fixed a bug where an iterator may return incorrect results for DeleteRange() users if there was an error reading from a file

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    - Pass GA.
    - Manually test:
      ./build/mvn clean install -pl core -am -Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest -fn

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #43924 from panbingkun/upgrade_rocksdbjni.

Lead-authored-by: panbingkun
Co-authored-by: panbingkun
Signed-off-by: Sean Owen
---
 dev/deps/spark-deps-hadoop-3-hive-2.3              |  2 +-
 pom.xml                                            |  2 +-
 ...StoreBasicOperationsBenchmark-jdk21-results.txt | 70 ++---
 .../StateStoreBasicOperationsBenchmark-results.txt | 72 +++---
 4 files changed, 73 insertions(+), 73 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index fcb3350e5de2..e02733883642 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -239,7 +239,7 @@ parquet-jackson/1.13.1//parquet-jackson-1.13.1.jar
 pickle/1.3//pickle-1.3.jar
 py4j/0.10.9.7//py4j-0.10.9.7.jar
 remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar
-rocksdbjni/8.3.2//rocksdbjni-8.3.2.jar
+rocksdbjni/8.8.1//rocksdbjni-8.8.1.jar
 scala-collection-compat_2.13/2.7.0//scala-collection-compat_2.13-2.7.0.jar
 scala-compiler/2.13.12//scala-compiler-2.13.12.jar
 scala-library/2.13.12//scala-library-2.13.12.jar

diff --git a/pom.xml b/pom.xml
index 6e118bb27f5a..2fc14a4cdede 100644
--- a/pom.xml
+++ b/pom.xml
@@ -677,7 +677,7 @@
       org.rocksdb
       rocksdbjni
-      8.3.2
+      8.8.1

       ${leveldbjni.group}

diff --git a/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt b/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt
index f92ae8668e16..c0d710873aed 100644
--- a/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt
+++ b/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt
@@ -6,33 +6,33 @@ OpenJDK 64-Bit Server VM 21.0.1+12-LTS on Linux 5.15.0-1053-azure
 AMD EPYC 7763 64-Core Processor
 putting 1 rows (1 rows to overwrite - rate 100):  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
 --------------------------------------------------------------------------------------------------------------------------
-In-memory                                                     5             6          0        1.8        541.4       1.0X
-RocksDB (trackTotalNumberOfRows: true)                       40            41          2        0.2       4023.4       0.1X
-RocksDB (trackTotalNumberOfRows: false)                      15            15          1        0.7       1452.5       0.4X
+In-m
(spark) branch master updated: [SPARK-46473][SQL] Reuse `getPartitionedFile` method
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 223afea9960c  [SPARK-46473][SQL] Reuse `getPartitionedFile` method
223afea9960c is described below

commit 223afea9960c7ef1a4c8654e043e860f6c248185
Author: huangxiaoping <1754789...@qq.com>
AuthorDate: Wed Jan 31 22:59:20 2024 -0600

    [SPARK-46473][SQL] Reuse `getPartitionedFile` method

    ### What changes were proposed in this pull request?
    Reuse `getPartitionedFile` method to reduce redundant code.

    ### Why are the changes needed?
    Reduce redundant code.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #44437 from huangxiaopingRD/SPARK-46473.

Authored-by: huangxiaoping <1754789...@qq.com>
Signed-off-by: Sean Owen
---
 .../apache/spark/sql/execution/DataSourceScanExec.scala  |  2 +-
 .../apache/spark/sql/execution/PartitionedFileUtil.scala | 14 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
index b3b2b0eab055..2622eadaefb3 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
@@ -645,7 +645,7 @@ case class FileSourceScanExec(
       logInfo(s"Planning with ${bucketSpec.numBuckets} buckets")
       val filesGroupedToBuckets = selectedPartitions.flatMap { p =>
-        p.files.map(f => PartitionedFileUtil.getPartitionedFile(f, p.values))
+        p.files.map(f => PartitionedFileUtil.getPartitionedFile(f, p.values, 0, f.getLen))
       }.groupBy { f =>
         BucketingUtils
           .getBucketId(f.toPath.getName)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/PartitionedFileUtil.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/PartitionedFileUtil.scala
index b31369b6768e..997859058de1 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/PartitionedFileUtil.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/PartitionedFileUtil.scala
@@ -33,20 +33,20 @@ object PartitionedFileUtil {
       (0L until file.getLen by maxSplitBytes).map { offset =>
         val remaining = file.getLen - offset
         val size = if (remaining > maxSplitBytes) maxSplitBytes else remaining
-        val hosts = getBlockHosts(getBlockLocations(file.fileStatus), offset, size)
-        PartitionedFile(partitionValues, SparkPath.fromPath(file.getPath), offset, size, hosts,
-          file.getModificationTime, file.getLen, file.metadata)
+        getPartitionedFile(file, partitionValues, offset, size)
       }
     } else {
       Seq(getPartitionedFile(file, partitionValues, 0, file.getLen))
     }
   }

   def getPartitionedFile(
       file: FileStatusWithMetadata,
-      partitionValues: InternalRow): PartitionedFile = {
-    val hosts = getBlockHosts(getBlockLocations(file.fileStatus), 0, file.getLen)
-    PartitionedFile(partitionValues, SparkPath.fromPath(file.getPath), 0, file.getLen, hosts,
+      partitionValues: InternalRow,
+      start: Long,
+      length: Long): PartitionedFile = {
+    val hosts = getBlockHosts(getBlockLocations(file.fileStatus), start, length)
+    PartitionedFile(partitionValues, SparkPath.fromPath(file.getPath), start, length, hosts,
       file.getModificationTime, file.getLen, file.metadata)
   }
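The refactor above routes both branches of `splitFiles` through a single `getPartitionedFile(file, partitionValues, start, length)` helper: the splitable branch emits one range per `maxSplitBytes` step, the non-splitable branch emits one whole-file range. A minimal Python sketch of that shape, with hypothetical names standing in for the Scala types, just to show the deduplicated control flow:

```python
from dataclasses import dataclass

@dataclass
class PartitionedFile:
    path: str
    start: int
    length: int

def get_partitioned_file(path, start, length):
    # Single shared constructor, analogous to the refactored
    # PartitionedFileUtil.getPartitionedFile(file, partitionValues, start, length).
    return PartitionedFile(path, start, length)

def split_file(path, file_len, max_split_bytes, is_splitable):
    """Mirror of the splitFiles logic: chop a splitable file into
    max_split_bytes-sized ranges, or emit one whole-file range, with both
    branches going through the same helper (illustrative sketch, not Spark's API)."""
    if is_splitable:
        return [
            get_partitioned_file(path, offset, min(max_split_bytes, file_len - offset))
            for offset in range(0, file_len, max_split_bytes)
        ]
    return [get_partitioned_file(path, 0, file_len)]

splits = split_file("part-0000.parquet", file_len=250, max_split_bytes=100, is_splitable=True)
print([(s.start, s.length) for s in splits])  # [(0, 100), (100, 100), (200, 50)]
```

Collapsing the duplicated `PartitionedFile(...)` construction into one parameterized helper is exactly the redundancy the commit removes.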
(spark) branch master updated: [SPARK-46929][CORE][CONNECT][SS] Use ThreadUtils.shutdown to close thread pools
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 262ed5bcab0b  [SPARK-46929][CORE][CONNECT][SS] Use ThreadUtils.shutdown to close thread pools
262ed5bcab0b is described below

commit 262ed5bcab0ba750b089b0693dbb1a59ef6fd11f
Author: beliefer
AuthorDate: Wed Jan 31 09:52:19 2024 -0600

    [SPARK-46929][CORE][CONNECT][SS] Use ThreadUtils.shutdown to close thread pools

    ### What changes were proposed in this pull request?
    This PR proposes using `ThreadUtils.shutdown` to close thread pools.

    ### Why are the changes needed?
    `ThreadUtils` provides a `shutdown` method that wraps the common logic for shutting down thread pools. We should use `ThreadUtils.shutdown` to close thread pools instead of repeating that logic at each call site.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    GA

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #44962 from beliefer/SPARK-46929.
Authored-by: beliefer Signed-off-by: Sean Owen --- .../sql/connect/service/SparkConnectExecutionManager.scala | 5 +++-- .../sql/connect/service/SparkConnectSessionManager.scala| 5 +++-- .../connect/service/SparkConnectStreamingQueryCache.scala | 9 +++-- .../scala/org/apache/spark/ExecutorAllocationManager.scala | 4 ++-- .../org/apache/spark/status/ElementTrackingStore.scala | 6 ++ .../main/scala/org/apache/spark/streaming/Checkpoint.scala | 12 +--- .../org/apache/spark/streaming/scheduler/JobScheduler.scala | 13 ++--- 7 files changed, 24 insertions(+), 30 deletions(-) diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectExecutionManager.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectExecutionManager.scala index c90f53ac07df..85fb150b3171 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectExecutionManager.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectExecutionManager.scala @@ -21,6 +21,7 @@ import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit} import javax.annotation.concurrent.GuardedBy import scala.collection.mutable +import scala.concurrent.duration.FiniteDuration import scala.jdk.CollectionConverters._ import scala.util.control.NonFatal @@ -30,6 +31,7 @@ import org.apache.spark.{SparkEnv, SparkSQLException} import org.apache.spark.connect.proto import org.apache.spark.internal.Logging import org.apache.spark.sql.connect.config.Connect.{CONNECT_EXECUTE_MANAGER_ABANDONED_TOMBSTONES_SIZE, CONNECT_EXECUTE_MANAGER_DETACHED_TIMEOUT, CONNECT_EXECUTE_MANAGER_MAINTENANCE_INTERVAL} +import org.apache.spark.util.ThreadUtils // Unique key identifying execution by combination of user, session and operation id case class ExecuteKey(userId: String, sessionId: String, operationId: String) @@ -167,8 +169,7 @@ private[connect] class 
SparkConnectExecutionManager() extends Logging { private[connect] def shutdown(): Unit = executionsLock.synchronized { scheduledExecutor.foreach { executor => - executor.shutdown() - executor.awaitTermination(1, TimeUnit.MINUTES) + ThreadUtils.shutdown(executor, FiniteDuration(1, TimeUnit.MINUTES)) } scheduledExecutor = None // note: this does not cleanly shut down the executions, but the server is shutting down. diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectSessionManager.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectSessionManager.scala index ef14cd305d40..4da728b95a33 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectSessionManager.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectSessionManager.scala @@ -22,6 +22,7 @@ import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit} import javax.annotation.concurrent.GuardedBy import scala.collection.mutable +import scala.concurrent.duration.FiniteDuration import scala.jdk.CollectionConverters._ import scala.util.control.NonFatal @@ -31,6 +32,7 @@ import org.apache.spark.{SparkEnv, SparkSQLException} import org.apache.spark.internal.Logging import org.apache.spark.sql.SparkSession import org.apache.spark.sql.connect.config.Connect.{CONNECT_SESSION_MANAGER_CLOSED_SESSIONS_TOMBSTONES_SIZE, CONNECT_SESSION_MANAGER_DEFAULT_SESSION_TIMEOUT, CONNECT_SESSION_MANAGER_MAINTENANCE_INTERVAL} +import org.apache.spark.util.ThreadUtils /** * Global tracke
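The pattern being deduplicated above — shut a pool down and wait a bounded time, in one shared helper rather than a repeated `shutdown()`/`awaitTermination()` pair at every call site — can be illustrated with a Python analogue. This is only a sketch of the idea (Spark's `ThreadUtils.shutdown` is Scala and takes a `FiniteDuration`); `cancel_futures` requires Python 3.9+.

```python
from concurrent.futures import ThreadPoolExecutor

def shutdown(executor: ThreadPoolExecutor, cancel_pending: bool = False) -> None:
    """One shared shutdown helper, mirroring the ThreadUtils.shutdown pattern
    the commit switches to: callers no longer repeat the two-step
    shutdown-then-await dance themselves (illustrative Python analogue)."""
    # wait=True blocks until running (and, unless cancelled, queued) tasks finish.
    executor.shutdown(wait=True, cancel_futures=cancel_pending)

pool = ThreadPoolExecutor(max_workers=2)
futures = [pool.submit(lambda x: x * x, i) for i in range(4)]
shutdown(pool)  # all submitted work completes before the pool is released
print(sorted(f.result() for f in futures))  # [0, 1, 4, 9]
```

Centralizing this in one helper also gives a single place to tune the termination timeout later, which is the maintainability argument the commit message makes.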
(spark) branch master updated: [SPARK-46400][CORE][SQL] When there are corrupted files in the local maven repo, skip this cache and try again
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new f2a471e9cc75  [SPARK-46400][CORE][SQL] When there are corrupted files in the local maven repo, skip this cache and try again
f2a471e9cc75 is described below

commit f2a471e9cc752f3826232eedc9025fd156a85965
Author: panbingkun
AuthorDate: Wed Jan 31 09:46:07 2024 -0600

    [SPARK-46400][CORE][SQL] When there are corrupted files in the local maven repo, skip this cache and try again

    ### What changes were proposed in this pull request?
    The PR aims to:
    - fix a potential bug (see https://github.com/apache/spark/pull/44208) and improve the user experience.
    - make the code more compliant with standards.

    ### Why are the changes needed?
    We use the local maven repo as the first-level cache in Ivy. The original intention was to reduce the time required to parse and obtain the jar, but when there are corrupted files in the local maven repo, the mechanism above is interrupted outright and the error message is very unfriendly, which greatly confuses the user. In keeping with the original intention, we should simply skip the cache in such situations.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Manually test.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #44343 from panbingkun/SPARK-46400.
Authored-by: panbingkun Signed-off-by: Sean Owen --- .../scala/org/apache/spark/util/MavenUtils.scala | 147 +++-- .../sql/hive/client/IsolatedClientLoader.scala | 4 + 2 files changed, 112 insertions(+), 39 deletions(-) diff --git a/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala b/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala index 2d7fba6f07d5..65530b7fa473 100644 --- a/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala +++ b/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala @@ -27,7 +27,7 @@ import org.apache.ivy.Ivy import org.apache.ivy.core.LogOptions import org.apache.ivy.core.module.descriptor.{Artifact, DefaultDependencyDescriptor, DefaultExcludeRule, DefaultModuleDescriptor, ExcludeRule} import org.apache.ivy.core.module.id.{ArtifactId, ModuleId, ModuleRevisionId} -import org.apache.ivy.core.report.ResolveReport +import org.apache.ivy.core.report.{DownloadStatus, ResolveReport} import org.apache.ivy.core.resolve.ResolveOptions import org.apache.ivy.core.retrieve.RetrieveOptions import org.apache.ivy.core.settings.IvySettings @@ -43,8 +43,8 @@ import org.apache.spark.util.ArrayImplicits._ private[spark] object MavenUtils extends Logging { val JAR_IVY_SETTING_PATH_KEY: String = "spark.jars.ivySettings" -// // Exposed for testing -// var printStream = SparkSubmit.printStream + // Exposed for testing + // var printStream = SparkSubmit.printStream // Exposed for testing. // These components are used to make the default exclusion rules for Spark dependencies. @@ -113,7 +113,7 @@ private[spark] object MavenUtils extends Logging { splits(2) != null && splits(2).trim.nonEmpty, s"The version cannot be null or " + s"be whitespace. 
The version provided is: ${splits(2)}") - new MavenCoordinate(splits(0), splits(1), splits(2)) + MavenCoordinate(splits(0), splits(1), splits(2)) }.toImmutableArraySeq } @@ -128,24 +128,30 @@ private[spark] object MavenUtils extends Logging { } /** - * Extracts maven coordinates from a comma-delimited string + * Create a ChainResolver used by Ivy to search for and resolve dependencies. * * @param defaultIvyUserDir * The default user path for Ivy + * @param useLocalM2AsCache + * Whether to use the local maven repo as a cache * @return * A ChainResolver used by Ivy to search for and resolve dependencies. */ - private[util] def createRepoResolvers(defaultIvyUserDir: File): ChainResolver = { + private[util] def createRepoResolvers( + defaultIvyUserDir: File, + useLocalM2AsCache: Boolean = true): ChainResolver = { // We need a chain resolver if we want to check multiple repositories val cr = new ChainResolver cr.setName("spark-list") -val localM2 = new IBiblioResolver -localM2.setM2compatible(true) -localM2.setRoot(m2Path.toURI.toString) -localM2.setUsepoms(true) -localM2.setName("local-m2-cache") -cr.add(localM2) +if (useLocalM2AsCache) { + val localM2 = new IBiblioResolver + localM2.setM2compatible(true) + localM2.setRoot(m2Path.toURI.toString) + localM2.setUsepoms(true) + localM2.setName("local-m2-cache") + cr.add(l
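The retry strategy the commit describes — consult the local maven repo as a first-level cache, but fall back to a cache-free resolution when the cached file turns out to be corrupted — can be sketched as follows. Everything here is illustrative (dict-backed "repos", a toy corruption check based on the zip magic bytes that jar files start with); Spark's real code drives Ivy chain resolvers and a `useLocalM2AsCache` flag instead.

```python
def resolve(coordinate, fetch_from_cache, fetch_from_remote, is_corrupted):
    """Two-level resolution with the local cache as the first level.

    If the cached copy is missing or corrupted, skip the cache entirely and
    resolve again from the remote repository instead of failing outright --
    the same fallback the commit adds around the local m2 cache (sketch only).
    """
    cached = fetch_from_cache(coordinate)
    if cached is not None and not is_corrupted(cached):
        return cached
    # Corrupted or absent cache entry: bypass the cache and try again.
    return fetch_from_remote(coordinate)

cache = {"org.foo:bar:1.0": b"\x00garbage"}        # simulated corrupted cached jar
remote = {"org.foo:bar:1.0": b"PK\x03\x04payload"}  # simulated intact remote jar
artifact = resolve(
    "org.foo:bar:1.0",
    fetch_from_cache=cache.get,
    fetch_from_remote=remote.get,
    is_corrupted=lambda data: not data.startswith(b"PK"),  # jars are zip archives
)
print(artifact == b"PK\x03\x04payload")  # True
```

The point of the pattern is that a poisoned cache degrades to a slower-but-correct path with a clear outcome, rather than aborting resolution with a confusing error.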
(spark) branch master updated: [SPARK-45522][BUILD][CORE][SQL][UI] Migrate from Jetty 9 to Jetty 10
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 6c19bf6b48e7  [SPARK-45522][BUILD][CORE][SQL][UI] Migrate from Jetty 9 to Jetty 10
6c19bf6b48e7 is described below

commit 6c19bf6b48e7e2ab9937dc2d91ea23dd83abae64
Author: HiuFung Kwok
AuthorDate: Wed Jan 31 09:42:16 2024 -0600

    [SPARK-45522][BUILD][CORE][SQL][UI] Migrate from Jetty 9 to Jetty 10

    ### What changes were proposed in this pull request?
    This is an upgrade ticket to bump the Jetty version from 9 to 10. This PR aims to bring incremental Jetty upgrades to Spark, as Jetty 9 support has already reached EOL.

    ### Why are the changes needed?
    Jetty 9 is already beyond EOL, which means we won't receive any further security fixes for it in Spark.

    ### Does this PR introduce _any_ user-facing change?
    No. The SNI host check now defaults to true on embedded Jetty, so it is set back to false to maintain backward compatibility. The redirect behaviour changed for a trailing /, but modern browsers pick up the 302 status code and redirect accordingly, so there is no impact at the user level.

    ### How was this patch tested?
    JUnit test case.

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #43765 from HiuKwok/ft-hf-SPARK-45522-jetty-upgradte.
Lead-authored-by: HiuFung Kwok Co-authored-by: HiuFung Kwok <37996731+hiuk...@users.noreply.github.com> Signed-off-by: Sean Owen --- LICENSE-binary | 1 - core/pom.xml | 8 +--- .../main/scala/org/apache/spark/SSLOptions.scala | 2 +- .../main/scala/org/apache/spark/TestUtils.scala| 13 + .../scala/org/apache/spark/ui/JettyUtils.scala | 13 ++--- .../test/scala/org/apache/spark/ui/UISuite.scala | 22 +- dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++-- dev/test-dependencies.sh | 2 +- pom.xml| 8 +--- .../service/cli/thrift/ThriftHttpCLIService.java | 12 ++-- 10 files changed, 52 insertions(+), 33 deletions(-) diff --git a/LICENSE-binary b/LICENSE-binary index c6f291f11088..2073d85246b6 100644 --- a/LICENSE-binary +++ b/LICENSE-binary @@ -368,7 +368,6 @@ xerces:xercesImpl org.codehaus.jackson:jackson-jaxrs org.codehaus.jackson:jackson-xc org.eclipse.jetty:jetty-client -org.eclipse.jetty:jetty-continuation org.eclipse.jetty:jetty-http org.eclipse.jetty:jetty-io org.eclipse.jetty:jetty-jndi diff --git a/core/pom.xml b/core/pom.xml index c093213bd6b9..f780551fb555 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -146,11 +146,6 @@ jetty-http compile - - org.eclipse.jetty - jetty-continuation - compile - org.eclipse.jetty jetty-servlet @@ -538,7 +533,7 @@ true true - guava,protobuf-java,jetty-io,jetty-servlet,jetty-servlets,jetty-continuation,jetty-http,jetty-plus,jetty-util,jetty-server,jetty-security,jetty-proxy,jetty-client + guava,protobuf-java,jetty-io,jetty-servlet,jetty-servlets,jetty-http,jetty-plus,jetty-util,jetty-server,jetty-security,jetty-proxy,jetty-client true @@ -558,7 +553,6 @@ org.eclipse.jetty:jetty-http org.eclipse.jetty:jetty-proxy org.eclipse.jetty:jetty-client - org.eclipse.jetty:jetty-continuation org.eclipse.jetty:jetty-servlet org.eclipse.jetty:jetty-servlets org.eclipse.jetty:jetty-plus diff --git a/core/src/main/scala/org/apache/spark/SSLOptions.scala b/core/src/main/scala/org/apache/spark/SSLOptions.scala index 26108d885e4c..ce058cec2686 100644 --- 
a/core/src/main/scala/org/apache/spark/SSLOptions.scala +++ b/core/src/main/scala/org/apache/spark/SSLOptions.scala @@ -87,7 +87,7 @@ private[spark] case class SSLOptions( /** * Creates a Jetty SSL context factory according to the SSL settings represented by this object. */ - def createJettySslContextFactory(): Option[SslContextFactory] = { + def createJettySslContextFactoryServer(): Option[SslContextFactory.Server] = { if (enabled) { val sslContextFactory = new SslContextFactory.Server() diff --git a/core/src/main/scala/org/apache/spark/TestUtils.scala b/core/src/main/scala/org/apache/spark/TestUtils.scala index e85f98ff55c5..5e3078d7292b 100644 --- a/core/src/main/scala/org/apache/spark/TestUtils.scala +++ b/core/src/main/scala/org/apache/spark/TestUtils.scala @@ -252,6 +252,19 @@ private[spark] object TestUt
(spark) branch master updated: [MINOR][SQL] Use `DecimalType.MINIMUM_ADJUSTED_SCALE` instead of magic number `6` in `Divide` class
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0c7770f4de56 [MINOR][SQL] Use `DecimalType.MINIMUM_ADJUSTED_SCALE` instead of magic number `6` in `Divide` class 0c7770f4de56 is described below commit 0c7770f4de560ad74e93b0902ab7a6be52c655be Author: longfei.jiang <1251489...@qq.com> AuthorDate: Wed Jan 31 09:40:07 2024 -0600 [MINOR][SQL] Use `DecimalType.MINIMUM_ADJUSTED_SCALE` instead of magic number `6` in `Divide` class ### What changes were proposed in this pull request? Replace the magic value `6` with the constant `DecimalType.MINIMUM_ADJUSTED_SCALE` ### Why are the changes needed? Magic values are less self-documenting than named constants. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? The existing UT `ArithmeticExpressionSuite#"SPARK-45786: Decimal multiply, divide, remainder, quot"` provides coverage ### Was this patch authored or co-authored using generative AI tooling? No Closes #44941 from jlfsdtc/magic_value. 
Authored-by: longfei.jiang <1251489...@qq.com> Signed-off-by: Sean Owen --- .../scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala index a0fb17cec812..9f1b42ad84d3 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala @@ -810,7 +810,7 @@ case class Divide( DecimalType.adjustPrecisionScale(prec, scale) } else { var intDig = min(DecimalType.MAX_SCALE, p1 - s1 + s2) - var decDig = min(DecimalType.MAX_SCALE, max(6, s1 + p2 + 1)) + var decDig = min(DecimalType.MAX_SCALE, max(DecimalType.MINIMUM_ADJUSTED_SCALE, s1 + p2 + 1)) val diff = (intDig + decDig) - DecimalType.MAX_SCALE if (diff > 0) { decDig -= diff / 2 + 1 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
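The renamed constant sits inside the rule that sizes the result type of a decimal division. As a rough illustration only (not Spark's API; the constant names mirror `org.apache.spark.sql.types.DecimalType`, and the post-overflow adjustment is assumed from the truncated diff), the branch shown above computes the result precision and scale like this:

```python
# Sketch of the decimal-division result-type rule from the diff above, with
# the magic number 6 named as MINIMUM_ADJUSTED_SCALE. Illustration only.
MAX_SCALE = 38
MINIMUM_ADJUSTED_SCALE = 6

def divide_result_type(p1, s1, p2, s2):
    """Return (precision, scale) for decimal(p1, s1) / decimal(p2, s2)."""
    int_dig = min(MAX_SCALE, p1 - s1 + s2)
    dec_dig = min(MAX_SCALE, max(MINIMUM_ADJUSTED_SCALE, s1 + p2 + 1))
    diff = (int_dig + dec_dig) - MAX_SCALE
    if diff > 0:
        # Result would overflow MAX_SCALE: give up some fractional digits.
        dec_dig -= diff // 2 + 1
        int_dig = MAX_SCALE - dec_dig
    return (int_dig + dec_dig, dec_dig)

# decimal(10, 2) / decimal(10, 2): 10 integer digits, 13 fractional digits
print(divide_result_type(10, 2, 10, 2))  # -> (23, 13)
```

The `max(MINIMUM_ADJUSTED_SCALE, ...)` term guarantees a division result always keeps at least 6 digits after the decimal point, which is the invariant the named constant documents.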
(spark) branch master updated: [SPARK-46100][CORE][PYTHON] Reduce stack depth by replace (string|array).size with (string|array).length
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7b58fffdeeb [SPARK-46100][CORE][PYTHON] Reduce stack depth by replace (string|array).size with (string|array).length 7b58fffdeeb is described below commit 7b58fffdeeb70524e18ad80ea0aa53e2ac910e2a Author: Jiaan Geng AuthorDate: Sat Nov 25 14:38:34 2023 -0600 [SPARK-46100][CORE][PYTHON] Reduce stack depth by replace (string|array).size with (string|array).length ### What changes were proposed in this pull request? There are many calls to `[string|array].size`. Since `size` simply delegates to the underlying `length`, each call adds an extra stack frame; we should call `[string|array].length` directly. We also get the compiler warning `Replace .size with .length on arrays and strings`. This PR improves only the core module. ### Why are the changes needed? Reduce stack depth by replacing (string|array).size with (string|array).length ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? Existing test cases. ### Was this patch authored or co-authored using generative AI tooling? 'No'. Closes #44011 from beliefer/SPARK-46100. 
Authored-by: Jiaan Geng Signed-off-by: Sean Owen --- .../org/apache/spark/api/python/PythonRunner.scala | 2 +- .../apache/spark/deploy/master/ui/MasterPage.scala | 4 +- .../apache/spark/executor/ExecutorMetrics.scala| 2 +- .../org/apache/spark/resource/ResourceUtils.scala | 2 +- .../apache/spark/scheduler/TaskDescription.scala | 2 +- .../apache/spark/scheduler/TaskSchedulerImpl.scala | 4 +- .../org/apache/spark/ui/ConsoleProgressBar.scala | 2 +- .../org/apache/spark/util/HadoopFSUtils.scala | 2 +- .../util/io/ChunkedByteBufferFileRegion.scala | 2 +- .../scala/org/apache/spark/CheckpointSuite.scala | 16 ++--- .../scala/org/apache/spark/DistributedSuite.scala | 16 ++--- .../test/scala/org/apache/spark/FileSuite.scala| 2 +- .../org/apache/spark/MapOutputTrackerSuite.scala | 4 +- .../scala/org/apache/spark/PartitioningSuite.scala | 4 +- .../test/scala/org/apache/spark/ShuffleSuite.scala | 2 +- .../spark/deploy/DecommissionWorkerSuite.scala | 2 +- .../org/apache/spark/deploy/SparkSubmitSuite.scala | 4 +- .../deploy/StandaloneDynamicAllocationSuite.scala | 22 +++--- .../spark/deploy/client/AppClientSuite.scala | 6 +- .../deploy/history/FsHistoryProviderSuite.scala| 20 +++--- .../deploy/rest/StandaloneRestSubmitSuite.scala| 2 +- .../input/WholeTextFileRecordReaderSuite.scala | 4 +- .../internal/plugin/PluginContainerSuite.scala | 4 +- .../apache/spark/rdd/AsyncRDDActionsSuite.scala| 2 +- .../apache/spark/rdd/LocalCheckpointSuite.scala| 2 +- .../apache/spark/rdd/PairRDDFunctionsSuite.scala | 44 ++-- .../scala/org/apache/spark/rdd/PipedRDDSuite.scala | 10 +-- .../test/scala/org/apache/spark/rdd/RDDSuite.scala | 80 +++--- .../scala/org/apache/spark/rdd/SortingSuite.scala | 6 +- .../apache/spark/rdd/ZippedPartitionsSuite.scala | 4 +- .../spark/resource/ResourceProfileSuite.scala | 2 +- .../apache/spark/resource/ResourceUtilsSuite.scala | 6 +- .../apache/spark/scheduler/AQEShuffledRDD.scala| 2 +- .../CoarseGrainedSchedulerBackendSuite.scala | 2 +- 
.../apache/spark/scheduler/DAGSchedulerSuite.scala | 32 - .../apache/spark/scheduler/MapStatusSuite.scala| 2 +- .../scheduler/OutputCommitCoordinatorSuite.scala | 8 +-- .../spark/scheduler/TaskSchedulerImplSuite.scala | 12 ++-- .../spark/scheduler/TaskSetManagerSuite.scala | 4 +- .../KryoSerializerDistributedSuite.scala | 2 +- .../sort/IndexShuffleBlockResolverSuite.scala | 2 +- .../org/apache/spark/storage/DiskStoreSuite.scala | 2 +- .../org/apache/spark/util/FileAppenderSuite.scala | 4 +- .../spark/util/collection/SizeTrackerSuite.scala | 2 +- 44 files changed, 180 insertions(+), 180 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala index d6363182606..e6d5a750ea3 100644 --- a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala +++ b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala @@ -378,7 +378,7 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( resources.foreach { case (k, v) => PythonRDD.writeUTF(k, dataOut) PythonRDD.writeUTF(v.name, dataOut) - dataOut.writeInt(v.addresses.size) + dataOut.writeInt(v.addresses.length) v.addresses.foreach { case addr => PythonRDD.writeUTF(addr, dataOut)
(spark) branch master updated: [SPARK-45687][CORE][SQL][ML][MLLIB][KUBERNETES][EXAMPLES][CONNECT][STRUCTURED STREAMING] Fix `Passing an explicit array value to a Scala varargs method is deprecated`
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 605aa0c299c [SPARK-45687][CORE][SQL][ML][MLLIB][KUBERNETES][EXAMPLES][CONNECT][STRUCTURED STREAMING] Fix `Passing an explicit array value to a Scala varargs method is deprecated` 605aa0c299c is described below commit 605aa0c299c1d88f8a31ba888ac8e6b6203be6c5 Author: Tengfei Huang AuthorDate: Fri Nov 10 08:10:20 2023 -0600 [SPARK-45687][CORE][SQL][ML][MLLIB][KUBERNETES][EXAMPLES][CONNECT][STRUCTURED STREAMING] Fix `Passing an explicit array value to a Scala varargs method is deprecated` ### What changes were proposed in this pull request? Fix the deprecated behavior below: `Passing an explicit array value to a Scala varargs method is deprecated (since 2.13.0) and will result in a defensive copy; Use the more efficient non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call` For all of these use cases, we don't need to make a copy of the array, so we explicitly use `ArraySeq.unsafeWrapArray` to do the conversion. ### Why are the changes needed? Eliminate compile warnings and stop using deprecated Scala APIs. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Passes GA. Fixed all the warnings with the build: `mvn clean package -DskipTests -Pspark-ganglia-lgpl -Pkinesis-asl -Pdocker-integration-tests -Pyarn -Pkubernetes -Pkubernetes-integration-tests -Phive-thriftserver -Phadoop-cloud` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43642 from ivoson/SPARK-45687. 
Authored-by: Tengfei Huang Signed-off-by: Sean Owen --- .../scala/org/apache/spark/sql/KeyValueGroupedDataset.scala | 9 ++--- .../test/scala/org/apache/spark/sql/ColumnTestSuite.scala| 3 ++- .../apache/spark/sql/UserDefinedFunctionE2ETestSuite.scala | 5 - .../spark/sql/connect/planner/SparkConnectPlanner.scala | 3 ++- .../main/scala/org/apache/spark/api/python/PythonRDD.scala | 3 ++- core/src/main/scala/org/apache/spark/executor/Executor.scala | 3 ++- core/src/main/scala/org/apache/spark/rdd/RDD.scala | 3 ++- .../scala/org/apache/spark/examples/graphx/Analytics.scala | 4 ++-- .../scala/org/apache/spark/ml/classification/OneVsRest.scala | 3 ++- .../scala/org/apache/spark/ml/feature/FeatureHasher.scala| 4 +++- .../src/main/scala/org/apache/spark/ml/feature/Imputer.scala | 8 +--- .../main/scala/org/apache/spark/ml/feature/Interaction.scala | 4 +++- .../main/scala/org/apache/spark/ml/feature/RFormula.scala| 6 -- .../scala/org/apache/spark/ml/feature/VectorAssembler.scala | 5 +++-- mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala | 3 ++- .../src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala | 3 ++- .../src/main/scala/org/apache/spark/ml/r/KSTestWrapper.scala | 3 ++- .../apache/spark/ml/regression/DecisionTreeRegressor.scala | 3 ++- .../src/main/scala/org/apache/spark/ml/tree/treeModels.scala | 3 ++- .../src/main/scala/org/apache/spark/mllib/util/MLUtils.scala | 12 .../scala/org/apache/spark/ml/feature/ImputerSuite.scala | 12 .../apache/spark/ml/source/image/ImageFileFormatSuite.scala | 3 ++- .../apache/spark/ml/stat/KolmogorovSmirnovTestSuite.scala| 3 ++- mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala | 6 -- .../deploy/k8s/features/DriverCommandFeatureStepSuite.scala | 2 +- .../apache/spark/sql/catalyst/expressions/generators.scala | 8 ++-- .../sql/catalyst/expressions/UnsafeRowConverterSuite.scala | 4 +++- .../scala/org/apache/spark/sql/DataFrameStatFunctions.scala | 3 ++- .../scala/org/apache/spark/sql/KeyValueGroupedDataset.scala | 8 ++-- 
.../spark/sql/execution/datasources/jdbc/JDBCRDD.scala | 2 +- .../org/apache/spark/sql/execution/stat/StatFunctions.scala | 3 ++- .../apache/spark/sql/execution/streaming/OffsetSeqLog.scala | 3 ++- .../streaming/continuous/ContinuousRateStreamSource.scala| 3 ++- .../src/test/scala/org/apache/spark/sql/DataFrameSuite.scala | 3 ++- .../src/test/scala/org/apache/spark/sql/DatasetSuite.scala | 6 -- .../src/test/scala/org/apache/spark/sql/GenTPCDSData.scala | 3 ++- .../test/scala/org/apache/spark/sql/ParametersSuite.scala| 9 + .../spark/sql/connector/SimpleWritableDataSource.scala | 4 +++- .../sql/execution/datasources/FileMetadataStructSuite.scala | 3 ++- .../spark/sql/execution/datasources/csv/CSVBenchmark.scala | 7 --- .../scala/org/apache/spark/sql/streaming/StreamSuite.scala | 2 +- .../org/apache/spark/sql/streaming/StreamingQuerySuite.scala | 3 ++- .../org/apache/spark/sql/hive/thriftserver/CliSuite.scala
(spark) branch master updated: [SPARK-45368][SQL] Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 102daf9d149 [SPARK-45368][SQL] Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal 102daf9d149 is described below commit 102daf9d1490d12b812be4432c77ce102e82c3bb Author: tangjiafu AuthorDate: Tue Oct 31 08:42:46 2023 -0500 [SPARK-45368][SQL] Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal ### What changes were proposed in this pull request? Remove the Scala 2.12 compatibility logic for DoubleType, FloatType and Decimal ### Why are the changes needed? Drop Scala 2.12 and make Scala 2.13 the default https://issues.apache.org/jira/browse/SPARK-45368 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tested by CI ### Was this patch authored or co-authored using generative AI tooling? No Closes #43456 from laglangyue/f_SPARK-45368_scala12_dataType. 
Lead-authored-by: tangjiafu Co-authored-by: laglangyue Signed-off-by: Sean Owen --- sql/api/src/main/scala/org/apache/spark/sql/types/Decimal.scala| 4 +--- sql/api/src/main/scala/org/apache/spark/sql/types/DoubleType.scala | 5 + sql/api/src/main/scala/org/apache/spark/sql/types/FloatType.scala | 5 + 3 files changed, 3 insertions(+), 11 deletions(-) diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/Decimal.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/Decimal.scala index afe73635a68..3ce0508951f 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/types/Decimal.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/types/Decimal.scala @@ -681,9 +681,7 @@ object Decimal { override def toLong(x: Decimal): Long = x.toLong override def fromInt(x: Int): Decimal = new Decimal().set(x) override def compare(x: Decimal, y: Decimal): Int = x.compare(y) -// Added from Scala 2.13; don't override to work in 2.12 -// TODO revisit once Scala 2.12 support is dropped -def parseString(str: String): Option[Decimal] = Try(Decimal(str)).toOption +override def parseString(str: String): Option[Decimal] = Try(Decimal(str)).toOption } /** A [[scala.math.Fractional]] evidence parameter for Decimals. 
*/ diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/DoubleType.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/DoubleType.scala index d18c7b98af2..bc0ed725cf2 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/types/DoubleType.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/types/DoubleType.scala @@ -42,8 +42,6 @@ class DoubleType private() extends FractionalType { @Stable case object DoubleType extends DoubleType { - // Traits below copied from Scala 2.12; not present in 2.13 - // TODO: SPARK-30011 revisit once Scala 2.12 support is dropped trait DoubleIsConflicted extends Numeric[Double] { def plus(x: Double, y: Double): Double = x + y def minus(x: Double, y: Double): Double = x - y @@ -56,8 +54,7 @@ case object DoubleType extends DoubleType { def toDouble(x: Double): Double = x // logic in Numeric base trait mishandles abs(-0.0) override def abs(x: Double): Double = math.abs(x) -// Added from Scala 2.13; don't override to work in 2.12 -def parseString(str: String): Option[Double] = +override def parseString(str: String): Option[Double] = Try(java.lang.Double.parseDouble(str)).toOption } diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/FloatType.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/FloatType.scala index 978384eebfe..8b54f830d48 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/types/FloatType.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/types/FloatType.scala @@ -43,8 +43,6 @@ class FloatType private() extends FractionalType { @Stable case object FloatType extends FloatType { - // Traits below copied from Scala 2.12; not present in 2.13 - // TODO: SPARK-30011 revisit once Scala 2.12 support is dropped trait FloatIsConflicted extends Numeric[Float] { def plus(x: Float, y: Float): Float = x + y def minus(x: Float, y: Float): Float = x - y @@ -57,8 +55,7 @@ case object FloatType extends FloatType { def toDouble(x: Float): Double = x.toDouble // logic in Numeric base trait 
mishandles abs(-0.0f) override def abs(x: Float): Float = math.abs(x) -// Added from Scala 2.13; don't override to work in 2.12 -def parseString(str: String): Option[Float] = +override def parseString(str: String): Option[Float] = Try(java.lang.Float.parseFloat(str)).toOption } - To unsubscribe, e-mail: c
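All three `parseString` overrides above follow the same `Try(parse(...)).toOption` pattern: attempt the conversion and return an empty option instead of throwing. A minimal Python analogue of that pattern (hypothetical helper name, not part of Spark or PySpark):

```python
# Illustration of the Try(...).toOption pattern used by the parseString
# overrides above: attempt a parse, return None instead of raising.
from typing import Optional

def parse_string_float(s: str) -> Optional[float]:
    try:
        return float(s)
    except (TypeError, ValueError):
        return None

print(parse_string_float("3.5"))   # -> 3.5
print(parse_string_float("oops"))  # -> None
```

With Scala 2.12 dropped, the methods can carry `override` because `Numeric.parseString` exists in the 2.13 standard library; the compatibility shims no longer need to avoid the keyword.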
(spark) branch master updated: [SPARK-45605][CORE][SQL][SS][CONNECT][MLLIB][GRAPHX][DSTREAM][PROTOBUF][EXAMPLES] Replace `s.c.MapOps.mapValues` with `s.c.MapOps.view.mapValues`
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 89ca8b6065e [SPARK-45605][CORE][SQL][SS][CONNECT][MLLIB][GRAPHX][DSTREAM][PROTOBUF][EXAMPLES] Replace `s.c.MapOps.mapValues` with `s.c.MapOps.view.mapValues` 89ca8b6065e is described below commit 89ca8b6065e9f690a492c778262080741d50d94d Author: yangjie01 AuthorDate: Sun Oct 29 09:19:30 2023 -0500 [SPARK-45605][CORE][SQL][SS][CONNECT][MLLIB][GRAPHX][DSTREAM][PROTOBUF][EXAMPLES] Replace `s.c.MapOps.mapValues` with `s.c.MapOps.view.mapValues` ### What changes were proposed in this pull request? This PR replaces `s.c.MapOps.mapValues` with `s.c.MapOps.view.mapValues` because `s.c.MapOps.mapValues` has been marked as deprecated since Scala 2.13.0: https://github.com/scala/scala/blob/bf45e199e96383b96a6955520d7d2524c78e6e12/src/library/scala/collection/Map.scala#L256-L262 ```scala @deprecated("Use .view.mapValues(f). A future version will include a strict version of this method (for now, .view.mapValues(f).toMap).", "2.13.0") def mapValues[W](f: V => W): MapView[K, W] = new MapView.MapValues(this, f) ``` ### Why are the changes needed? Clean up deprecated API usage. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Passes GitHub Actions - Packaged the client, manually tested `DFSReadWriteTest/MiniReadWriteTest/PowerIterationClusteringExample`. ### Was this patch authored or co-authored using generative AI tooling? No Closes #43448 from LuciferYang/SPARK-45605. 
Lead-authored-by: yangjie01 Co-authored-by: YangJie Signed-off-by: Sean Owen --- .../spark/util/sketch/CountMinSketchSuite.scala| 2 +- .../org/apache/spark/sql/avro/AvroUtils.scala | 1 + .../scala/org/apache/spark/sql/SparkSession.scala | 2 +- .../spark/sql/ClientDataFrameStatSuite.scala | 2 +- .../org/apache/spark/sql/connect/dsl/package.scala | 2 +- .../sql/connect/planner/SparkConnectPlanner.scala | 13 ++ .../sql/kafka010/KafkaMicroBatchSourceSuite.scala | 3 ++- .../apache/spark/sql/kafka010/KafkaTestUtils.scala | 2 +- .../streaming/kafka010/ConsumerStrategy.scala | 6 ++--- .../kafka010/DirectKafkaInputDStream.scala | 2 +- .../kafka010/DirectKafkaStreamSuite.scala | 2 +- .../spark/streaming/kafka010/KafkaTestUtils.scala | 2 +- .../spark/streaming/kinesis/KinesisTestUtils.scala | 2 +- .../kinesis/KPLBasedKinesisTestUtils.scala | 2 +- .../kinesis/KinesisBackedBlockRDDSuite.scala | 4 +-- .../spark/sql/protobuf/utils/ProtobufUtils.scala | 1 + .../org/apache/spark/api/java/JavaPairRDD.scala| 4 +-- .../apache/spark/api/java/JavaSparkContext.scala | 2 +- .../spark/api/python/PythonWorkerFactory.scala | 2 +- .../apache/spark/scheduler/InputFormatInfo.scala | 2 +- .../apache/spark/scheduler/TaskSchedulerImpl.scala | 2 +- .../cluster/CoarseGrainedSchedulerBackend.scala| 2 +- ...plicationEnvironmentInfoWrapperSerializer.scala | 5 ++-- .../ExecutorSummaryWrapperSerializer.scala | 3 ++- .../status/protobuf/JobDataWrapperSerializer.scala | 2 +- .../protobuf/StageDataWrapperSerializer.scala | 6 ++--- .../org/apache/spark/SparkThrowableSuite.scala | 2 +- .../apache/spark/rdd/PairRDDFunctionsSuite.scala | 2 +- .../test/scala/org/apache/spark/rdd/RDDSuite.scala | 1 + .../scheduler/ExecutorResourceInfoSuite.scala | 1 + .../BlockManagerDecommissionIntegrationSuite.scala | 2 +- .../storage/ShuffleBlockFetcherIteratorSuite.scala | 2 +- .../util/collection/ExternalSorterSuite.scala | 2 +- .../apache/spark/examples/DFSReadWriteTest.scala | 1 + 
.../apache/spark/examples/MiniReadWriteTest.scala | 1 + .../mllib/PowerIterationClusteringExample.scala| 2 +- .../spark/graphx/lib/ShortestPathsSuite.scala | 2 +- .../spark/ml/evaluation/ClusteringMetrics.scala| 1 + .../apache/spark/ml/feature/VectorIndexer.scala| 2 +- .../org/apache/spark/ml/feature/Word2Vec.scala | 2 +- .../apache/spark/ml/tree/impl/RandomForest.scala | 4 +-- .../spark/mllib/clustering/BisectingKMeans.scala | 2 +- .../mllib/linalg/distributed/BlockMatrix.scala | 4 +-- .../apache/spark/mllib/stat/test/ChiSqTest.scala | 1 + .../apache/spark/ml/recommendation/ALSSuite.scala | 8 +++--- .../apache/spark/mllib/feature/Word2VecSuite.scala | 12 - .../org/apache/spark/sql/types/Metadata.scala | 2 +- .../spark/sql/catalyst/analysis/Analyzer.scala | 3 ++- .../catalyst/catalog/ExternalCatalogUtils.scala| 2 +- .../sql/catalyst/catalog/SessionCatalog.scala | 2 +- .../spark/sql/catalyst/expressions/package.scala | 2 +-
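The deprecation exists because Scala 2.13's `mapValues` silently returns a lazy `MapView` rather than a strict map, so the explicit `.view.mapValues(f)` (plus `.toMap` where strictness is needed) makes the laziness visible at every call site. The strict-vs-lazy distinction can be sketched in Python (illustration only, using a generator as the stand-in for a view):

```python
# Strict vs lazy value mapping, mirroring the distinction behind Scala 2.13's
# deprecated mapValues (a lazy MapView) vs .view.mapValues(f).toMap (strict).
calls = []

def f(v):
    calls.append(v)          # record each evaluation
    return v * 10

m = {"a": 1, "b": 2}

strict = {k: f(v) for k, v in m.items()}    # evaluates f eagerly, once per entry
assert calls == [1, 2]

calls.clear()
lazy = ((k, f(v)) for k, v in m.items())    # a "view": nothing evaluated yet
assert calls == []
assert dict(lazy) == {"a": 10, "b": 20}     # forcing the view runs f
assert calls == [1, 2]
```

A lazy view re-runs `f` every time it is traversed, which is why blindly keeping the old `mapValues` call sites could change performance or side-effect behavior.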
(spark) branch master updated: [SPARK-45636][BUILD] Upgrade jersey to 2.41
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4ae99f9320c [SPARK-45636][BUILD] Upgrade jersey to 2.41 4ae99f9320c is described below commit 4ae99f9320ca29193f7c0d6d54d61e5d3fd0b323 Author: YangJie AuthorDate: Sun Oct 29 09:18:07 2023 -0500 [SPARK-45636][BUILD] Upgrade jersey to 2.41 ### What changes were proposed in this pull request? This PR upgrades Jersey from 2.40 to 2.41. ### Why are the changes needed? The new version brings some improvements, like: - https://github.com/eclipse-ee4j/jersey/pull/5350 - https://github.com/eclipse-ee4j/jersey/pull/5365 - https://github.com/eclipse-ee4j/jersey/pull/5436 - https://github.com/eclipse-ee4j/jersey/pull/5296 and some bug fixes, like: - https://github.com/eclipse-ee4j/jersey/pull/5359 - https://github.com/eclipse-ee4j/jersey/pull/5405 - https://github.com/eclipse-ee4j/jersey/pull/5423 - https://github.com/eclipse-ee4j/jersey/pull/5435 - https://github.com/eclipse-ee4j/jersey/pull/5445 The full release notes are here: - https://github.com/eclipse-ee4j/jersey/releases/tag/2.41 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Passes GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #43490 from LuciferYang/SPARK-45636. 
Lead-authored-by: YangJie Co-authored-by: yangjie01 Signed-off-by: Sean Owen --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 12 ++-- pom.xml | 2 +- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index c6fa77c84ca..2bfd94b9d46 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -122,12 +122,12 @@ jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar jcl-over-slf4j/2.0.9//jcl-over-slf4j-2.0.9.jar jdo-api/3.0.1//jdo-api-3.0.1.jar jdom2/2.0.6//jdom2-2.0.6.jar -jersey-client/2.40//jersey-client-2.40.jar -jersey-common/2.40//jersey-common-2.40.jar -jersey-container-servlet-core/2.40//jersey-container-servlet-core-2.40.jar -jersey-container-servlet/2.40//jersey-container-servlet-2.40.jar -jersey-hk2/2.40//jersey-hk2-2.40.jar -jersey-server/2.40//jersey-server-2.40.jar +jersey-client/2.41//jersey-client-2.41.jar +jersey-common/2.41//jersey-common-2.41.jar +jersey-container-servlet-core/2.41//jersey-container-servlet-core-2.41.jar +jersey-container-servlet/2.41//jersey-container-servlet-2.41.jar +jersey-hk2/2.41//jersey-hk2-2.41.jar +jersey-server/2.41//jersey-server-2.41.jar jettison/1.5.4//jettison-1.5.4.jar jetty-util-ajax/9.4.53.v20231009//jetty-util-ajax-9.4.53.v20231009.jar jetty-util/9.4.53.v20231009//jetty-util-9.4.53.v20231009.jar diff --git a/pom.xml b/pom.xml index 6488918326f..71c3044dd42 100644 --- a/pom.xml +++ b/pom.xml @@ -206,7 +206,7 @@ Please don't upgrade the version to 3.0.0+, Because it transitions Jakarta REST API from javax to jakarta package. --> -2.40 +2.41 2.12.5 3.5.2 3.0.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark-website) branch asf-site updated: [SPARK-45706][PYTHON][DOCS] Fix the links for Binder builds for Spark 3.5.0
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 0da360e961 [SPARK-45706][PYTHON][DOCS] Fix the links for Binder builds for Spark 3.5.0 0da360e961 is described below commit 0da360e9615eda835230931f83a2c4d82165050d Author: Hyukjin Kwon AuthorDate: Fri Oct 27 08:20:42 2023 -0500 [SPARK-45706][PYTHON][DOCS] Fix the links for Binder builds for Spark 3.5.0 This PR cherry-picks https://github.com/apache/spark/pull/43553 into Spark 3.5.0 PySpark documentation to recover the live notebooks Author: Hyukjin Kwon Closes #484 from HyukjinKwon/fix-binder-build. --- site/docs/3.5.0/api/python/getting_started/index.html | 8 site/docs/3.5.0/api/python/index.html | 8 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/site/docs/3.5.0/api/python/getting_started/index.html b/site/docs/3.5.0/api/python/getting_started/index.html index b5e4e54b66..0bd7ae9a7c 100644 --- a/site/docs/3.5.0/api/python/getting_started/index.html +++ b/site/docs/3.5.0/api/python/getting_started/index.html @@ -215,9 +215,9 @@ There are more guides shared with other languages such as at https://spark.apache.org/docs/latest/index.html#where-to-go-from-here";>the Spark documentation. 
There are live notebooks where you can try PySpark out without any other step: -https://mybinder.org/v2/gh/apache/spark/ce5ddad9903?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_df.ipynb";>Live Notebook: DataFrame -https://mybinder.org/v2/gh/apache/spark/ce5ddad9903?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_connect.ipynb";>Live Notebook: Spark Connect -https://mybinder.org/v2/gh/apache/spark/ce5ddad9903?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_ps.ipynb";>Live Notebook: pandas API on Spark +https://mybinder.org/v2/gh/apache/spark/270861a3cd6?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_df.ipynb";>Live Notebook: DataFrame +https://mybinder.org/v2/gh/apache/spark/270861a3cd6?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_connect.ipynb";>Live Notebook: Spark Connect +https://mybinder.org/v2/gh/apache/spark/270861a3cd6?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_ps.ipynb";>Live Notebook: pandas API on Spark The list below is the contents of this quickstart page: @@ -313,4 +313,4 @@ Created using http://sphinx-doc.org/";>Sphinx 3.0.4. - \ No newline at end of file + diff --git a/site/docs/3.5.0/api/python/index.html b/site/docs/3.5.0/api/python/index.html index faf6e558a5..1c757dc92b 100644 --- a/site/docs/3.5.0/api/python/index.html +++ b/site/docs/3.5.0/api/python/index.html @@ -183,7 +183,7 @@ PySpark Overview¶ Date: Sep 09, 2023 Version: 3.5.0 Useful links: -https://mybinder.org/v2/gh/apache/spark/ce5ddad9903?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_df.ipynb";>Live Notebook | https://github.com/apache/spark";>GitHub | https://issues.apache.org/jira/projects/SPARK/issues";>Issues | https://github.com/apache/spark/tree/ce5ddad9903/examples/src/main/python";>Examples | [...] 
+https://mybinder.org/v2/gh/apache/spark/270861a3cd6?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_df.ipynb";>Live Notebook | https://github.com/apache/spark";>GitHub | https://issues.apache.org/jira/projects/SPARK/issues";>Issues | https://github.com/apache/spark/tree/270861a3cd6/examples/src/main/python";>Examples | [...] PySpark is the Python API for Apache Spark. It enables you to perform real-time, large-scale data processing in a distributed environment using Python. It also provides a PySpark shell for interactively analyzing your data. @@ -237,7 +237,7 @@ Whether you use Python or SQL, the same underlying execution engine is used so you will always leverage the full power of Spark. Quickstart: DataFrame -https://mybinder.org/v2/gh/apache/spark/ce5ddad9903?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_df.ipynb";>Live Notebook: DataFrame +https://mybinder.org/v2/gh/apache/spark/270861a3cd6?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_df.ipynb";>Live Notebook: DataFrame Spark SQL API Reference Pandas API on Spark @@ -253,7 +253,7 @@ if you are new to Spark or deciding which API to use, we recommend using PySpark (see Spark SQL and DataFrames). Quickstart: Pandas API on Spark -https://mybinder.org/v2/gh/apache/spark/ce5ddad9903?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_ps.ipynb";>Live Notebook: pandas API on Spark +https://mybinder.org/v2/gh/apache/spark/270861a3cd6?filepath=python%2Fdocs%2
[spark] branch branch-3.4 updated: [SPARK-40154][PYTHON][DOCS] Correct storage level in Dataframe.cache docstring
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new ecdb69f3db3 [SPARK-40154][PYTHON][DOCS] Correct storage level in Dataframe.cache docstring ecdb69f3db3 is described below commit ecdb69f3db3370aa7cf6ae8a52130379e465ca73 Author: Paul Staab AuthorDate: Wed Oct 25 07:36:15 2023 -0500 [SPARK-40154][PYTHON][DOCS] Correct storage level in Dataframe.cache docstring ### What changes were proposed in this pull request? Corrects the docstring of `DataFrame.cache` to give the correct storage level after it changed with Spark 3.0. It seems that the docstring of `DataFrame.persist` was updated, but `cache` was forgotten. ### Why are the changes needed? The docstring claims that `cache` uses serialised storage, but it actually uses deserialised storage. I confirmed that this is still the case with Spark 3.5.0 using the example code from the Jira ticket. ### Does this PR introduce _any_ user-facing change? Yes, the docstring changes. ### How was this patch tested? The GitHub Actions workflow succeeded. ### Was this patch authored or co-authored using generative AI tooling? No Closes #43229 from paulstaab/SPARK-40154.
Authored-by: Paul Staab Signed-off-by: Sean Owen (cherry picked from commit 94607dd001b133a25dc9865f25b3f9e7f5a5daa3) Signed-off-by: Sean Owen --- python/pyspark/sql/dataframe.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py index 518bc9867d7..14426c51439 100644 --- a/python/pyspark/sql/dataframe.py +++ b/python/pyspark/sql/dataframe.py @@ -1404,7 +1404,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin): self.rdd.foreachPartition(f) # type: ignore[arg-type] def cache(self) -> "DataFrame": -"""Persists the :class:`DataFrame` with the default storage level (`MEMORY_AND_DISK`). +"""Persists the :class:`DataFrame` with the default storage level (`MEMORY_AND_DISK_DESER`). .. versionadded:: 1.3.0 @@ -1413,7 +1413,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin): Notes - -The default storage level has changed to `MEMORY_AND_DISK` to match Scala in 2.0. +The default storage level has changed to `MEMORY_AND_DISK_DESER` to match Scala in 3.0. Returns --- - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.5 updated: [SPARK-40154][PYTHON][DOCS] Correct storage level in Dataframe.cache docstring
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 9e4411e2450 [SPARK-40154][PYTHON][DOCS] Correct storage level in Dataframe.cache docstring 9e4411e2450 is described below commit 9e4411e2450d0503933626207b5e03308c30bc72 Author: Paul Staab AuthorDate: Wed Oct 25 07:36:15 2023 -0500 [SPARK-40154][PYTHON][DOCS] Correct storage level in Dataframe.cache docstring ### What changes were proposed in this pull request? Corrects the docstring of `DataFrame.cache` to give the correct storage level after it changed with Spark 3.0. It seems that the docstring of `DataFrame.persist` was updated, but `cache` was forgotten. ### Why are the changes needed? The docstring claims that `cache` uses serialised storage, but it actually uses deserialised storage. I confirmed that this is still the case with Spark 3.5.0 using the example code from the Jira ticket. ### Does this PR introduce _any_ user-facing change? Yes, the docstring changes. ### How was this patch tested? The GitHub Actions workflow succeeded. ### Was this patch authored or co-authored using generative AI tooling? No Closes #43229 from paulstaab/SPARK-40154.
Authored-by: Paul Staab Signed-off-by: Sean Owen (cherry picked from commit 94607dd001b133a25dc9865f25b3f9e7f5a5daa3) Signed-off-by: Sean Owen --- python/pyspark/sql/dataframe.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py index 30ed73d3c47..5707ae2a31f 100644 --- a/python/pyspark/sql/dataframe.py +++ b/python/pyspark/sql/dataframe.py @@ -1485,7 +1485,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin): self.rdd.foreachPartition(f) # type: ignore[arg-type] def cache(self) -> "DataFrame": -"""Persists the :class:`DataFrame` with the default storage level (`MEMORY_AND_DISK`). +"""Persists the :class:`DataFrame` with the default storage level (`MEMORY_AND_DISK_DESER`). .. versionadded:: 1.3.0 @@ -1494,7 +1494,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin): Notes - -The default storage level has changed to `MEMORY_AND_DISK` to match Scala in 2.0. +The default storage level has changed to `MEMORY_AND_DISK_DESER` to match Scala in 3.0. Returns --- - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a073bf38c7d -> 94607dd001b)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from a073bf38c7d [SPARK-45209][CORE][UI] Flame Graph Support For Executor Thread Dump Page add 94607dd001b [SPARK-40154][PYTHON][DOCS] Correct storage level in Dataframe.cache docstring No new revisions were added by this update. Summary of changes: python/pyspark/sql/dataframe.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (2709426f0f6 -> 48e207f4a21)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 2709426f0f6 [SPARK-45541][CORE] Add SSLFactory add 48e207f4a21 [SPARK-45610][BUILD][CORE][SQL][SS][CONNECT][GRAPHX][DSTREAM][ML][MLLIB][K8S][YARN][SHELL][PYTHON][R][AVRO][UI][EXAMPLES] Fix the compilation warning "Auto-application to `()` is deprecated" and turn it into a compilation error No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/avro/AvroSuite.scala | 38 ++--- .../execution/benchmark/AvroReadBenchmark.scala| 10 +- .../execution/benchmark/AvroWriteBenchmark.scala | 4 +- .../org/apache/spark/sql/ClientE2ETestSuite.scala | 12 +- .../spark/sql/DataFrameNaFunctionSuite.scala | 2 +- .../sql/UserDefinedFunctionE2ETestSuite.scala | 2 +- .../spark/sql/connect/client/ArtifactManager.scala | 2 +- .../sql/connect/client/GrpcRetryHandler.scala | 2 +- .../execution/ExecuteResponseObserver.scala| 4 +- .../execution/SparkConnectPlanExecution.scala | 2 +- .../sql/connect/planner/SparkConnectPlanner.scala | 12 +- .../sql/connect/service/ExecuteEventsManager.scala | 2 +- .../sql/connect/service/SparkConnectServer.scala | 2 +- .../connect/planner/SparkConnectServiceSuite.scala | 2 +- .../connect/service/AddArtifactsHandlerSuite.scala | 12 +- .../service/ArtifactStatusesHandlerSuite.scala | 2 +- .../service/FetchErrorDetailsHandlerSuite.scala| 2 +- .../connect/service/InterceptorRegistrySuite.scala | 12 +- .../spark/sql/jdbc/DB2IntegrationSuite.scala | 8 +- .../sql/jdbc/MsSqlServerIntegrationSuite.scala | 6 +- .../spark/sql/jdbc/MySQLIntegrationSuite.scala | 4 +- .../spark/sql/jdbc/OracleIntegrationSuite.scala| 8 +- .../spark/sql/jdbc/PostgresIntegrationSuite.scala | 4 +- .../sql/kafka010/KafkaOffsetReaderConsumer.scala | 2 +- .../sql/kafka010/consumer/KafkaDataConsumer.scala | 2 +- .../sql/kafka010/KafkaContinuousSourceSuite.scala | 16 +- 
.../sql/kafka010/KafkaMicroBatchSourceSuite.scala | 8 +- .../spark/sql/kafka010/KafkaRelationSuite.scala| 36 ++-- .../kafka010/consumer/KafkaDataConsumerSuite.scala | 2 +- .../org/apache/spark/kafka010/KafkaTokenUtil.scala | 4 +- .../kafka010/DirectKafkaInputDStream.scala | 12 +- .../streaming/kafka010/KafkaDataConsumer.scala | 2 +- .../apache/spark/streaming/kafka010/KafkaRDD.scala | 14 +- .../kafka010/DirectKafkaStreamSuite.scala | 14 +- .../kafka010/KafkaDataConsumerSuite.scala | 4 +- .../spark/streaming/kafka010/KafkaRDDSuite.scala | 36 ++-- .../kinesis/KPLBasedKinesisTestUtils.scala | 2 +- .../kinesis/KinesisInputDStreamBuilderSuite.scala | 2 +- .../org/apache/spark/BarrierCoordinator.scala | 2 +- .../org/apache/spark/BarrierTaskContext.scala | 19 ++- .../main/scala/org/apache/spark/Heartbeater.scala | 2 +- .../scala/org/apache/spark/SecurityManager.scala | 4 +- .../main/scala/org/apache/spark/SparkContext.scala | 12 +- .../main/scala/org/apache/spark/TestUtils.scala| 2 +- .../apache/spark/api/java/JavaSparkContext.scala | 2 +- .../org/apache/spark/api/python/PythonRunner.scala | 22 +-- .../scala/org/apache/spark/api/r/BaseRRunner.scala | 2 +- .../org/apache/spark/deploy/JsonProtocol.scala | 2 +- .../apache/spark/deploy/SparkSubmitArguments.scala | 2 +- .../spark/deploy/history/FsHistoryProvider.scala | 2 +- .../apache/spark/deploy/history/HistoryPage.scala | 2 +- .../apache/spark/deploy/worker/CommandUtils.scala | 4 +- .../org/apache/spark/deploy/worker/Worker.scala| 2 +- .../spark/executor/ProcfsMetricsGetter.scala | 4 +- .../apache/spark/input/PortableDataStream.scala| 2 +- .../spark/internal/io/SparkHadoopWriter.scala | 4 +- .../apache/spark/launcher/LauncherBackend.scala| 4 +- .../apache/spark/memory/UnifiedMemoryManager.scala | 2 +- .../org/apache/spark/metrics/MetricsSystem.scala | 4 +- .../org/apache/spark/rdd/AsyncRDDActions.scala | 2 +- .../scala/org/apache/spark/rdd/HadoopRDD.scala | 2 +- .../main/scala/org/apache/spark/rdd/PipedRDD.scala | 4 +- 
.../apache/spark/rdd/ReliableCheckpointRDD.scala | 6 +- .../org/apache/spark/scheduler/DAGScheduler.scala | 16 +- .../spark/scheduler/StatsReportListener.scala | 2 +- .../scala/org/apache/spark/scheduler/Task.scala| 4 +- .../apache/spark/scheduler/TaskSchedulerImpl.scala | 4 +- .../apache/spark/scheduler/TaskSetManager.scala| 4 +- .../cluster/CoarseGrainedSchedulerBackend.scala| 6 +- .../cluster/StandaloneSchedulerBackend.scala | 2 +- .../apache/spark/serializer/KryoSerializer.scala | 10 +- .../apache/spark/status/AppStatusListener.scala| 10 +- .../apache/spark/status/ElementTrackingS
[spark] branch master updated: [SPARK-45484][SQL][FOLLOWUP][DOCS] Update the document of parquet compression codec
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4023ec9bb44 [SPARK-45484][SQL][FOLLOWUP][DOCS] Update the document of parquet compression codec 4023ec9bb44 is described below commit 4023ec9bb4471efee36afcec041c114a4b86a2c8 Author: Jiaan Geng AuthorDate: Sat Oct 21 16:39:13 2023 -0500 [SPARK-45484][SQL][FOLLOWUP][DOCS] Update the document of parquet compression codec ### What changes were proposed in this pull request? This PR follows up https://github.com/apache/spark/pull/43310 to update the document of parquet compression codec. ### Why are the changes needed? Update the document of parquet compression codec. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? 'No'. Closes #43464 from beliefer/SPARK-45484_followup. Authored-by: Jiaan Geng Signed-off-by: Sean Owen --- docs/sql-data-sources-parquet.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/sql-data-sources-parquet.md b/docs/sql-data-sources-parquet.md index 925e47504e5..c2af58248ea 100644 --- a/docs/sql-data-sources-parquet.md +++ b/docs/sql-data-sources-parquet.md @@ -423,7 +423,7 @@ Data source options of Parquet can be set via: compression snappy -Compression codec to use when saving to file. This can be one of the known case-insensitive shorten names (none, uncompressed, snappy, gzip, lzo, brotli, lz4, and zstd). This will override spark.sql.parquet.compression.codec. +Compression codec to use when saving to file. This can be one of the known case-insensitive shorten names (none, uncompressed, snappy, gzip, lzo, brotli, lz4, lz4_raw, and zstd). This will override spark.sql.parquet.compression.codec. 
write @@ -484,7 +484,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession Sets the compression codec used when writing Parquet files. If either compression or parquet.compression is specified in the table-specific options/properties, the precedence would be compression, parquet.compression, spark.sql.parquet.compression.codec. Acceptable values include: -none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd. +none, uncompressed, snappy, gzip, lzo, brotli, lz4, lz4_raw, zstd. Note that brotli requires BrotliCodec to be installed. 1.1.1 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
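As the documentation change above spells out, the Parquet codec can be set at three levels, with the per-write `compression` option taking precedence over `parquet.compression`, which in turn takes precedence over the session-wide default. A hedged configuration sketch (the codec choice is illustrative, not a recommendation):

```
# spark-defaults.conf — session-wide default (lowest precedence)
spark.sql.parquet.compression.codec  zstd
```

Per write, the same effect comes from passing `compression=zstd` as a data source option, which overrides the setting above.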
[spark] branch master updated: [MINOR] Fix typos
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 920fb673b26 [MINOR] Fix typos 920fb673b26 is described below commit 920fb673b264c0bdcad0426020dedf57d8b11cc7 Author: shuoer86 <129674997+shuoe...@users.noreply.github.com> AuthorDate: Sat Oct 21 16:37:27 2023 -0500 [MINOR] Fix typos Closes #43434 from shuoer86/master. Authored-by: shuoer86 <129674997+shuoe...@users.noreply.github.com> Signed-off-by: Sean Owen --- binder/postBuild| 4 ++-- .../scala/org/apache/spark/sql/connect/service/SessionHolder.scala | 2 +- .../spark/sql/connect/plugin/SparkConnectPluginRegistrySuite.scala | 2 +- core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala | 2 +- core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala | 6 +++--- .../main/scala/org/apache/spark/ui/jobs/TaskThreadDumpPage.scala| 2 +- .../scala/org/apache/spark/status/AutoCleanupLiveUIDirSuite.scala | 2 +- docs/sql-ref-syntax-ddl-declare-variable.md | 2 +- 8 files changed, 11 insertions(+), 11 deletions(-) diff --git a/binder/postBuild b/binder/postBuild index 70ae23b3937..b6bdf72324c 100644 --- a/binder/postBuild +++ b/binder/postBuild @@ -38,7 +38,7 @@ else pip install plotly "pandas<2.0.0" "pyspark[sql,ml,mllib,pandas_on_spark]$SPECIFIER$VERSION" fi -# Set 'PYARROW_IGNORE_TIMEZONE' to surpress warnings from PyArrow. +# Set 'PYARROW_IGNORE_TIMEZONE' to suppress warnings from PyArrow. echo "export PYARROW_IGNORE_TIMEZONE=1" >> ~/.profile # Add sbin to PATH to run `start-connect-server.sh`. @@ -50,7 +50,7 @@ echo "export SPARK_HOME=${SPARK_HOME}" >> ~/.profile SPARK_VERSION=$(python -c "import pyspark; print(pyspark.__version__)") echo "export SPARK_VERSION=${SPARK_VERSION}" >> ~/.profile -# Surpress warnings from Spark jobs, and UI progress bar. +# Suppress warnings from Spark jobs, and UI progress bar. 
mkdir -p ~/.ipython/profile_default/startup echo """from pyspark.sql import SparkSession SparkSession.builder.config('spark.ui.showConsoleProgress', 'false').getOrCreate().sparkContext.setLogLevel('FATAL') diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala index 27f471233f1..dcced21f371 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala @@ -77,7 +77,7 @@ case class SessionHolder(userId: String, sessionId: String, session: SparkSessio private[service] def addExecuteHolder(executeHolder: ExecuteHolder): Unit = { val oldExecute = executions.putIfAbsent(executeHolder.operationId, executeHolder) if (oldExecute != null) { - // the existance of this should alrady be checked by SparkConnectExecutionManager + // the existence of this should alrady be checked by SparkConnectExecutionManager throw new IllegalStateException( s"ExecuteHolder with opId=${executeHolder.operationId} already exists!") } diff --git a/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/plugin/SparkConnectPluginRegistrySuite.scala b/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/plugin/SparkConnectPluginRegistrySuite.scala index ea9ae3ed9d9..e1de6b04d21 100644 --- a/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/plugin/SparkConnectPluginRegistrySuite.scala +++ b/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/plugin/SparkConnectPluginRegistrySuite.scala @@ -226,7 +226,7 @@ class SparkConnectPluginRegistrySuite extends SharedSparkSession with SparkConne } } - test("Emtpy registries are really empty and work") { + test("Empty registries are really empty and work") { 
assert(SparkConnectPluginRegistry.loadRelationPlugins().isEmpty) assert(SparkConnectPluginRegistry.loadExpressionPlugins().isEmpty) assert(SparkConnectPluginRegistry.loadCommandPlugins().isEmpty) diff --git a/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala b/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala index f80190c96e8..73e72b7f1df 100644 --- a/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala +++ b/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala @@ -259,7 +259,7 @@ private[storage] class BlockInfoManager(trackingCacheVisibility:
[spark] branch master updated: [MINOR][DOCS] Fix one typo
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f1ae56b152b [MINOR][DOCS] Fix one typo f1ae56b152b is described below commit f1ae56b152bdf19246d698b65e553790ad54306b Author: Ruifeng Zheng AuthorDate: Tue Oct 17 13:49:41 2023 -0500 [MINOR][DOCS] Fix one typo ### What changes were proposed in this pull request? Fix one typo ### Why are the changes needed? for doc ### Does this PR introduce _any_ user-facing change? yes ### How was this patch tested? I didn't find other similar typos in this page, so only one fix ### Was this patch authored or co-authored using generative AI tooling? no Closes #43401 from zhengruifeng/minor_typo_connect_overview. Authored-by: Ruifeng Zheng Signed-off-by: Sean Owen --- docs/spark-connect-overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/spark-connect-overview.md b/docs/spark-connect-overview.md index 82d84f39ca1..c7bad0994a8 100644 --- a/docs/spark-connect-overview.md +++ b/docs/spark-connect-overview.md @@ -261,7 +261,7 @@ spark-connect-repl --host myhost.com --port 443 --token ABCDEFG The supported list of CLI arguments may be found [here](https://github.com/apache/spark/blob/master/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClientParser.scala#L48). - Configure programmatically with a connection ctring + Configure programmatically with a connection string The connection may also be programmatically created using _SparkSession#builder_ as in this example: {% highlight scala %} - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-45564][SQL] Simplify 'DataFrameStatFunctions.bloomFilter' with 'BloomFilterAggregate' expression
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 922844fff65 [SPARK-45564][SQL] Simplify 'DataFrameStatFunctions.bloomFilter' with 'BloomFilterAggregate' expression 922844fff65 is described below commit 922844fff65ac38fd93bd0c914dcc7e5cf879996 Author: Ruifeng Zheng AuthorDate: Tue Oct 17 10:11:36 2023 -0500 [SPARK-45564][SQL] Simplify 'DataFrameStatFunctions.bloomFilter' with 'BloomFilterAggregate' expression ### What changes were proposed in this pull request? Simplify the 'DataFrameStatFunctions.bloomFilter' function with the 'BloomFilterAggregate' expression. ### Why are the changes needed? The existing implementation was based on RDDs and can be simplified with DataFrame operations. ### Does this PR introduce _any_ user-facing change? When the input parameters or data types are invalid, an `AnalysisException` is thrown instead of an `IllegalArgumentException`. ### How was this patch tested? CI ### Was this patch authored or co-authored using generative AI tooling? No Closes #43391 from zhengruifeng/sql_reimpl_stat_bloomFilter.
Authored-by: Ruifeng Zheng Signed-off-by: Sean Owen --- .../apache/spark/sql/DataFrameStatFunctions.scala | 68 +- 1 file changed, 14 insertions(+), 54 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala index 9d4f83c53a3..de3b100cd6a 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala @@ -23,6 +23,8 @@ import scala.jdk.CollectionConverters._ import org.apache.spark.annotation.Stable import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.Literal +import org.apache.spark.sql.catalyst.expressions.aggregate.BloomFilterAggregate import org.apache.spark.sql.execution.stat._ import org.apache.spark.sql.functions.col import org.apache.spark.sql.types._ @@ -535,7 +537,7 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) { * @since 2.0.0 */ def bloomFilter(colName: String, expectedNumItems: Long, fpp: Double): BloomFilter = { -buildBloomFilter(Column(colName), expectedNumItems, -1L, fpp) +bloomFilter(Column(colName), expectedNumItems, fpp) } /** @@ -547,7 +549,8 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) { * @since 2.0.0 */ def bloomFilter(col: Column, expectedNumItems: Long, fpp: Double): BloomFilter = { -buildBloomFilter(col, expectedNumItems, -1L, fpp) +val numBits = BloomFilter.optimalNumOfBits(expectedNumItems, fpp) +bloomFilter(col, expectedNumItems, numBits) } /** @@ -559,7 +562,7 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) { * @since 2.0.0 */ def bloomFilter(colName: String, expectedNumItems: Long, numBits: Long): BloomFilter = { -buildBloomFilter(Column(colName), expectedNumItems, numBits, Double.NaN) +bloomFilter(Column(colName), expectedNumItems, numBits) } /** @@ -571,57 +574,14 @@ final class DataFrameStatFunctions 
private[sql](df: DataFrame) { * @since 2.0.0 */ def bloomFilter(col: Column, expectedNumItems: Long, numBits: Long): BloomFilter = { -buildBloomFilter(col, expectedNumItems, numBits, Double.NaN) - } - - private def buildBloomFilter(col: Column, expectedNumItems: Long, - numBits: Long, - fpp: Double): BloomFilter = { -val singleCol = df.select(col) -val colType = singleCol.schema.head.dataType - -require(colType == StringType || colType.isInstanceOf[IntegralType], - s"Bloom filter only supports string type and integral types, but got $colType.") - -val updater: (BloomFilter, InternalRow) => Unit = colType match { - // For string type, we can get bytes of our `UTF8String` directly, and call the `putBinary` - // instead of `putString` to avoid unnecessary conversion. - case StringType => (filter, row) => filter.putBinary(row.getUTF8String(0).getBytes) - case ByteType => (filter, row) => filter.putLong(row.getByte(0)) - case ShortType => (filter, row) => filter.putLong(row.getShort(0)) - case IntegerType => (filter, row) => filter.putLong(row.getInt(0)) - case LongType => (filter, row) => filter.putLong(row.getLong(0)) - case _ => -throw new IllegalArgumentException( - s"Bloom filter only supports string type and integral types, " + -s"and does not sup
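For background on the two sizing overloads in the diff above: given `expectedNumItems` (n) and a false-positive probability `fpp` (p), the bit count comes from the standard Bloom filter formula m = -n·ln(p)/ln(2)², which is what `BloomFilter.optimalNumOfBits` computes. A minimal pure-Python sketch of the formulas (rounding details may differ slightly from Spark's implementation):

```python
import math

def optimal_num_of_bits(expected_items: int, fpp: float) -> int:
    # m = -n * ln(p) / (ln 2)^2: the bit-array size that meets the
    # requested false-positive probability for n expected inserts.
    return int(-expected_items * math.log(fpp) / (math.log(2) ** 2))

def optimal_num_of_hashes(expected_items: int, num_bits: int) -> int:
    # k = (m / n) * ln 2, rounded, with at least one hash function.
    return max(1, round(num_bits / expected_items * math.log(2)))

bits = optimal_num_of_bits(1_000_000, 0.03)    # roughly 7.3 million bits
hashes = optimal_num_of_hashes(1_000_000, bits)
```

This also shows why the `fpp` overload in the patch can simply delegate to the `numBits` overload after computing `numBits` once.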
[spark] branch master updated: [SPARK-45512][CORE][SQL][SS][DSTREAM] Fix compilation warnings related to `other-nullary-override`
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3b46cc81614 [SPARK-45512][CORE][SQL][SS][DSTREAM] Fix compilation warnings related to `other-nullary-override` 3b46cc81614 is described below commit 3b46cc816143d5bb553e86e8b716c28982cb5748 Author: YangJie AuthorDate: Tue Oct 17 07:34:06 2023 -0500 [SPARK-45512][CORE][SQL][SS][DSTREAM] Fix compilation warnings related to `other-nullary-override` ### What changes were proposed in this pull request? This PR fixes two compilation warnings related to `other-nullary-override` ``` [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/CloseableIterator.scala:36:16: method with a single empty parameter list overrides method hasNext in trait Iterator defined without a parameter list [quickfixable] [error] Applicable -Wconf / nowarn filters for this fatal warning: msg=, cat=other-nullary-override, site=org.apache.spark.sql.connect.client.WrappedCloseableIterator [error] override def hasNext(): Boolean = innerIterator.hasNext [error]^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala:136:16: method without a parameter list overrides method hasNext in class WrappedCloseableIterator defined with a single empty parameter list [quickfixable] [error] Applicable -Wconf / nowarn filters for this fatal warning: msg=, cat=other-nullary-override, site=org.apache.spark.sql.connect.client.ExecutePlanResponseReattachableIterator [error] override def hasNext: Boolean = synchronized { [error]^ [error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala:73:20: method without a parameter list overrides method hasNext in class WrappedCloseableIterator defined with a single empty parameter list [quickfixable] [error] Applicable -Wconf / nowarn filters for this fatal warning: msg=, cat=other-nullary-override, site=org.apache.spark.sql.connect.client.GrpcExceptionConverter.convertIterator [error] override def hasNext: Boolean = { [error]^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcRetryHandler.scala:77:18: method without a parameter list overrides method next in class WrappedCloseableIterator defined with a single empty parameter list [quickfixable] [error] Applicable -Wconf / nowarn filters for this fatal warning: msg=, cat=other-nullary-override, site=org.apache.spark.sql.connect.client.GrpcRetryHandler.RetryIterator [error] override def next: U = { [error] ^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcRetryHandler.scala:81:18: method without a parameter list overrides method hasNext in class WrappedCloseableIterator defined with a single empty parameter list [quickfixable] [error] Applicable -Wconf / nowarn filters for this fatal warning: msg=, cat=other-nullary-override, site=org.apache.spark.sql.connect.client.GrpcRetryHandler.RetryIterator [error] override def hasNext: Boolean = { [error] ``` and removes the corresponding suppression rules from the compilation options ``` "-Wconf:cat=other-nullary-override:wv", ``` On the other hand, the code corresponding to the following three suppression rules no longer exists, so the corresponding suppression rules were also cleaned up in this pr. 
``` "-Wconf:cat=lint-multiarg-infix:wv", "-Wconf:msg=method with a single empty parameter list overrides method without any parameter list:s", "-Wconf:msg=method without a parameter list overrides a method with a single empty one:s", ``` ### Why are the changes needed? Code clean up. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #43332 from LuciferYang/other-nullary-override. Lead-authored-by: YangJie Co-authored-by: yangjie01 Signed-off-by: Sean Owen --- .../org/apache/spark/sql/avro/AvroRowReaderSuite.scala | 10 +- .../spark/sql/connect/client/CloseableIterator.scala | 2 +- .../ExecutePlanResponseReattachableIterator.scala | 4 ++-- .../spark/sql/conne
[spark] branch master updated: [SPARK-45467][CORE] Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass`
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new acd5dc499d1 [SPARK-45467][CORE] Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass` acd5dc499d1 is described below commit acd5dc499d139ce8b2571a69beab0f971947adb4 Author: YangJie AuthorDate: Wed Oct 11 08:49:09 2023 -0500 [SPARK-45467][CORE] Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass` ### What changes were proposed in this pull request? This PR replaces `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass` to clean up deprecated API usage; see https://github.com/openjdk/jdk/blob/dfacda488bfbe2e11e8d607a6d08527710286982/src/java.base/share/classes/java/lang/reflect/Proxy.java#L376-L391 ``` * @deprecated Proxy classes generated in a named module are encapsulated * and not accessible to code outside its module. * {@link Constructor#newInstance(Object...) Constructor.newInstance} * will throw {@code IllegalAccessException} when it is called on * an inaccessible proxy class. * Use {@link #newProxyInstance(ClassLoader, Class[], InvocationHandler)} * to create a proxy instance instead. * * @see Package and Module Membership of Proxy Class * @revised 9 */ @Deprecated @CallerSensitive public static Class getProxyClass(ClassLoader loader, Class... interfaces) throws IllegalArgumentException ``` Since the `invoke` method never needs to be called in this scenario but the `InvocationHandler` cannot be null, a new `DummyInvocationHandler` has been added: ``` private[spark] object DummyInvocationHandler extends InvocationHandler { override def invoke(proxy: Any, method: Method, args: Array[AnyRef]): AnyRef = { throw new UnsupportedOperationException("Not implemented") } } ``` ### Why are the changes needed?
Clean up deprecated API usage. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #43291 from LuciferYang/SPARK-45467. Lead-authored-by: YangJie Co-authored-by: yangjie01 Signed-off-by: Sean Owen --- .../main/scala/org/apache/spark/serializer/JavaSerializer.scala | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala b/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala index 95d2bdc39e1..856e639fcd9 100644 --- a/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala +++ b/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala @@ -18,6 +18,7 @@ package org.apache.spark.serializer import java.io._ +import java.lang.reflect.{InvocationHandler, Method, Proxy} import java.nio.ByteBuffer import scala.reflect.ClassTag @@ -79,7 +80,7 @@ private[spark] class JavaDeserializationStream(in: InputStream, loader: ClassLoa // scalastyle:off classforname val resolved = ifaces.map(iface => Class.forName(iface, false, loader)) // scalastyle:on classforname - java.lang.reflect.Proxy.getProxyClass(loader, resolved: _*) + Proxy.newProxyInstance(loader, resolved, DummyInvocationHandler).getClass } } @@ -88,6 +89,12 @@ private[spark] class JavaDeserializationStream(in: InputStream, loader: ClassLoa def close(): Unit = { objIn.close() } } +private[spark] object DummyInvocationHandler extends InvocationHandler { + override def invoke(proxy: Any, method: Method, args: Array[AnyRef]): AnyRef = { +throw new UnsupportedOperationException("Not implemented") + } +} + private object JavaDeserializationStream { val primitiveMappings = Map[String, Class[_]]( - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (11af786b35c -> 97218051308)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 11af786b35c [SPARK-45451][SQL] Make the default storage level of dataset cache configurable add 97218051308 [SPARK-45496][CORE][DSTREAM] Fix the compilation warning related to `other-pure-statement` No new revisions were added by this update. Summary of changes: .../org/apache/spark/scheduler/OutputCommitCoordinatorSuite.scala | 2 +- pom.xml | 3 --- project/SparkBuild.scala | 4 .../org/apache/spark/streaming/util/FileBasedWriteAheadLog.scala | 2 +- 4 files changed, 2 insertions(+), 9 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-45377][CORE] Handle InputStream in NettyLogger
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new cdbb301143d [SPARK-45377][CORE] Handle InputStream in NettyLogger cdbb301143d is described below commit cdbb301143de2e9a0ea525d20867948f49863842 Author: Hasnain Lakhani AuthorDate: Mon Oct 2 08:27:50 2023 -0500 [SPARK-45377][CORE] Handle InputStream in NettyLogger ### What changes were proposed in this pull request? Handle `InputStream`s in the `NettyLogger` so we can print out how many available bytes there are. ### Why are the changes needed? As part of the SSL support we are going to transfer `InputStream`s via Netty, and this functionality makes it easy to see the size of the streams in the log at a glance. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI. Tested as part of the changes in https://github.com/apache/spark/pull/42685 which this is split out of, I observed the logs there. ### Was this patch authored or co-authored using generative AI tooling? No Closes #43165 from hasnain-db/spark-tls-netty-logger. 
Authored-by: Hasnain Lakhani Signed-off-by: Sean Owen --- .../main/java/org/apache/spark/network/util/NettyLogger.java | 11 +++ 1 file changed, 11 insertions(+) diff --git a/common/network-common/src/main/java/org/apache/spark/network/util/NettyLogger.java b/common/network-common/src/main/java/org/apache/spark/network/util/NettyLogger.java index 9398726a926..f4c0df6239d 100644 --- a/common/network-common/src/main/java/org/apache/spark/network/util/NettyLogger.java +++ b/common/network-common/src/main/java/org/apache/spark/network/util/NettyLogger.java @@ -17,6 +17,9 @@ package org.apache.spark.network.util; +import java.io.IOException; +import java.io.InputStream; + import io.netty.buffer.ByteBuf; import io.netty.buffer.ByteBufHolder; import io.netty.channel.ChannelHandlerContext; @@ -42,6 +45,14 @@ public class NettyLogger { } else if (arg instanceof ByteBufHolder) { return format(ctx, eventName) + " " + ((ByteBufHolder) arg).content().readableBytes() + "B"; + } else if (arg instanceof InputStream) { +int available = -1; +try { + available = ((InputStream) arg).available(); +} catch (IOException ex) { + // Swallow, but return -1 to indicate an error happened +} +return format(ctx, eventName, arg) + " " + available + "B"; } else { return super.format(ctx, eventName, arg); } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
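The logging behavior added above can be illustrated outside of Netty. This sketch mirrors the patch's approach (hypothetical `describe` helper, not Spark's API): query `InputStream.available()`, swallow any `IOException`, and report `-1` so a failed size probe never breaks the log line.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamSizeDemo {

    // Report the stream's available byte count, or -1 if it cannot be queried,
    // rather than letting an IOException escape from a logging path.
    static String describe(InputStream in) {
        int available = -1;
        try {
            available = in.available();
        } catch (IOException ex) {
            // Swallow; -1 signals that the size could not be determined.
        }
        return available + "B";
    }

    public static void main(String[] args) {
        System.out.println(describe(new ByteArrayInputStream(new byte[42]))); // 42B
    }
}
```

Note that `available()` is only an estimate of bytes readable without blocking, which is adequate for an at-a-glance log message but not for exact stream sizing.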
[spark] branch master updated: [SPARK-45389][SQL][HIVE] Correct MetaException matching rule on getting partition metadata
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8b3ad2fc329 [SPARK-45389][SQL][HIVE] Correct MetaException matching rule on getting partition metadata 8b3ad2fc329 is described below commit 8b3ad2fc329e1813366430df7189d27b17133283 Author: Cheng Pan AuthorDate: Mon Oct 2 08:25:51 2023 -0500 [SPARK-45389][SQL][HIVE] Correct MetaException matching rule on getting partition metadata ### What changes were proposed in this pull request? This PR aims to fix the HMS call fallback logic introduced in SPARK-35437. ```patch try { ... hive.getPartitionNames ... hive.getPartitionsByNames } catch { - case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] => + case ex: HiveException if ex.getCause.isInstanceOf[MetaException] => ... } ``` ### Why are the changes needed? Directly method call won't throw `InvocationTargetException`, and check the code of `hive.getPartitionNames` and `hive.getPartitionsByNames`, both of them will wrap a `HiveException` if `MetaException` throws. ### Does this PR introduce _any_ user-facing change? Yes, it should be a bug fix. ### How was this patch tested? Pass GA and code review. (I'm not sure how to construct/simulate a MetaException during the HMS thrift call with the current HMS testing infrastructure) ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43191 from pan3793/SPARK-45389. 
Authored-by: Cheng Pan Signed-off-by: Sean Owen --- sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala index 64aa7d2d6fa..9943c0178fc 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala @@ -438,7 +438,7 @@ private[client] class Shim_v2_0 extends Shim with Logging { recordHiveCall() hive.getPartitionsByNames(table, partNames.asJava) } catch { -case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] => +case ex: HiveException if ex.getCause.isInstanceOf[MetaException] => logWarning("Caught Hive MetaException attempting to get partition metadata by " + "filter from client side. Falling back to fetching all partition metadata", ex) recordHiveCall() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
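The bug class fixed here — matching the wrong wrapper exception around a cause — generalizes beyond Hive. The sketch below uses hypothetical stand-in exception types (the real code catches Hive's `HiveException` and inspects its cause for a thrift `MetaException`); the point is that a direct method call throws the wrapper directly, never `InvocationTargetException`, so the `catch` must name the wrapper that is actually thrown.

```java
public class CauseMatchDemo {

    // Hypothetical stand-ins for Hive's exception types.
    static class MetaException extends Exception {}
    static class HiveException extends RuntimeException {
        HiveException(Throwable cause) { super(cause); }
    }

    // Simulated HMS call: Hive wraps the thrift-level MetaException in a
    // HiveException before it reaches the caller.
    static void hmsCall(boolean failWithMeta) {
        if (failWithMeta) {
            throw new HiveException(new MetaException());
        }
    }

    // Returns true when the fallback path (fetch all partition metadata)
    // would be taken; rethrows anything that is not a wrapped MetaException.
    static boolean callWithFallback(boolean failWithMeta) {
        try {
            hmsCall(failWithMeta);
            return false;
        } catch (HiveException ex) {
            if (ex.getCause() instanceof MetaException) {
                return true; // fall back instead of failing the query
            }
            throw ex;
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithFallback(true));  // true: fallback triggered
        System.out.println(callWithFallback(false)); // false: normal path
    }
}
```

Before the fix, the equivalent `catch` named a wrapper type that could never occur on a direct call, so the fallback branch was dead code and the original exception propagated.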
[spark] branch branch-3.5 updated: [MINOR][DOCS] Fix Python code sample for StreamingQueryListener: Reporting Metrics programmatically using Asynchronous APIs
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 845e4f6c5bc [MINOR][DOCS] Fix Python code sample for StreamingQueryListener: Reporting Metrics programmatically using Asynchronous APIs 845e4f6c5bc is described below commit 845e4f6c5bcf3a368ee78757f3a74b390cdce5c0 Author: Peter Kaszt AuthorDate: Mon Oct 2 07:48:56 2023 -0500 [MINOR][DOCS] Fix Python code sample for StreamingQueryListener: Reporting Metrics programmatically using Asynchronous APIs Fix Python language code sample in the docs for _StreamingQueryListener_: Reporting Metrics programmatically using Asynchronous APIs section. ### What changes were proposed in this pull request? The code sample in the [Reporting Metrics programmatically using Asynchronous APIs](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#reporting-metrics-programmatically-using-asynchronous-apis) section was this: ``` spark = ... class Listener(StreamingQueryListener): def onQueryStarted(self, event): print("Query started: " + queryStarted.id) def onQueryProgress(self, event): println("Query terminated: " + queryTerminated.id) def onQueryTerminated(self, event): println("Query made progress: " + queryProgress.progress) spark.streams.addListener(Listener()) ``` Which is not a proper Python code, and has QueryProgress and QueryTerminated prints mixed. Proposed change/fix: ``` spark = ... class Listener(StreamingQueryListener): def onQueryStarted(self, event): print("Query started: " + queryStarted.id) def onQueryProgress(self, event): print("Query made progress: " + queryProgress.progress) def onQueryTerminated(self, event): print("Query terminated: " + queryTerminated.id) spark.streams.addListener(Listener()) ``` ### Why are the changes needed? To fix docimentation errors. 
### Does this PR introduce _any_ user-facing change? Yes. -> Sample python code snippet is fixed in docs (see above). ### How was this patch tested? Checked with github's .md preview, and built the docs according to the readme. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43190 from kasztp/master. Authored-by: Peter Kaszt Signed-off-by: Sean Owen (cherry picked from commit d708fd7b68bf0c9964e861cb2c81818d17d7136e) Signed-off-by: Sean Owen --- docs/structured-streaming-programming-guide.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index 76a22621a0e..3e87c45a349 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -3831,10 +3831,10 @@ class Listener(StreamingQueryListener): print("Query started: " + queryStarted.id) def onQueryProgress(self, event): -println("Query terminated: " + queryTerminated.id) +print("Query made progress: " + queryProgress.progress) def onQueryTerminated(self, event): -println("Query made progress: " + queryProgress.progress) + print("Query terminated: " + queryTerminated.id) spark.streams.addListener(Listener()) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [MINOR][DOCS] Fix Python code sample for StreamingQueryListener: Reporting Metrics programmatically using Asynchronous APIs
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d708fd7b68b [MINOR][DOCS] Fix Python code sample for StreamingQueryListener: Reporting Metrics programmatically using Asynchronous APIs d708fd7b68b is described below commit d708fd7b68bf0c9964e861cb2c81818d17d7136e Author: Peter Kaszt AuthorDate: Mon Oct 2 07:48:56 2023 -0500 [MINOR][DOCS] Fix Python code sample for StreamingQueryListener: Reporting Metrics programmatically using Asynchronous APIs Fix Python language code sample in the docs for _StreamingQueryListener_: Reporting Metrics programmatically using Asynchronous APIs section. ### What changes were proposed in this pull request? The code sample in the [Reporting Metrics programmatically using Asynchronous APIs](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#reporting-metrics-programmatically-using-asynchronous-apis) section was this: ``` spark = ... class Listener(StreamingQueryListener): def onQueryStarted(self, event): print("Query started: " + queryStarted.id) def onQueryProgress(self, event): println("Query terminated: " + queryTerminated.id) def onQueryTerminated(self, event): println("Query made progress: " + queryProgress.progress) spark.streams.addListener(Listener()) ``` Which is not a proper Python code, and has QueryProgress and QueryTerminated prints mixed. Proposed change/fix: ``` spark = ... class Listener(StreamingQueryListener): def onQueryStarted(self, event): print("Query started: " + queryStarted.id) def onQueryProgress(self, event): print("Query made progress: " + queryProgress.progress) def onQueryTerminated(self, event): print("Query terminated: " + queryTerminated.id) spark.streams.addListener(Listener()) ``` ### Why are the changes needed? To fix docimentation errors. 
### Does this PR introduce _any_ user-facing change? Yes. -> Sample python code snippet is fixed in docs (see above). ### How was this patch tested? Checked with github's .md preview, and built the docs according to the readme. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43190 from kasztp/master. Authored-by: Peter Kaszt Signed-off-by: Sean Owen --- docs/structured-streaming-programming-guide.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index 70e763be0d7..774422a9cd9 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -3837,10 +3837,10 @@ class Listener(StreamingQueryListener): print("Query started: " + queryStarted.id) def onQueryProgress(self, event): -println("Query terminated: " + queryTerminated.id) +print("Query made progress: " + queryProgress.progress) def onQueryTerminated(self, event): -println("Query made progress: " + queryProgress.progress) + print("Query terminated: " + queryTerminated.id) spark.streams.addListener(Listener()) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-45338][SQL][FOLLOWUP] Remove useless `toSeq`
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5c4aef4d4ca [SPARK-45338][SQL][FOLLOWUP] Remove useless `toSeq` 5c4aef4d4ca is described below commit 5c4aef4d4caf753ce9c45d07472df67479371738 Author: Jia Fan AuthorDate: Thu Sep 28 19:10:03 2023 -0500 [SPARK-45338][SQL][FOLLOWUP] Remove useless `toSeq` ### What changes were proposed in this pull request? This is a follow up PR for #43126 , remove useless invoke `toSeq` ### Why are the changes needed? Remove useless convert. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? exist test ### Was this patch authored or co-authored using generative AI tooling? No Closes #43172 from Hisoka-X/SPARK-45338-followup-remove-toseq. Authored-by: Jia Fan Signed-off-by: Sean Owen --- .../apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala | 2 +- .../apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala index 990a7162ea4..5dd8caf3f22 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala @@ -87,7 +87,7 @@ private[hive] class SparkGetColumnsOperation( }.toMap if (isAuthV2Enabled) { - val privObjs = getPrivObjs(db2Tabs).toSeq.asJava + val privObjs = getPrivObjs(db2Tabs).asJava authorizeMetaGets(HiveOperationType.GET_COLUMNS, privObjs, cmdStr) } diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala index 7fa492befa0..53a94a128c0 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala @@ -68,7 +68,7 @@ private[hive] class SparkGetFunctionsOperation( if (isAuthV2Enabled) { // authorize this call on the schema objects val privObjs = -HivePrivilegeObjectUtils.getHivePrivDbObjects(matchingDbs.toSeq.asJava) +HivePrivilegeObjectUtils.getHivePrivDbObjects(matchingDbs.asJava) authorizeMetaGets(HiveOperationType.GET_FUNCTIONS, privObjs, cmdStr) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-44895][CORE][UI] Add 'daemon', 'priority' for ThreadStackTrace
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6341310711e [SPARK-44895][CORE][UI] Add 'daemon', 'priority' for ThreadStackTrace 6341310711e is described below commit 6341310711ee0e3edbdd42aaeaf806cad4edefb5 Author: Kent Yao AuthorDate: Thu Sep 28 18:04:03 2023 -0500 [SPARK-44895][CORE][UI] Add 'daemon', 'priority' for ThreadStackTrace ### What changes were proposed in this pull request? Since version 9, Java has supported the 'daemon' and 'priority' fields in ThreadInfo. In this PR, we extract them from ThreadInfo to ThreadStackTrace ### Why are the changes needed? more information for thread pages in UI and rest APIs ### Does this PR introduce _any_ user-facing change? yes, ThreadStackTrace changes ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #43095 from yaooqinn/SPARK-44895. Authored-by: Kent Yao Signed-off-by: Sean Owen --- .../main/scala/org/apache/spark/status/api/v1/api.scala| 10 ++ core/src/main/scala/org/apache/spark/util/Utils.scala | 4 +++- .../test/scala/org/apache/spark/ui/UISeleniumSuite.scala | 14 ++ 3 files changed, 23 insertions(+), 5 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/status/api/v1/api.scala b/core/src/main/scala/org/apache/spark/status/api/v1/api.scala index 3e4e2f17a77..7a0c69e2948 100644 --- a/core/src/main/scala/org/apache/spark/status/api/v1/api.scala +++ b/core/src/main/scala/org/apache/spark/status/api/v1/api.scala @@ -540,19 +540,21 @@ case class ThreadStackTrace( lockName: Option[String], lockOwnerName: Option[String], suspended: Boolean, -inNative: Boolean) { +inNative: Boolean, +isDaemon: Boolean, +priority: Int) { /** * Returns a string representation of this thread stack trace * w.r.t java.lang.management.ThreadInfo(JDK 8)'s toString. 
* - * TODO(SPARK-44895): Considering 'daemon', 'priority' from higher JDKs - * * TODO(SPARK-44896): Also considering adding information os_prio, cpu, elapsed, tid, nid, etc., * from the jstack tool */ override def toString: String = { -val sb = new StringBuilder(s""""$threadName" Id=$threadId $threadState""") +val daemon = if (isDaemon) " daemon" else "" +val sb = new StringBuilder( + s""""$threadName"$daemon prio=$priority Id=$threadId $threadState""") lockName.foreach(lock => sb.append(s" on $lock")) lockOwnerName.foreach { owner => sb.append(s"""owned by "$owner"""") diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala index 48dfbecb7cd..dcffa99dc64 100644 --- a/core/src/main/scala/org/apache/spark/util/Utils.scala +++ b/core/src/main/scala/org/apache/spark/util/Utils.scala @@ -2196,7 +2196,9 @@ private[spark] object Utils Option(threadInfo.getLockName), Option(threadInfo.getLockOwnerName), threadInfo.isSuspended, - threadInfo.isInNative) + threadInfo.isInNative, + threadInfo.isDaemon, + threadInfo.getPriority) } /** diff --git a/core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala b/core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala index dd9927d7ba1..7e74cc9287f 100644 --- a/core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala +++ b/core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala @@ -885,6 +885,20 @@ class UISeleniumSuite extends SparkFunSuite with WebBrowser with Matchers { } } + test("SPARK-44895: Add 'daemon', 'priority' for ThreadStackTrace") { +withSpark(newSparkContext()) { sc => + val uiThreads = getJson(sc.ui.get, "executors/driver/threads") +.children +.filter(v => (v \ "threadName").extract[String].matches("SparkUI-\\d+")) + val priority = Thread.currentThread().getPriority + + uiThreads.foreach { v => +assert((v \ "isDaemon").extract[Boolean]) +assert((v \ "priority").extract[Int] === priority) + } +} + } + def goToUi(sc: SparkContext, 
path: String): Unit = { goToUi(sc.ui.get, path) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
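The JDK side of the change above can be demonstrated directly: since Java 9, `java.lang.management.ThreadInfo` exposes `isDaemon()` and `getPriority()`, which is what `ThreadStackTrace` now surfaces. This is a minimal sketch (requires Java 9+) that prints the same `"name" [daemon] prio=N Id=N STATE` prefix shape the new `toString` produces.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadInfoDemo {
    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Look up ThreadInfo for the current thread by id.
        ThreadInfo info = bean.getThreadInfo(Thread.currentThread().getId());
        // The two fields added to ThreadStackTrace in this commit:
        String daemon = info.isDaemon() ? " daemon" : "";
        System.out.println(
            "\"" + info.getThreadName() + "\"" + daemon
                + " prio=" + info.getPriority()
                + " Id=" + info.getThreadId() + " " + info.getThreadState());
    }
}
```

On Java 8 these accessors do not exist, which is why the original `toString` carried the `TODO(SPARK-44895)` note removed by this patch.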
[spark] branch master updated: [SPARK-45364][INFRA][BUILD] Clean up the unnecessary Scala 2.12 logical in SparkBuild
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 187e9a85175 [SPARK-45364][INFRA][BUILD] Clean up the unnecessary Scala 2.12 logical in SparkBuild 187e9a85175 is described below commit 187e9a851758c0e9cec11edab2bc07d6f4404001 Author: panbingkun AuthorDate: Thu Sep 28 08:36:08 2023 -0500 [SPARK-45364][INFRA][BUILD] Clean up the unnecessary Scala 2.12 logical in SparkBuild ### What changes were proposed in this pull request? The pr aims to clean up the unnecessary Scala 2.12 logical in SparkBuild. ### Why are the changes needed? Spark 4.0 no longer supports Scala 2.12. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43158 from panbingkun/SPARK-45364. Authored-by: panbingkun Signed-off-by: Sean Owen --- project/SparkBuild.scala | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index 85ffda304bc..13c92142d46 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -352,10 +352,7 @@ object SparkBuild extends PomBuild { "org.apache.spark.util.collection" ).mkString(":"), "-doc-title", "Spark " + version.value.replaceAll("-SNAPSHOT", "") + " ScalaDoc" -) ++ { - // Do not attempt to scaladoc javadoc comments under 2.12 since it can't handle inner classes - if (scalaBinaryVersion.value == "2.12") Seq("-no-java-comments") else Seq.empty -}, +), // disable Mima check for all modules, // to be enabled in specific ones that have previous artifacts - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (6a9d35f766d -> c5967310740)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 6a9d35f766d [SPARK-45354][SQL] Resolve functions bottom-up add c5967310740 [SPARK-2][MESOS] Remove Mesos support No new revisions were added by this update. Summary of changes: .github/labeler.yml| 3 - .github/workflows/benchmark.yml| 2 +- .github/workflows/build_and_test.yml | 4 +- .github/workflows/maven_test.yml | 12 +- LICENSE-binary | 3 +- NOTICE-binary | 3 - R/pkg/tests/fulltests/test_sparkR.R| 6 +- README.md | 2 +- assembly/pom.xml | 10 - .../shuffle/protocol/BlockTransferMessage.java | 4 - .../shuffle/protocol/mesos/RegisterDriver.java | 77 -- .../protocol/mesos/ShuffleServiceHeartbeat.java| 53 -- conf/spark-env.sh.template | 1 - .../main/scala/org/apache/spark/SparkConf.scala| 2 +- .../main/scala/org/apache/spark/SparkContext.scala | 14 +- .../apache/spark/api/java/JavaSparkContext.scala | 10 +- .../org/apache/spark/deploy/PythonRunner.scala | 1 - .../org/apache/spark/deploy/SparkSubmit.scala | 73 +- .../apache/spark/deploy/SparkSubmitArguments.scala | 8 +- .../spark/deploy/history/HistoryServer.scala | 2 +- .../spark/deploy/rest/RestSubmissionClient.scala | 4 +- .../org/apache/spark/deploy/security/README.md | 2 +- .../scala/org/apache/spark/executor/Executor.scala | 5 +- .../org/apache/spark/internal/config/package.scala | 9 +- .../org/apache/spark/metrics/MetricsSystem.scala | 3 - .../apache/spark/resource/ResourceProfile.scala| 4 +- .../apache/spark/scheduler/SchedulerBackend.scala | 2 +- .../apache/spark/scheduler/TaskSchedulerImpl.scala | 2 +- .../apache/spark/scheduler/TaskSetManager.scala| 1 - .../cluster/CoarseGrainedSchedulerBackend.scala| 5 +- .../main/scala/org/apache/spark/util/Utils.scala | 13 +- .../org/apache/spark/SecurityManagerSuite.scala| 2 +- .../org/apache/spark/deploy/SparkSubmitSuite.scala | 22 - .../deploy/rest/StandaloneRestSubmitSuite.scala| 6 - 
dev/create-release/release-build.sh| 2 +- dev/create-release/releaseutils.py | 1 - dev/deps/spark-deps-hadoop-3-hive-2.3 | 1 - dev/lint-java | 2 +- dev/mima | 2 +- dev/sbt-checkstyle | 2 +- dev/scalastyle | 2 +- dev/sparktestsupport/modules.py| 8 - dev/test-dependencies.sh | 2 +- docs/_config.yml | 3 +- docs/_layouts/global.html | 1 - docs/building-spark.md | 6 +- docs/cluster-overview.md | 8 +- docs/configuration.md | 34 +- docs/core-migration-guide.md | 2 + docs/hardware-provisioning.md | 3 +- docs/index.md | 3 - docs/job-scheduling.md | 23 +- docs/monitoring.md | 8 - docs/rdd-programming-guide.md | 2 +- docs/running-on-mesos.md | 901 docs/security.md | 26 +- docs/spark-standalone.md | 2 +- docs/streaming-programming-guide.md| 21 +- docs/submitting-applications.md| 16 - .../spark/launcher/AbstractCommandBuilder.java | 1 - .../spark/launcher/SparkClassCommandBuilder.java | 13 +- .../launcher/SparkSubmitCommandBuilderSuite.java | 4 - pom.xml| 7 - project/SparkBuild.scala | 4 +- python/README.md | 2 +- python/docs/source/user_guide/python_packaging.rst | 2 +- python/pyspark/context.py | 2 +- .../scala/org/apache/spark/repl/ReplSuite.scala| 24 - .../deploy/k8s/features/LocalDirsFeatureStep.scala | 2 +- resource-managers/mesos/pom.xml| 128 --- .../mesos/MesosExternalBlockStoreClient.java | 124 --- ...g.apache.spark.scheduler.ExternalClusterManager | 18 - .../deploy/mesos/MesosClusterDispatcher.scala | 136 .../mesos/MesosClusterDispatcherArguments.scala| 149 .../deploy/mesos/MesosDriverDescription.scala | 70 -- .../deploy/mesos/MesosExternalShuffleService.scala
[spark] branch master updated: [SPARK-44539][BUILD] Upgrade RoaringBitmap to 1.0.0
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8399dd321af [SPARK-44539][BUILD] Upgrade RoaringBitmap to 1.0.0 8399dd321af is described below commit 8399dd321afce0cb0051501de55da296595fdf53 Author: panbingkun AuthorDate: Wed Sep 27 11:53:53 2023 -0500 [SPARK-44539][BUILD] Upgrade RoaringBitmap to 1.0.0 ### What changes were proposed in this pull request? - The pr aims to upgrade RoaringBitmap from 0.9.45 to 1.0.0. - From version 1.0.0, the `ArraysShim` class has been moved from `shims-x.x.x.jar` jar to `RoaringBitmap-x.x.x.jar` jar, so we no longer need to rely on it. ### Why are the changes needed? - The newest brings some improvments, eg: Add zero-garbage deserialiser for ByteBuffer to RoaringBitmap by shikharid in https://github.com/RoaringBitmap/RoaringBitmap/pull/650 More specialized method for value decrementation by xtonik in https://github.com/RoaringBitmap/RoaringBitmap/pull/640 Duplicated small array sort routine by xtonik in https://github.com/RoaringBitmap/RoaringBitmap/pull/638 Avoid intermediate byte array creation by xtonik in https://github.com/RoaringBitmap/RoaringBitmap/pull/635 Useless back and forth BD bytes conversion by xtonik in https://github.com/RoaringBitmap/RoaringBitmap/pull/636 - The full release notes: https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/1.0.0 https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.49 https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.48 https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.47 https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.46 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #42143 from panbingkun/SPARK-44539. 
Authored-by: panbingkun Signed-off-by: Sean Owen --- core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt | 6 +++--- core/benchmarks/MapStatusesConvertBenchmark-results.txt | 8 dev/deps/spark-deps-hadoop-3-hive-2.3 | 3 +-- pom.xml | 2 +- 4 files changed, 9 insertions(+), 10 deletions(-) diff --git a/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt index 48dbc8e0241..416aaf5b7aa 100644 --- a/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt +++ b/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt @@ -6,8 +6,8 @@ OpenJDK 64-Bit Server VM 21+35 on Linux 5.15.0-1046-azure Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz MapStatuses Convert: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -Num Maps: 5 Fetch partitions:500813900 129 0.0 812807240.0 1.0X -Num Maps: 5 Fetch partitions:1000 2226 2238 17 0.0 2226321250.0 0.4X -Num Maps: 5 Fetch partitions:1500 3149 3300 133 0.0 3148506179.0 0.3X +Num Maps: 5 Fetch partitions:500899949 74 0.0 898941184.0 1.0X +Num Maps: 5 Fetch partitions:1000 1947 2043 115 0.0 1947362412.0 0.5X +Num Maps: 5 Fetch partitions:1500 3079 3122 75 0.0 3078809212.0 0.3X diff --git a/core/benchmarks/MapStatusesConvertBenchmark-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-results.txt index 5ed55c839eb..bd87f4876e4 100644 --- a/core/benchmarks/MapStatusesConvertBenchmark-results.txt +++ b/core/benchmarks/MapStatusesConvertBenchmark-results.txt @@ -3,11 +3,11 @@ MapStatuses Convert Benchmark OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1046-azure -Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz +Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz MapStatuses Convert: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -Num Maps: 5 Fetch partitions:500 1127 1138 13 0.0 1127479807.0 1.0X -Num Maps: 5 Fetch partitions:1000 2146 2183 49 0.0 2146214882.0 0.5X -Num Maps
[spark] branch master updated: [SPARK-45343][DOCS] Clarify behavior of multiLine in CSV options
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ab92cae78e3 [SPARK-45343][DOCS] Clarify behavior of multiLine in CSV options ab92cae78e3 is described below commit ab92cae78e3cdf58ba96b0b98e7958287c2d5cd1 Author: Bill Schneider AuthorDate: Wed Sep 27 08:25:02 2023 -0500 [SPARK-45343][DOCS] Clarify behavior of multiLine in CSV options ### What changes were proposed in this pull request? this is a documentation-only change to clarify CSV `multiLine` option: https://issues.apache.org/jira/browse/SPARK-45343 ### Why are the changes needed? documentation clarity ### Does this PR introduce _any_ user-facing change? Documentation only ### How was this patch tested? N/A, documentation only ### Was this patch authored or co-authored using generative AI tooling? Documentation only Closes #43132 from wrschneider/SPARK-45343-csv-multiline-doc-clarification. Lead-authored-by: Bill Schneider Co-authored-by: Bill Schneider Signed-off-by: Sean Owen --- docs/sql-data-sources-csv.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/sql-data-sources-csv.md b/docs/sql-data-sources-csv.md index 31167f55143..721563d1681 100644 --- a/docs/sql-data-sources-csv.md +++ b/docs/sql-data-sources-csv.md @@ -213,7 +213,7 @@ Data source options of CSV can be set via: multiLine false -Parse one record, which may span multiple lines, per file. CSV built-in functions ignore this option. +Allows a row to span multiple lines, by parsing line breaks within quoted values as part of the value itself. CSV built-in functions ignore this option. read - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (9f817379c68 -> b7763a7eae2)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 9f817379c68 [SPARK-45341][CORE] Correct the title level in the comments of KVStore.java to make `sbt doc` run successfully with Java 17 add b7763a7eae2 [SPARK-45338][CORE][SQL] Replace `scala.collection.JavaConverters` to `scala.jdk.CollectionConverters` No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/ErrorClassesJSONReader.scala | 2 +- .../src/main/scala/org/apache/spark/SparkException.scala | 2 +- .../scala/org/apache/spark/SparkThrowableHelper.scala| 2 +- .../main/scala/org/apache/spark/internal/Logging.scala | 2 +- .../org/apache/spark/sql/avro/AvroDeserializer.scala | 2 +- .../org/apache/spark/sql/avro/AvroOutputWriter.scala | 2 +- .../scala/org/apache/spark/sql/avro/AvroSerializer.scala | 2 +- .../main/scala/org/apache/spark/sql/avro/AvroUtils.scala | 2 +- .../org/apache/spark/sql/avro/SchemaConverters.scala | 2 +- .../main/scala/org/apache/spark/sql/avro/functions.scala | 2 +- .../scala/org/apache/spark/sql/v2/avro/AvroScan.scala| 2 +- .../scala/org/apache/spark/sql/v2/avro/AvroTable.scala | 2 +- .../org/apache/spark/sql/avro/AvroFunctionsSuite.scala | 2 +- .../test/scala/org/apache/spark/sql/avro/AvroSuite.scala | 2 +- .../jvm/src/main/scala/org/apache/spark/sql/Column.scala | 2 +- .../org/apache/spark/sql/DataFrameNaFunctions.scala | 2 +- .../scala/org/apache/spark/sql/DataFrameReader.scala | 2 +- .../org/apache/spark/sql/DataFrameStatFunctions.scala| 2 +- .../scala/org/apache/spark/sql/DataFrameWriter.scala | 2 +- .../scala/org/apache/spark/sql/DataFrameWriterV2.scala | 2 +- .../src/main/scala/org/apache/spark/sql/Dataset.scala| 2 +- .../org/apache/spark/sql/KeyValueGroupedDataset.scala| 2 +- .../org/apache/spark/sql/RelationalGroupedDataset.scala | 2 +- .../main/scala/org/apache/spark/sql/SparkSession.scala | 2 +- 
.../main/scala/org/apache/spark/sql/avro/functions.scala | 2 +- .../scala/org/apache/spark/sql/catalog/Catalog.scala | 2 +- .../spark/sql/expressions/UserDefinedFunction.scala | 2 +- .../org/apache/spark/sql/expressions/WindowSpec.scala| 2 +- .../src/main/scala/org/apache/spark/sql/functions.scala | 2 +- .../scala/org/apache/spark/sql/protobuf/functions.scala | 2 +- .../apache/spark/sql/streaming/DataStreamReader.scala| 2 +- .../apache/spark/sql/streaming/DataStreamWriter.scala| 2 +- .../org/apache/spark/sql/streaming/StreamingQuery.scala | 2 +- .../spark/sql/streaming/StreamingQueryManager.scala | 2 +- .../scala/org/apache/spark/sql/streaming/progress.scala | 2 +- .../scala/org/apache/spark/sql/ClientE2ETestSuite.scala | 2 +- .../scala/org/apache/spark/sql/ColumnTestSuite.scala | 2 +- .../org/apache/spark/sql/DataFrameNaFunctionSuite.scala | 2 +- .../scala/org/apache/spark/sql/FunctionTestSuite.scala | 2 +- .../org/apache/spark/sql/PlanGenerationTestSuite.scala | 2 +- .../spark/sql/UserDefinedFunctionE2ETestSuite.scala | 2 +- .../apache/spark/sql/connect/client/ArtifactSuite.scala | 2 +- .../sql/connect/client/SparkConnectClientSuite.scala | 2 +- .../spark/sql/streaming/ClientStreamingQuerySuite.scala | 2 +- .../spark/sql/connect/client/ArtifactManager.scala | 2 +- .../apache/spark/sql/connect/client/ClassFinder.scala| 2 +- .../connect/client/CustomSparkConnectBlockingStub.scala | 2 +- .../client/ExecutePlanResponseReattachableIterator.scala | 2 +- .../sql/connect/client/GrpcExceptionConverter.scala | 2 +- .../spark/sql/connect/client/SparkConnectClient.scala| 2 +- .../sql/connect/client/arrow/ArrowEncoderUtils.scala | 2 +- .../spark/sql/connect/client/arrow/ArrowSerializer.scala | 2 +- .../sql/connect/common/LiteralValueProtoConverter.scala | 2 +- .../org/apache/spark/sql/connect/common/ProtoUtils.scala | 2 +- .../org/apache/spark/sql/connect/common/UdfUtils.scala | 2 +- .../apache/spark/sql/connect/SparkConnectPlugin.scala| 2 +- 
.../connect/artifact/SparkConnectArtifactManager.scala | 2 +- .../scala/org/apache/spark/sql/connect/dsl/package.scala | 2 +- .../connect/execution/SparkConnectPlanExecution.scala| 2 +- .../spark/sql/connect/planner/SparkConnectPlanner.scala | 2 +- .../connect/planner/StreamingForeachBatchHelper.scala| 2 +- .../apache/spark/sql/connect/service/ExecuteHolder.scala | 2 +- .../apache/spark/sql/connect/service/SessionHolder.scala | 2 +- .../sql/connect/service/SparkConnectAnalyzeHandler.scala | 2 +- .../service/SparkConnectArtifactStatusesHandler.scala| 2 +- .../sql/connect/service/SparkConnectConfigHandler.scala | 2 +- .../connect/service/SparkConnectExecutionManager.scala | 2 +- .../connect/service/SparkConnectInterruptHandler.scala | 2
[spark] branch master updated: [SPARK-45341][CORE] Correct the title level in the comments of KVStore.java to make `sbt doc` run successfully with Java 17
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9f817379c68 [SPARK-45341][CORE] Correct the title level in the comments of KVStore.java to make `sbt doc` run successfully with Java 17 9f817379c68 is described below commit 9f817379c68e551680e60900f1d61b70e1b62960 Author: yangjie01 AuthorDate: Wed Sep 27 08:21:05 2023 -0500 [SPARK-45341][CORE] Correct the title level in the comments of KVStore.java to make `sbt doc` run successfully with Java 17 ### What changes were proposed in this pull request? This pr aims to correct the title level in the comments of `KVStore.java` to make `sbt doc` run successfully with Java 17. ### Why are the changes needed? Make the `sbt doc` command execute successfully with Java 17 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Manually check. run `build/sbt clean doc -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pvolcano` **Before** ``` [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/Picked up JAVA_TOOL_OPTIONS:-Duser.language=en [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBTypeInfo.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/ArrayWrappers.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVIndex.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/InMemoryStore.java... 
[error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBTypeInfo.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/UnsupportedStoreVersionException.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStoreIterator.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStore.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStoreView.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVTypeInfo.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStoreSerializer.java... [error] Constructing Javadoc information... [error] Building index for all the packages and classes... [error] Standard Doclet version 17.0.8+7-LTS [error] Building tree for all the packages and classes... 
[error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStore.java:32:1: error: heading used out of sequence: , compared to implicit preceding heading: [error] * Serialization [error]^Generating /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/InMemoryStore.html... [error] Generating /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVIndex.html... [error] Generating /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVStore.html... [error] Generating /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVStoreIterator.html... [error] Generating /Users/yangjie01/SourceCode/git/spark-mine-sbt/common
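The doclet failure quoted above is a heading-order check: starting with Java 17, javadoc rejects a heading whose level jumps more than one step past the preceding heading. A rough Python sketch of that rule (an illustration only — the function name and logic are mine, not the JDK doclet's actual code):

```python
import re

def headings_in_sequence(html: str) -> bool:
    """Return False when a heading skips a level relative to the one
    before it -- roughly the condition javadoc in JDK 17+ reports as
    "heading used out of sequence"."""
    levels = [int(m.group(1)) for m in re.finditer(r"<h([1-6])", html, re.IGNORECASE)]
    prev = 0
    for level in levels:
        if level > prev + 1:  # e.g. an <h3> directly after an <h1>
            return False
        prev = level
    return True

print(headings_in_sequence("<h1>KVStore</h1><h2>Serialization</h2>"))  # True
print(headings_in_sequence("<h1>KVStore</h1><h3>Serialization</h3>"))  # False
```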
[spark] branch master updated: [SPARK-45334][SQL] Remove misleading comment in parquetSchemaConverter
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 7e8aafd2c0f [SPARK-45334][SQL] Remove misleading comment in parquetSchemaConverter
7e8aafd2c0f is described below

commit 7e8aafd2c0f1f6fcd03a69afe2b85fd3fda95d20
Author: lanmengran1
AuthorDate: Tue Sep 26 21:01:02 2023 -0500

[SPARK-45334][SQL] Remove misleading comment in parquetSchemaConverter

### What changes were proposed in this pull request?
Removes one line of comment; the details are described in JIRA https://issues.apache.org/jira/browse/SPARK-45334

### Why are the changes needed?
The comment is outdated and misleading:
- the parquet-hive module has been removed from the parquet-mr project https://issues.apache.org/jira/browse/PARQUET-1676
- Hive always uses "array_element" as the name

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not needed

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43119 from amoylan2/remove_misleading_comment_in_parquetSchemaConverter.
Authored-by: lanmengran1
Signed-off-by: Sean Owen
---
 .../spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala | 1 -
 1 file changed, 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala
index 9c9e7ce729c..eedd165278a 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala
@@ -646,7 +646,6 @@ class SparkToParquetSchemaConverter(
       .buildGroup(repetition).as(LogicalTypeAnnotation.listType())
       .addField(Types
         .buildGroup(REPEATED)
-          // "array" is the name chosen by parquet-hive (1.7.0 and prior version)
           .addField(convertField(StructField("array", elementType, nullable)))
           .named("bag"))
       .named(field.name)
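For context on the hunk above: in the converter's legacy list layout, `bag` and `array` are the literal group and field names written into the Parquet schema. A sketch of the shape that code produces (illustrative only; an `int32` element type and the field name `my_list` are assumed):

```
optional group my_list (LIST) {
  repeated group bag {
    optional int32 array;
  }
}
```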
[spark] branch master updated: [SPARK-44366][BUILD] Upgrade antlr4 to 4.13.1
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 13cd291c354 [SPARK-44366][BUILD] Upgrade antlr4 to 4.13.1
13cd291c354 is described below

commit 13cd291c3549467dfd5d10a665e2d6a577f35bcb
Author: yangjie01
AuthorDate: Tue Sep 26 11:14:21 2023 -0500

[SPARK-44366][BUILD] Upgrade antlr4 to 4.13.1

### What changes were proposed in this pull request?
This PR aims to upgrade `antlr4` from 4.9.3 to 4.13.1.

### Why are the changes needed?
Since 4.10, antlr4 has used Java 11 for the source code and the compiled .class files of the ANTLR tool. There are some bug fixes and improvements after 4.9.3:
- https://github.com/antlr/antlr4/pull/3399
- https://github.com/antlr/antlr4/issues/1105
- https://github.com/antlr/antlr4/issues/2788
- https://github.com/antlr/antlr4/pull/3957
- https://github.com/antlr/antlr4/pull/4394

The full release notes are as follows:
- https://github.com/antlr/antlr4/releases/tag/4.13.1
- https://github.com/antlr/antlr4/releases/tag/4.13.0
- https://github.com/antlr/antlr4/releases/tag/4.12.0
- https://github.com/antlr/antlr4/releases/tag/4.11.1
- https://github.com/antlr/antlr4/releases/tag/4.11.0
- https://github.com/antlr/antlr4/releases/tag/4.10.1
- https://github.com/antlr/antlr4/releases/tag/4.10

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43075 from LuciferYang/antlr4-4131.
Authored-by: yangjie01
Signed-off-by: Sean Owen
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 206361e1efa..5c17d727b0a 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -12,7 +12,7 @@
 aliyun-java-sdk-ram/3.1.0//aliyun-java-sdk-ram-3.1.0.jar
 aliyun-sdk-oss/3.13.0//aliyun-sdk-oss-3.13.0.jar
 annotations/17.0.0//annotations-17.0.0.jar
 antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar
-antlr4-runtime/4.9.3//antlr4-runtime-4.9.3.jar
+antlr4-runtime/4.13.1//antlr4-runtime-4.13.1.jar
 aopalliance-repackaged/2.6.1//aopalliance-repackaged-2.6.1.jar
 arpack/3.0.3//arpack-3.0.3.jar
 arpack_combined_all/0.1//arpack_combined_all-0.1.jar

diff --git a/pom.xml b/pom.xml
index 5fd3e173857..1d0ab387900 100644
--- a/pom.xml
+++ b/pom.xml
@@ -212,7 +212,7 @@
 3.0.0
 0.12.0
-4.9.3
+4.13.1
 1.1
 4.12.1
 4.12.0
[spark] branch master updated: [SPARK-45248][CORE] Set the timeout for spark ui server
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 273a375cd314 [SPARK-45248][CORE] Set the timeout for spark ui server
273a375cd314 is described below

commit 273a375cd314fbf52b5f2538526374f6b24fb2cf
Author: chenyu <119398199+chenyu-opensou...@users.noreply.github.com>
AuthorDate: Mon Sep 25 22:38:27 2023 -0500

[SPARK-45248][CORE] Set the timeout for spark ui server

**What changes were proposed in this pull request?**
The PR sets an idle timeout for the Spark UI server.

**Why are the changes needed?**
It helps avoid slow HTTP denial-of-service attacks, because the Jetty server's idle timeout is 30 seconds by default.

**Does this PR introduce any user-facing change?**
No

**How was this patch tested?**
Manual review

**Was this patch authored or co-authored using generative AI tooling?**
No

Closes #43078 from chenyu-opensource/branch-SPARK-45248-new.
Authored-by: chenyu <119398199+chenyu-opensou...@users.noreply.github.com>
Signed-off-by: Sean Owen
---
 core/src/main/scala/org/apache/spark/ui/JettyUtils.scala | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
index 9582bdbf5264..22adcbc32ed8 100644
--- a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
+++ b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
@@ -296,6 +296,8 @@ private[spark] object JettyUtils extends Logging {
       connector.setPort(port)
       connector.setHost(hostName)
       connector.setReuseAddress(!Utils.isWindows)
+      // spark-45248: set the idle timeout to prevent slow DoS
+      connector.setIdleTimeout(8000)
       // Currently we only use "SelectChannelConnector"
       // Limit the max acceptor number to 8 so that we don't waste a lot of threads
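The two added lines are an idle-timeout defense: a connection that stays silent past the deadline is dropped instead of pinning a server thread, which is what defeats slowloris-style clients. The same idea in a stdlib-only Python sketch (an analogy to `connector.setIdleTimeout(8000)`, not Spark/Jetty code; the 0.2-second window is arbitrary):

```python
import socket

# Server and "client" ends of a connected pair; the client plays a
# slowloris attacker that opens the connection and then sends nothing.
server, client = socket.socketpair()
server.settimeout(0.2)  # analogous to connector.setIdleTimeout(8000)

timed_out = False
try:
    server.recv(1024)  # blocks until data arrives or the idle window expires
except socket.timeout:
    timed_out = True  # idle deadline hit: the server can now drop the peer

server.close()
client.close()
print("idle connection dropped:", timed_out)
```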
[spark] branch branch-3.3 updated: [SPARK-45286][DOCS] Add back Matomo analytics
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 9a28200f6e4 [SPARK-45286][DOCS] Add back Matomo analytics 9a28200f6e4 is described below commit 9a28200f6e461c4929dd6e05b6dd55fe984c0924 Author: Sean Owen AuthorDate: Sun Sep 24 14:17:55 2023 -0500 [SPARK-45286][DOCS] Add back Matomo analytics ### What changes were proposed in this pull request? Add analytics to doc pages using the ASF's Matomo service ### Why are the changes needed? We had previously removed Google Analytics from the website and release docs, per ASF policy: https://github.com/apache/spark/pull/36310 We just restored analytics using the ASF-hosted Matomo service on the website: https://github.com/apache/spark-website/commit/a1548627b48a62c2e51870d1488ca3e09397bd30 This change would put the same new tracking code back into the release docs. It would let us see what docs and resources are most used, I suppose. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? No Closes #43063 from srowen/SPARK-45286. 
Authored-by: Sean Owen
Signed-off-by: Sean Owen
(cherry picked from commit a881438114ea3e8e918d981ef89ed1ab956d6fca)
Signed-off-by: Sean Owen
---
 docs/_layouts/global.html | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index d4463922766..2d139f5e0fb 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -33,6 +33,25 @@
 https://cdn.jsdelivr.net/npm/docsearch.js@2/dist/cdn/docsearch.min.css
+{% production %}
+var _paq = window._paq = window._paq || [];
+/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
+_paq.push(["disableCookies"]);
+_paq.push(['trackPageView']);
+_paq.push(['enableLinkTracking']);
+(function() {
+  var u="https://analytics.apache.org/";
+  _paq.push(['setTrackerUrl', u+'matomo.php']);
+  _paq.push(['setSiteId', '40']);
+  var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
+  g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
+})();
+{% endproduction %}
[spark] branch branch-3.4 updated: [SPARK-45286][DOCS] Add back Matomo analytics
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 20924aa581a [SPARK-45286][DOCS] Add back Matomo analytics 20924aa581a is described below commit 20924aa581a2c5c49ec700689f1888dd7db79e6b Author: Sean Owen AuthorDate: Sun Sep 24 14:17:55 2023 -0500 [SPARK-45286][DOCS] Add back Matomo analytics ### What changes were proposed in this pull request? Add analytics to doc pages using the ASF's Matomo service ### Why are the changes needed? We had previously removed Google Analytics from the website and release docs, per ASF policy: https://github.com/apache/spark/pull/36310 We just restored analytics using the ASF-hosted Matomo service on the website: https://github.com/apache/spark-website/commit/a1548627b48a62c2e51870d1488ca3e09397bd30 This change would put the same new tracking code back into the release docs. It would let us see what docs and resources are most used, I suppose. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? No Closes #43063 from srowen/SPARK-45286. 
Authored-by: Sean Owen
Signed-off-by: Sean Owen
(cherry picked from commit a881438114ea3e8e918d981ef89ed1ab956d6fca)
Signed-off-by: Sean Owen
---
 docs/_layouts/global.html | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index d4463922766..2d139f5e0fb 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -33,6 +33,25 @@
 https://cdn.jsdelivr.net/npm/docsearch.js@2/dist/cdn/docsearch.min.css
+{% production %}
+var _paq = window._paq = window._paq || [];
+/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
+_paq.push(["disableCookies"]);
+_paq.push(['trackPageView']);
+_paq.push(['enableLinkTracking']);
+(function() {
+  var u="https://analytics.apache.org/";
+  _paq.push(['setTrackerUrl', u+'matomo.php']);
+  _paq.push(['setSiteId', '40']);
+  var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
+  g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
+})();
+{% endproduction %}
[spark] branch branch-3.5 updated: [SPARK-45286][DOCS] Add back Matomo analytics
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 609306ff5da [SPARK-45286][DOCS] Add back Matomo analytics 609306ff5da is described below commit 609306ff5daa8ff7c2212088d33c0911ad0f4989 Author: Sean Owen AuthorDate: Sun Sep 24 14:17:55 2023 -0500 [SPARK-45286][DOCS] Add back Matomo analytics ### What changes were proposed in this pull request? Add analytics to doc pages using the ASF's Matomo service ### Why are the changes needed? We had previously removed Google Analytics from the website and release docs, per ASF policy: https://github.com/apache/spark/pull/36310 We just restored analytics using the ASF-hosted Matomo service on the website: https://github.com/apache/spark-website/commit/a1548627b48a62c2e51870d1488ca3e09397bd30 This change would put the same new tracking code back into the release docs. It would let us see what docs and resources are most used, I suppose. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? No Closes #43063 from srowen/SPARK-45286. 
Authored-by: Sean Owen
Signed-off-by: Sean Owen
(cherry picked from commit a881438114ea3e8e918d981ef89ed1ab956d6fca)
Signed-off-by: Sean Owen
---
 docs/_layouts/global.html | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index 9b7c4692461..8c4435fdf31 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -32,6 +32,25 @@
 https://cdn.jsdelivr.net/npm/docsearch.js@2/dist/cdn/docsearch.min.css
+{% production %}
+var _paq = window._paq = window._paq || [];
+/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
+_paq.push(["disableCookies"]);
+_paq.push(['trackPageView']);
+_paq.push(['enableLinkTracking']);
+(function() {
+  var u="https://analytics.apache.org/";
+  _paq.push(['setTrackerUrl', u+'matomo.php']);
+  _paq.push(['setSiteId', '40']);
+  var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
+  g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
+})();
+{% endproduction %}
[spark] branch master updated: [SPARK-45286][DOCS] Add back Matomo analytics
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a881438114e [SPARK-45286][DOCS] Add back Matomo analytics a881438114e is described below commit a881438114ea3e8e918d981ef89ed1ab956d6fca Author: Sean Owen AuthorDate: Sun Sep 24 14:17:55 2023 -0500 [SPARK-45286][DOCS] Add back Matomo analytics ### What changes were proposed in this pull request? Add analytics to doc pages using the ASF's Matomo service ### Why are the changes needed? We had previously removed Google Analytics from the website and release docs, per ASF policy: https://github.com/apache/spark/pull/36310 We just restored analytics using the ASF-hosted Matomo service on the website: https://github.com/apache/spark-website/commit/a1548627b48a62c2e51870d1488ca3e09397bd30 This change would put the same new tracking code back into the release docs. It would let us see what docs and resources are most used, I suppose. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? No Closes #43063 from srowen/SPARK-45286. 
Authored-by: Sean Owen
Signed-off-by: Sean Owen
---
 docs/_layouts/global.html | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index e857efad6f0..c2f05cfd6bb 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -32,6 +32,25 @@
 https://cdn.jsdelivr.net/npm/docsearch.js@2/dist/cdn/docsearch.min.css
+{% production %}
+var _paq = window._paq = window._paq || [];
+/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
+_paq.push(["disableCookies"]);
+_paq.push(['trackPageView']);
+_paq.push(['enableLinkTracking']);
+(function() {
+  var u="https://analytics.apache.org/";
+  _paq.push(['setTrackerUrl', u+'matomo.php']);
+  _paq.push(['setSiteId', '40']);
+  var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
+  g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
+})();
+{% endproduction %}
[spark] branch master updated: [SPARK-45148][BUILD] Upgrade scalatest related dependencies to the 3.2.17 series
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new c1c58698d3d [SPARK-45148][BUILD] Upgrade scalatest related dependencies to the 3.2.17 series
c1c58698d3d is described below

commit c1c58698d3d6b1447045fad592f8dfb0395989d1
Author: yangjie01
AuthorDate: Mon Sep 18 10:01:47 2023 -0500

[SPARK-45148][BUILD] Upgrade scalatest related dependencies to the 3.2.17 series

### What changes were proposed in this pull request?
This PR aims to upgrade `scalatest`-related test dependencies to 3.2.17:
- scalatest: upgrade scalatest to 3.2.17
- scalatestplus
  - scalacheck: upgrade `scalacheck-1-17` to 3.2.17.0
  - mockito: upgrade `mockito-4-11` to 3.2.17.0
  - selenium: upgrade `selenium-4-12` to 3.2.17.0, and `selenium-java` to 4.12.1, `htmlunit-driver` to 4.12.0, byte-buddy and byte-buddy-agent to 1.14.5

### Why are the changes needed?
The release notes are as follows:
- scalatest: https://github.com/scalatest/scalatest/releases/tag/release-3.2.17
- scalatestplus
  - scalacheck-1-17: https://github.com/scalatest/scalatestplus-scalacheck/releases/tag/release-3.2.17.0-for-scalacheck-1.17
  - mockito-4-11: https://github.com/scalatest/scalatestplus-mockito/releases/tag/release-3.2.17.0-for-mockito-4.11
  - selenium-4-12: https://github.com/scalatest/scalatestplus-selenium/releases/tag/release-3.2.17.0-for-selenium-4.12

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions - Manual test: - ChromeUISeleniumSuite - RocksDBBackendChromeUIHistoryServerSuite ``` build/sbt -Dguava.version=32.1.2-jre -Dspark.test.webdriver.chrome.driver=/Users/yangjie01/Tools/chromedriver -Dtest.default.exclude.tags="" -Phive -Phive-thriftserver "core/testOnly org.apache.spark.ui.ChromeUISeleniumSuite" build/sbt -Dguava.version=32.1.2-jre -Dspark.test.webdriver.chrome.driver=/Users/yangjie01/Tools/chromedriver -Dtest.default.exclude.tags="" -Phive -Phive-thriftserver "core/testOnly org.apache.spark.deploy.history.RocksDBBackendChromeUIHistoryServerSuite" ``` ``` [info] ChromeUISeleniumSuite: [info] - SPARK-31534: text for tooltip should be escaped (1 second, 809 milliseconds) [info] - SPARK-31882: Link URL for Stage DAGs should not depend on paged table. (604 milliseconds) [info] - SPARK-31886: Color barrier execution mode RDD correctly (252 milliseconds) [info] - Search text for paged tables should not be saved (1 second, 309 milliseconds) [info] Run completed in 6 seconds, 116 milliseconds. [info] Total number of tests run: 4 [info] Suites: completed 1, aborted 0 [info] Tests: succeeded 4, failed 0, canceled 0, ignored 0, pending 0 [info] All tests passed. ``` ``` [info] RocksDBBackendChromeUIHistoryServerSuite: [info] - ajax rendered relative links are prefixed with uiRoot (spark.ui.proxyBase) (1 second, 615 milliseconds) [info] Run completed in 5 seconds, 130 milliseconds. [info] Total number of tests run: 1 [info] Suites: completed 1, aborted 0 [info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0 [info] All tests passed. [success] Total time: 27 s, completed 2023-9-14 11:29:27 ``` ### Was this patch authored or co-authored using generative AI tooling? No Closes #42906 from LuciferYang/SPARK-45148. 
Lead-authored-by: yangjie01 Co-authored-by: YangJie Signed-off-by: Sean Owen --- pom.xml | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/pom.xml b/pom.xml index 779f9e64f1d..971cb07ea40 100644 --- a/pom.xml +++ b/pom.xml @@ -214,8 +214,8 @@ 4.9.3 1.1 -4.9.1 -4.9.1 +4.12.1 +4.12.0 2.70.0 3.1.0 1.1.0 @@ -413,7 +413,7 @@ org.scalatestplus - selenium-4-9_${scala.binary.version} + selenium-4-12_${scala.binary.version} test @@ -1137,25 +1137,25 @@ org.scalatest scalatest_${scala.binary.version} -3.2.16 +3.2.17 test org.scalatestplus scalacheck-1-17_${scala.binary.version} -3.2.16.0 +3.2.17.0 test org.scalatestplus mockito-4-11_${scala.binary.version} -3.2.16.0 +3.2.17.0 test org.scalatestplus -selenium-4-9_${scala.binary.version} -3.2.16.0 +selenium-4-12_${scala.binary.version} +3.2.17.0 test @@ -1173,13 +1173,13 @@ net.byteb
[spark-website] branch asf-site updated: [SPARK-45195] Update examples with docker official image
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 6b10f7fd85 [SPARK-45195] Update examples with docker official image
6b10f7fd85 is described below

commit 6b10f7fd85327f97cc12bede9ce5c60a744d9063
Author: Ruifeng Zheng
AuthorDate: Mon Sep 18 07:31:46 2023 -0500

[SPARK-45195] Update examples with docker official image

1. Add `docker run` commands for PySpark and SparkR;
2. switch to the Docker official image for SQL, Scala and Java; refer to https://hub.docker.com/_/spark

Also manually checked all the commands, e.g.:

```
ruifeng.zhengx:~$ docker run -it --rm spark:python3 /opt/spark/bin/pyspark
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/09/18 06:02:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.5.0
      /_/

Using Python version 3.8.10 (default, May 26 2023 14:05:08)
Spark context Web UI available at http://4861f70118ab:4040
Spark context available as 'sc' (master = local[*], app id = local-1695016951087).
SparkSession available as 'spark'.
>>> spark.range(0, 10).show()
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
+---+
```

Author: Ruifeng Zheng

Closes #477 from zhengruifeng/offical_image.
--- index.md| 12 +++- site/index.html | 12 +++- 2 files changed, 14 insertions(+), 10 deletions(-) diff --git a/index.md b/index.md index a41e4c9e81..ada6242742 100644 --- a/index.md +++ b/index.md @@ -88,11 +88,13 @@ navigation: Run now -Installing with 'pip' +Install with 'pip' or try offical image $ pip install pyspark $ pyspark +$ +$ docker run -it --rm spark:python3 /opt/spark/bin/pyspark Run now -$ docker run -it --rm apache/spark /opt/spark/bin/spark-sql +$ docker run -it --rm spark /opt/spark/bin/spark-sql spark-sql> @@ -175,7 +177,7 @@ FROM json.`logs.json` Run now -$ docker run -it --rm apache/spark /opt/spark/bin/spark-shell +$ docker run -it --rm spark /opt/spark/bin/spark-shell scala> @@ -193,7 +195,7 @@ df.where("age > 21") Run now -$ docker run -it --rm apache/spark /opt/spark/bin/spark-shell +$ docker run -it --rm spark /opt/spark/bin/spark-shell scala> @@ -210,7 +212,7 @@ df.where("age > 21") Run now -$ SPARK-HOME/bin/sparkR +$ docker run -it --rm spark:r /opt/spark/bin/sparkR > diff --git a/site/index.html b/site/index.html index e1b0b7e416..3ccc7104ce 100644 --- a/site/index.html +++ b/site/index.html @@ -213,11 +213,13 @@ Run now -Installing with 'pip' +Install with 'pip' or try offical image $ pip install pyspark $ pyspark +$ +$ docker run -it --rm spark:python3 /opt/spark/bin/pyspark @@ -273,7 +275,7 @@ Run now -$ docker run -it --rm apache/spark /opt/spark/bin/spark-sql +$ docker run -it --rm spark /opt/spark/bin/spark-sql spark-sql> @@ -293,7 +295,7 @@ Run now
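The commit above swaps the documented commands from the `apache/spark` account image to the Docker official `spark` image, with per-language tags. The pattern behind those commands can be sketched in Python; the helper name is illustrative and not part of the site or of Spark:

```python
def docker_run_command(shell, flavor=None):
    """Build the `docker run` line the site now documents for the official image.

    `flavor` selects an image tag such as "python3" or "r"; None means the
    plain `spark` image (Scala/Java/SQL shells).
    """
    image = f"spark:{flavor}" if flavor else "spark"
    return f"docker run -it --rm {image} /opt/spark/bin/{shell}"

# The four commands the commit documents:
for shell, flavor in [("pyspark", "python3"), ("spark-sql", None),
                      ("spark-shell", None), ("sparkR", "r")]:
    print(docker_run_command(shell, flavor))
```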
[spark] branch branch-3.3 updated: [SPARK-45127][DOCS] Exclude README.md from document build
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 6dcab1fe0f6 [SPARK-45127][DOCS] Exclude README.md from document build 6dcab1fe0f6 is described below commit 6dcab1fe0f64458d76060e38fad974d6b84c4ff7 Author: panbingkun AuthorDate: Sat Sep 16 09:04:38 2023 -0500 [SPARK-45127][DOCS] Exclude README.md from document build ### What changes were proposed in this pull request? The pr aims to exclude `README.md` from document build. ### Why are the changes needed? - Currently, our document `README.html` does not have any CSS style applied to it, as shown below: https://spark.apache.org/docs/latest/README.html https://github.com/apache/spark/assets/15246973/1dfe5f69-30d9-4ce4-8d82-1bba5e721ccd";> **If we do not intend to display the above page to users, we should remove it during the document build process.** - As we saw in the project `spark-website`, it has already set the following configuration: https://github.com/apache/spark-website/blob/642d1fb834817014e1799e73882d53650c1c1662/_config.yml#L7 https://github.com/apache/spark/assets/15246973/421b7be5-4ece-407e-9d49-8e7487b74a47";> Let's stay consistent. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. After this pr, the README.html file will no longer be generated ``` (base) panbingkun:~/Developer/spark/spark-community/docs/_site$ls -al README.html ls: README.html: No such file or directory ``` - Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42883 from panbingkun/SPARK-45127. 
Authored-by: panbingkun Signed-off-by: Sean Owen (cherry picked from commit 804f741453fb146b5261084fa3baf26631badb79) Signed-off-by: Sean Owen --- docs/_config.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/_config.yml b/docs/_config.yml index 82da6b4ddff..c0f752a6155 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -46,3 +46,5 @@ DOCSEARCH_SCRIPT: | }); permalink: 404.html + +exclude: ['README.md'] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.4 updated: [SPARK-45127][DOCS] Exclude README.md from document build
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 12020943ec9 [SPARK-45127][DOCS] Exclude README.md from document build 12020943ec9 is described below commit 12020943ec95ce3e5dc3aeb2e3ae201cf25e0233 Author: panbingkun AuthorDate: Sat Sep 16 09:04:38 2023 -0500 [SPARK-45127][DOCS] Exclude README.md from document build ### What changes were proposed in this pull request? The pr aims to exclude `README.md` from document build. ### Why are the changes needed? - Currently, our document `README.html` does not have any CSS style applied to it, as shown below: https://spark.apache.org/docs/latest/README.html https://github.com/apache/spark/assets/15246973/1dfe5f69-30d9-4ce4-8d82-1bba5e721ccd";> **If we do not intend to display the above page to users, we should remove it during the document build process.** - As we saw in the project `spark-website`, it has already set the following configuration: https://github.com/apache/spark-website/blob/642d1fb834817014e1799e73882d53650c1c1662/_config.yml#L7 https://github.com/apache/spark/assets/15246973/421b7be5-4ece-407e-9d49-8e7487b74a47";> Let's stay consistent. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. After this pr, the README.html file will no longer be generated ``` (base) panbingkun:~/Developer/spark/spark-community/docs/_site$ls -al README.html ls: README.html: No such file or directory ``` - Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42883 from panbingkun/SPARK-45127. 
Authored-by: panbingkun Signed-off-by: Sean Owen (cherry picked from commit 804f741453fb146b5261084fa3baf26631badb79) Signed-off-by: Sean Owen --- docs/_config.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/_config.yml b/docs/_config.yml index c0c54b50e80..cb7ce91fa57 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -46,3 +46,5 @@ DOCSEARCH_SCRIPT: | }); permalink: 404.html + +exclude: ['README.md'] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.5 updated: [SPARK-45127][DOCS] Exclude README.md from document build
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new a3f50e74250 [SPARK-45127][DOCS] Exclude README.md from document build a3f50e74250 is described below commit a3f50e742506e07473c281255d1b13ab8ae78cd6 Author: panbingkun AuthorDate: Sat Sep 16 09:04:38 2023 -0500 [SPARK-45127][DOCS] Exclude README.md from document build ### What changes were proposed in this pull request? The pr aims to exclude `README.md` from document build. ### Why are the changes needed? - Currently, our document `README.html` does not have any CSS style applied to it, as shown below: https://spark.apache.org/docs/latest/README.html https://github.com/apache/spark/assets/15246973/1dfe5f69-30d9-4ce4-8d82-1bba5e721ccd";> **If we do not intend to display the above page to users, we should remove it during the document build process.** - As we saw in the project `spark-website`, it has already set the following configuration: https://github.com/apache/spark-website/blob/642d1fb834817014e1799e73882d53650c1c1662/_config.yml#L7 https://github.com/apache/spark/assets/15246973/421b7be5-4ece-407e-9d49-8e7487b74a47";> Let's stay consistent. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. After this pr, the README.html file will no longer be generated ``` (base) panbingkun:~/Developer/spark/spark-community/docs/_site$ls -al README.html ls: README.html: No such file or directory ``` - Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42883 from panbingkun/SPARK-45127. 
Authored-by: panbingkun Signed-off-by: Sean Owen (cherry picked from commit 804f741453fb146b5261084fa3baf26631badb79) Signed-off-by: Sean Owen --- docs/_config.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/_config.yml b/docs/_config.yml index afe015b2972..e346833722b 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -46,3 +46,5 @@ DOCSEARCH_SCRIPT: | }); permalink: 404.html + +exclude: ['README.md'] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-45127][DOCS] Exclude README.md from document build
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 804f741453f [SPARK-45127][DOCS] Exclude README.md from document build 804f741453f is described below commit 804f741453fb146b5261084fa3baf26631badb79 Author: panbingkun AuthorDate: Sat Sep 16 09:04:38 2023 -0500 [SPARK-45127][DOCS] Exclude README.md from document build ### What changes were proposed in this pull request? The pr aims to exclude `README.md` from document build. ### Why are the changes needed? - Currently, our document `README.html` does not have any CSS style applied to it, as shown below: https://spark.apache.org/docs/latest/README.html https://github.com/apache/spark/assets/15246973/1dfe5f69-30d9-4ce4-8d82-1bba5e721ccd";> **If we do not intend to display the above page to users, we should remove it during the document build process.** - As we saw in the project `spark-website`, it has already set the following configuration: https://github.com/apache/spark-website/blob/642d1fb834817014e1799e73882d53650c1c1662/_config.yml#L7 https://github.com/apache/spark/assets/15246973/421b7be5-4ece-407e-9d49-8e7487b74a47";> Let's stay consistent. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. After this pr, the README.html file will no longer be generated ``` (base) panbingkun:~/Developer/spark/spark-community/docs/_site$ls -al README.html ls: README.html: No such file or directory ``` - Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42883 from panbingkun/SPARK-45127. 
Authored-by: panbingkun Signed-off-by: Sean Owen --- docs/_config.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/_config.yml b/docs/_config.yml index 8c256af5bb3..fcc50d22e2e 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -46,3 +46,5 @@ DOCSEARCH_SCRIPT: | }); permalink: 404.html + +exclude: ['README.md'] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
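Jekyll's `exclude` option keeps listed files out of the build entirely, which is why `README.html` is no longer generated. A rough Python analogue of that filtering step (illustrative only, not Jekyll's actual implementation):

```python
def files_to_build(sources, exclude):
    """Drop excluded files before rendering, as Jekyll's `exclude` setting does."""
    excluded = set(exclude)
    return [path for path in sources if path not in excluded]

# With `exclude: ['README.md']`, README.md never reaches the renderer,
# so no unstyled README.html appears in the generated site.
```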
[spark] branch master updated (ab46dc048ba -> 33979829db9)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from ab46dc048ba [SPARK-44872][CONNECT][FOLLOWUP] Deflake ReattachableExecuteSuite and increase retry buffer add 33979829db9 [SPARK-45146][DOCS] Update the default value of 'spark.executor.logs.rolling.strategy' No new revisions were added by this update. Summary of changes: docs/configuration.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.3 updated: [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode'
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 9ee184ad5cf [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode' 9ee184ad5cf is described below commit 9ee184ad5cf1ea808143cffd6fa982ca8ef503fe Author: chenyu-opensource <119398199+chenyu-opensou...@users.noreply.github.com> AuthorDate: Wed Sep 13 08:48:14 2023 -0500 [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode' **What changes were proposed in this pull request?** The PR updates the default value of 'spark.submit.deployMode' in configuration.html on the website **Why are the changes needed?** The default value of 'spark.submit.deployMode' is 'client', but the website is wrong. **Does this PR introduce any user-facing change?** No **How was this patch tested?** It doesn't need to. **Was this patch authored or co-authored using generative AI tooling?** No Closes #42902 from chenyu-opensource/branch-SPARK-45146. Authored-by: chenyu-opensource <119398199+chenyu-opensou...@users.noreply.github.com> Signed-off-by: Sean Owen (cherry picked from commit 076cb7aabac2f0ff11ca77ca530b7b8db5310a5e) Signed-off-by: Sean Owen --- docs/configuration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/configuration.md b/docs/configuration.md index cb1f5212439..9e243635baf 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -394,7 +394,7 @@ of the most common options to set are: spark.submit.deployMode - (none) + client The deploy mode of Spark driver program, either "client" or "cluster", Which means to launch driver program locally ("client") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.4 updated: [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode'
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 7544bdb12d1 [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode' 7544bdb12d1 is described below commit 7544bdb12d1d0449aaa7e7a5f8124a5cf662712f Author: chenyu-opensource <119398199+chenyu-opensou...@users.noreply.github.com> AuthorDate: Wed Sep 13 08:48:14 2023 -0500 [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode' **What changes were proposed in this pull request?** The PR updates the default value of 'spark.submit.deployMode' in configuration.html on the website **Why are the changes needed?** The default value of 'spark.submit.deployMode' is 'client', but the website is wrong. **Does this PR introduce any user-facing change?** No **How was this patch tested?** It doesn't need to. **Was this patch authored or co-authored using generative AI tooling?** No Closes #42902 from chenyu-opensource/branch-SPARK-45146. Authored-by: chenyu-opensource <119398199+chenyu-opensou...@users.noreply.github.com> Signed-off-by: Sean Owen (cherry picked from commit 076cb7aabac2f0ff11ca77ca530b7b8db5310a5e) Signed-off-by: Sean Owen --- docs/configuration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/configuration.md b/docs/configuration.md index f099cea7eb9..d61f726130b 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -394,7 +394,7 @@ of the most common options to set are: spark.submit.deployMode - (none) + client The deploy mode of Spark driver program, either "client" or "cluster", Which means to launch driver program locally ("client") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode'
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 076cb7aabac [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode' 076cb7aabac is described below commit 076cb7aabac2f0ff11ca77ca530b7b8db5310a5e Author: chenyu-opensource <119398199+chenyu-opensou...@users.noreply.github.com> AuthorDate: Wed Sep 13 08:48:14 2023 -0500 [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode' **What changes were proposed in this pull request?** The PR updates the default value of 'spark.submit.deployMode' in configuration.html on the website **Why are the changes needed?** The default value of 'spark.submit.deployMode' is 'client', but the website is wrong. **Does this PR introduce any user-facing change?** No **How was this patch tested?** It doesn't need to. **Was this patch authored or co-authored using generative AI tooling?** No Closes #42902 from chenyu-opensource/branch-SPARK-45146. Authored-by: chenyu-opensource <119398199+chenyu-opensou...@users.noreply.github.com> Signed-off-by: Sean Owen --- docs/configuration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/configuration.md b/docs/configuration.md index 6f7e12555e8..3ca9b704eba 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -394,7 +394,7 @@ of the most common options to set are: spark.submit.deployMode - (none) + client The deploy mode of Spark driver program, either "client" or "cluster", Which means to launch driver program locally ("client") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.5 updated: [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode'
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new e72ae794e69 [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode' e72ae794e69 is described below commit e72ae794e69d8182291655d023aee903a913571b Author: chenyu-opensource <119398199+chenyu-opensou...@users.noreply.github.com> AuthorDate: Wed Sep 13 08:48:14 2023 -0500 [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode' **What changes were proposed in this pull request?** The PR updates the default value of 'spark.submit.deployMode' in configuration.html on the website **Why are the changes needed?** The default value of 'spark.submit.deployMode' is 'client', but the website is wrong. **Does this PR introduce any user-facing change?** No **How was this patch tested?** It doesn't need to. **Was this patch authored or co-authored using generative AI tooling?** No Closes #42902 from chenyu-opensource/branch-SPARK-45146. Authored-by: chenyu-opensource <119398199+chenyu-opensou...@users.noreply.github.com> Signed-off-by: Sean Owen (cherry picked from commit 076cb7aabac2f0ff11ca77ca530b7b8db5310a5e) Signed-off-by: Sean Owen --- docs/configuration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/configuration.md b/docs/configuration.md index dfded480c99..1139beb6646 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -394,7 +394,7 @@ of the most common options to set are: spark.submit.deployMode - (none) + client The deploy mode of Spark driver program, either "client" or "cluster", Which means to launch driver program locally ("client") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
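The four commits above correct the documented default of `spark.submit.deployMode` from "(none)" to `client`. A minimal sketch (not Spark's implementation) of resolving the effective mode with that default:

```python
def effective_deploy_mode(conf):
    """Resolve spark.submit.deployMode, defaulting to 'client' per the fixed docs."""
    mode = conf.get("spark.submit.deployMode", "client")
    if mode not in ("client", "cluster"):
        raise ValueError(f"invalid deploy mode: {mode!r}")
    return mode
```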
[spark] branch master updated: [SPARK-45111][BUILD] Upgrade maven to 3.9.4
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 169aa4bee95 [SPARK-45111][BUILD] Upgrade maven to 3.9.4 169aa4bee95 is described below commit 169aa4bee950e2249d853f00b4e5fca67edfaa80 Author: yangjie01 AuthorDate: Mon Sep 11 10:59:57 2023 -0500 [SPARK-45111][BUILD] Upgrade maven to 3.9.4 ### What changes were proposed in this pull request? This PR aims to upgrade Maven from 3.8.8 to 3.9.4. ### Why are the changes needed? The new version [lifts the JDK minimum to JDK 8](https://issues.apache.org/jira/browse/MNG-7452) and [makes the build work on JDK 20](https://issues.apache.org/jira/browse/MNG-7743). It also brings a series of bug fixes, such as [Fix deadlock during forked lifecycle executions](https://issues.apache.org/jira/browse/MNG-7487), along with a number of new optimizations like [Profile activation by packaging](https://issues.apache.org/jira/browse/MNG-6609). On the other hand, the new version re [...] For other updates, refer to the corresponding release notes: - https://maven.apache.org/docs/3.9.0/release-notes.html | https://github.com/apache/maven/releases/tag/maven-3.9.0 - https://maven.apache.org/docs/3.9.1/release-notes.html | https://github.com/apache/maven/releases/tag/maven-3.9.1 - https://maven.apache.org/docs/3.9.2/release-notes.html | https://github.com/apache/maven/releases/tag/maven-3.9.2 - https://maven.apache.org/docs/3.9.3/release-notes.html | https://github.com/apache/maven/releases/tag/maven-3.9.3 - https://maven.apache.org/docs/3.9.4/release-notes.html | https://github.com/apache/maven/releases/tag/maven-3.9.4 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? 
- Pass GitHub Actions - Manual test: running `build/mvn -version` will trigger downloading `apache-maven-3.9.4-bin.tar.gz` ``` exec: curl --silent --show-error -L https://www.apache.org/dyn/closer.lua/maven/maven-3/3.9.4/binaries/apache-maven-3.9.4-bin.tar.gz?action=download ``` ### Was this patch authored or co-authored using generative AI tooling? No Closes #42827 from LuciferYang/maven-394. Authored-by: yangjie01 Signed-off-by: Sean Owen --- dev/appveyor-install-dependencies.ps1 | 2 +- docs/building-spark.md| 2 +- pom.xml | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/dev/appveyor-install-dependencies.ps1 b/dev/appveyor-install-dependencies.ps1 index db154cd51da..682d388bdf9 100644 --- a/dev/appveyor-install-dependencies.ps1 +++ b/dev/appveyor-install-dependencies.ps1 @@ -81,7 +81,7 @@ if (!(Test-Path $tools)) { # == Maven # Push-Location $tools # -# $mavenVer = "3.8.8" +# $mavenVer = "3.9.4" # Start-FileDownload "https://archive.apache.org/dist/maven/maven-3/$mavenVer/binaries/apache-maven-$mavenVer-bin.zip"; "maven.zip" # # # extract diff --git a/docs/building-spark.md b/docs/building-spark.md index 4b8e70655d5..bbbc51d8c22 100644 --- a/docs/building-spark.md +++ b/docs/building-spark.md @@ -27,7 +27,7 @@ license: | ## Apache Maven The Maven-based build is the build of reference for Apache Spark. -Building Spark using Maven requires Maven 3.8.8 and Java 8/11/17. +Building Spark using Maven requires Maven 3.9.4 and Java 8/11/17. Spark requires Scala 2.12/2.13; support for Scala 2.11 was removed in Spark 3.0.0. ### Setting up Maven's Memory Usage diff --git a/pom.xml b/pom.xml index a61d603fe1c..02920c0ae74 100644 --- a/pom.xml +++ b/pom.xml @@ -115,7 +115,7 @@ 1.8 ${java.version} ${java.version} -3.8.8 +3.9.4 3.1.0 spark 9.5 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (c8fa821a873 -> 445c5417ea1)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from c8fa821a873 [SPARK-44866][SQL] Add `SnowflakeDialect` to handle BOOLEAN type correctly add 445c5417ea1 [SPARK-45105][DOCS] Make hyperlinks in documents clickable No new revisions were added by this update. Summary of changes: docs/running-on-mesos.md | 4 ++-- docs/running-on-yarn.md | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark-website] branch asf-site updated: make hyperlinks clickable & fix link
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 642d1fb834 make hyperlinks clickable & fix link 642d1fb834 is described below commit 642d1fb834817014e1799e73882d53650c1c1662 Author: panbingkun AuthorDate: Sat Sep 9 08:24:02 2023 -0500 make hyperlinks clickable & fix link The pr aims to: - make hyperlinks clickable to improve document usability - fix some link to reduce one jump. Author: panbingkun Closes #475 from panbingkun/make_hyperlinks_clickable. --- README.md | 2 +- committers.md | 9 +++-- developer-tools.md| 2 +- release-process.md| 16 ++-- security.md | 2 +- site/committers.html | 9 +++-- site/developer-tools.html | 2 +- site/release-process.html | 16 ++-- site/security.html| 2 +- 9 files changed, 31 insertions(+), 29 deletions(-) diff --git a/README.md b/README.md index ea34048ae7..3e6492c921 100644 --- a/README.md +++ b/README.md @@ -7,7 +7,7 @@ Building the site requires [Jekyll](http://jekyllrb.com/docs) The easiest way to install the right version of these tools is using [Bundler](https://bundler.io/) and running `bundle install` in this directory. -See also https://github.com/apache/spark/blob/master/docs/README.md +See also [https://github.com/apache/spark/blob/master/docs/README.md](https://github.com/apache/spark/blob/master/docs/README.md) A site build will update the directories and files in the `site` directory with the generated files. Using Jekyll via `bundle exec jekyll` locks it to the right version. diff --git a/committers.md b/committers.md index 2431d73f84..a555424026 100644 --- a/committers.md +++ b/committers.md @@ -197,8 +197,8 @@ origin g...@github.com:[your username]/spark.git (push) For the `apache` repo, you will need to set up command-line authentication to GitHub. 
This may include setting up an SSH key and/or personal access token. See: -- https://help.github.com/articles/connecting-to-github-with-ssh/ -- https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/ +- [https://docs.github.com/en/authentication/connecting-to-github-with-ssh](https://docs.github.com/en/authentication/connecting-to-github-with-ssh) +- [https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) To check whether the necessary write access are already granted please visit [GitBox](https://gitbox.apache.org/setup/). @@ -219,10 +219,7 @@ Then, in a separate window, modify the code and push a commit. Run `git rebase - You can verify the result is one change with `git log`. Then resume the script in the other window. Also, please remember to set Assignee on JIRAs where applicable when they are resolved. The script -can do this automatically in most cases. However where the contributor is not yet a part of the -Contributors group for the Spark project in ASF JIRA, it won't work until they are added. Ask -an admin to add the person to Contributors at -https://issues.apache.org/jira/plugins/servlet/project-config/SPARK/roles . +can do this automatically in most cases. Once a PR is merged please leave a comment on the PR stating which branch(es) it has been merged with. diff --git a/developer-tools.md b/developer-tools.md index 73e708116e..59850dbe19 100644 --- a/developer-tools.md +++ b/developer-tools.md @@ -193,7 +193,7 @@ Please check other available options via `python/run-tests[-with-coverage] --hel Although GitHub Action provide both K8s unit test and integration test coverage, you can run it locally. For example, Volcano batch scheduler integration test should be done manually. Please refer the integration test documentation for the detail. 
-https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/README.md +[https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/README.md](https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/README.md) Testing with GitHub actions workflow diff --git a/release-process.md b/release-process.md index 101db9a8b3..87a1ab6778 100644 --- a/release-process.md +++ b/release-process.md @@ -31,9 +31,9 @@ The release manager role in Spark means you are responsible for a few different If you are a new Release Manager, you can read up on the process from the followings: -- release signing https://www.apache.org/dev/release-signing.html -- gpg for signing https://www.apache.org/dev/openpgp.html -- svn https://www.a
[spark] branch master updated: [SPARK-44732][XML][FOLLOWUP] Partial backport of spark-xml "Shortcut common type inference cases to fail fast"
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a37c265371d [SPARK-44732][XML][FOLLOWUP] Partial backport of spark-xml "Shortcut common type inference cases to fail fast" a37c265371d is described below commit a37c265371dc861fa478dd63deaa38a86415fe3b Author: Sean Owen AuthorDate: Thu Sep 7 15:21:36 2023 -0700 [SPARK-44732][XML][FOLLOWUP] Partial backport of spark-xml "Shortcut common type inference cases to fail fast" ### What changes were proposed in this pull request? Partial back-port of https://github.com/databricks/spark-xml/commit/994e357f7666956b5d0e63627716b2c092d9abbd?diff=split from spark-xml ### Why are the changes needed? Though no more development was intended on spark-xml, there was a non-trivial improvement to inference speed that I committed anyway to resolve a customer issue. Part of it can be 'backported' here to sync the code. I attached this as a follow-up to the main code port JIRA. There is still, in general, no intent to commit more to spark-xml in the meantime unless it's significantly important. ### Does this PR introduce _any_ user-facing change? No, this should only speed up schema inference without behavior change. ### How was this patch tested? Tested in spark-xml, and will be tested by tests here too Closes #42844 from srowen/SPARK-44732.2. 
Authored-by: Sean Owen Signed-off-by: Sean Owen --- .../org/apache/spark/sql/catalyst/xml/TypeCast.scala | 16 1 file changed, 16 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/TypeCast.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/TypeCast.scala index a00f372da7f..b065dd41f28 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/TypeCast.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/TypeCast.scala @@ -155,6 +155,12 @@ private[sql] object TypeCast { } else { value } +// A little shortcut to avoid trying many formatters in the common case that +// the input isn't a double. All built-in formats will start with a digit or period. +if (signSafeValue.isEmpty || + !(Character.isDigit(signSafeValue.head) || signSafeValue.head == '.')) { + return false +} // Rule out strings ending in D or F, as they will parse as double but should be disallowed if (value.nonEmpty && (value.last match { case 'd' | 'D' | 'f' | 'F' => true @@ -171,6 +177,11 @@ private[sql] object TypeCast { } else { value } +// A little shortcut to avoid trying many formatters in the common case that +// the input isn't a number. All built-in formats will start with a digit. +if (signSafeValue.isEmpty || !Character.isDigit(signSafeValue.head)) { + return false +} (allCatch opt signSafeValue.toInt).isDefined } @@ -180,6 +191,11 @@ private[sql] object TypeCast { } else { value } +// A little shortcut to avoid trying many formatters in the common case that +// the input isn't a number. All built-in formats will start with a digit. +if (signSafeValue.isEmpty || !Character.isDigit(signSafeValue.head)) { + return false +} (allCatch opt signSafeValue.toLong).isDefined } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
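The fail-fast idea in the patch above — reject obviously non-numeric strings before invoking any expensive formatters — can be sketched in Python as follows (a simplified analogue of the Scala `TypeCast` change, not Spark code; `float()` stands in for the chain of formatters):

```python
def is_probably_double(value: str) -> bool:
    """Fail fast: reject strings that cannot be a double before parsing.

    Mirrors the shortcut in TypeCast.scala -- every built-in numeric format
    starts with a digit or a period (after an optional sign), so anything
    else can be rejected without trying the parsers at all.
    """
    # Strip a single leading sign, as the Scala code does.
    sign_safe = value[1:] if value[:1] in "+-" else value
    # Shortcut: bail out immediately if the first char can't start a number.
    if not sign_safe or not (sign_safe[0].isdigit() or sign_safe[0] == "."):
        return False
    # Rule out trailing D/F, which would parse as double in Scala but
    # should be disallowed (kept here for parity with the patch).
    if value and value[-1] in "dDfF":
        return False
    try:
        float(sign_safe)
        return True
    except ValueError:
        return False
```

The win is that for typical non-numeric XML field values (e.g. `"abc"`), the function returns after one character test instead of attempting and failing a full parse.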
[spark] branch master updated (0e6e15ca633 -> b8b58e0b95b)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 0e6e15ca633 [SPARK-45080][SS] Explicitly call out support for columnar in DSv2 streaming data sources add b8b58e0b95b [SPARK-45077][UI] Upgrade dagre-d3.js from 0.4.3 to 0.6.4 No new revisions were added by this update. Summary of changes: .../org/apache/spark/ui/static/dagre-d3.min.js | 4836 +++- 1 file changed, 4829 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-45067][BUILD] Upgrade slf4j to 2.0.9
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 85d1c7f3a5d [SPARK-45067][BUILD] Upgrade slf4j to 2.0.9 85d1c7f3a5d is described below commit 85d1c7f3a5dd0a9162d93b80812a193d8ccfef18 Author: yangjie01 AuthorDate: Mon Sep 4 09:15:44 2023 -0500 [SPARK-45067][BUILD] Upgrade slf4j to 2.0.9 ### What changes were proposed in this pull request? This PR aims to upgrade slf4j from 2.0.7 to 2.0.9. ### Why are the changes needed? The release notes are as follows: - https://www.slf4j.org/news.html#2.0.9 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #42796 from LuciferYang/SPARK-45067. Authored-by: yangjie01 Signed-off-by: Sean Owen --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 6 +++--- pom.xml | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 59164c1f8f4..652127a9bb8 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -118,7 +118,7 @@ javassist/3.29.2-GA//javassist-3.29.2-GA.jar javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar javolution/5.5.1//javolution-5.5.1.jar jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar -jcl-over-slf4j/2.0.7//jcl-over-slf4j-2.0.7.jar +jcl-over-slf4j/2.0.9//jcl-over-slf4j-2.0.9.jar jdo-api/3.0.1//jdo-api-3.0.1.jar jdom2/2.0.6//jdom2-2.0.6.jar jersey-client/2.40//jersey-client-2.40.jar @@ -141,7 +141,7 @@ json4s-jackson_2.12/3.7.0-M11//json4s-jackson_2.12-3.7.0-M11.jar json4s-scalap_2.12/3.7.0-M11//json4s-scalap_2.12-3.7.0-M11.jar jsr305/3.0.0//jsr305-3.0.0.jar jta/1.1//jta-1.1.jar -jul-to-slf4j/2.0.7//jul-to-slf4j-2.0.7.jar +jul-to-slf4j/2.0.9//jul-to-slf4j-2.0.9.jar 
kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar kubernetes-client-api/6.8.1//kubernetes-client-api-6.8.1.jar kubernetes-client/6.8.1//kubernetes-client-6.8.1.jar @@ -233,7 +233,7 @@ scala-parser-combinators_2.12/2.3.0//scala-parser-combinators_2.12-2.3.0.jar scala-reflect/2.12.18//scala-reflect-2.12.18.jar scala-xml_2.12/2.2.0//scala-xml_2.12-2.2.0.jar shims/0.9.45//shims-0.9.45.jar -slf4j-api/2.0.7//slf4j-api-2.0.7.jar +slf4j-api/2.0.9//slf4j-api-2.0.9.jar snakeyaml-engine/2.6//snakeyaml-engine-2.6.jar snakeyaml/2.0//snakeyaml-2.0.jar snappy-java/1.1.10.3//snappy-java-1.1.10.3.jar diff --git a/pom.xml b/pom.xml index efd1c6ffdb9..a61d603fe1c 100644 --- a/pom.xml +++ b/pom.xml @@ -119,7 +119,7 @@ 3.1.0 spark 9.5 -2.0.7 +2.0.9 2.20.0 3.3.6 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
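The manifest edited in the commit above, `dev/deps/spark-deps-hadoop-3-hive-2.3`, lists one coordinate per line as `name/version/classifier/jar-filename` (an empty classifier produces the double slash). A version bump therefore has to change two places on each line, and a consistency check is easy to sketch — this is a hypothetical helper for illustration, not an actual Spark dev script:

```python
def dep_line_is_consistent(line: str) -> bool:
    """Check that the jar filename embeds the same version (and classifier)
    as the coordinate fields, e.g. 'jcl-over-slf4j/2.0.9//jcl-over-slf4j-2.0.9.jar'."""
    name, version, classifier, jar = line.strip().rsplit("/", 3)
    suffix = f"-{classifier}" if classifier else ""
    return jar == f"{name}-{version}{suffix}.jar"
```

Running such a check over the whole deps file would catch a bump that touched the version field but not the filename (or vice versa) before CI does.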
[spark] branch master updated: [SPARK-44890][BUILD] Update miswritten remarks
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ba1c2f3b383 [SPARK-44890][BUILD] Update miswritten remarks ba1c2f3b383 is described below commit ba1c2f3b38396c01739375d6e83ac84b581d951e Author: chenyu-opensource <119398199+chenyu-opensou...@users.noreply.github.com> AuthorDate: Mon Sep 4 09:12:33 2023 -0500 [SPARK-44890][BUILD] Update miswritten remarks ### What changes were proposed in this pull request? The PR updates miswritten remarks in pom.xml. ### Why are the changes needed? To make the comments more accurate and standardized. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No tests are needed: the change only touches comments and does not affect actual operation. ### Was this patch authored or co-authored using generative AI tooling? No Closes #42598 from chenyu-opensource/master. Authored-by: chenyu-opensource <119398199+chenyu-opensou...@users.noreply.github.com> Signed-off-by: Sean Owen --- pom.xml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pom.xml b/pom.xml index 8edc3fd550c..efd1c6ffdb9 100644 --- a/pom.xml +++ b/pom.xml @@ -153,7 +153,7 @@ 2.5.1 2.0.8 4.2.19 @@ -175,7 +175,7 @@ 2.12.18 2.12 2.2.0 - + 4.8.0 false 2.16.0 @@ -204,7 +204,7 @@ 3.1.9 2.40 2.12.5 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.5 updated: [SPARK-45042][BUILD][3.5] Upgrade jetty to 9.4.52.v20230823
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 950b2f29105 [SPARK-45042][BUILD][3.5] Upgrade jetty to 9.4.52.v20230823 950b2f29105 is described below commit 950b2f29105cd66355eef10503a93d678087c79e Author: panbingkun AuthorDate: Mon Sep 4 09:01:50 2023 -0500 [SPARK-45042][BUILD][3.5] Upgrade jetty to 9.4.52.v20230823 ### What changes were proposed in this pull request? The PR aims to upgrade jetty from 9.4.51.v20230217 to 9.4.52.v20230823. (Backport to Spark 3.5.0) ### Why are the changes needed? - This is a release of https://github.com/eclipse/jetty.project/issues/7958 that was sponsored by a [support contract from Webtide.com](mailto:saleswebtide.com) - The newest version fixes a possible security issue: this release provides a workaround for Security Advisory https://github.com/advisories/GHSA-58qw-p7qm-5rvh - The release notes are as follows: https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.52.v20230823 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42795 from panbingkun/branch-3.5_SPARK-45042. 
Authored-by: panbingkun Signed-off-by: Sean Owen --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++-- pom.xml | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index b6aba589d5f..1d02f8dba56 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -130,8 +130,8 @@ jersey-container-servlet/2.40//jersey-container-servlet-2.40.jar jersey-hk2/2.40//jersey-hk2-2.40.jar jersey-server/2.40//jersey-server-2.40.jar jettison/1.1//jettison-1.1.jar -jetty-util-ajax/9.4.51.v20230217//jetty-util-ajax-9.4.51.v20230217.jar -jetty-util/9.4.51.v20230217//jetty-util-9.4.51.v20230217.jar +jetty-util-ajax/9.4.52.v20230823//jetty-util-ajax-9.4.52.v20230823.jar +jetty-util/9.4.52.v20230823//jetty-util-9.4.52.v20230823.jar jline/2.14.6//jline-2.14.6.jar joda-time/2.12.5//joda-time-2.12.5.jar jodd-core/3.5.2//jodd-core-3.5.2.jar diff --git a/pom.xml b/pom.xml index 154ca4005f6..8fc4b89a78c 100644 --- a/pom.xml +++ b/pom.xml @@ -143,7 +143,7 @@ 1.13.1 1.9.1 shaded-protobuf -9.4.51.v20230217 +9.4.52.v20230823 4.0.3 0.10.0
[spark] branch master updated: [SPARK-44956][BUILD] Upgrade Jekyll to 4.3.2 & Webrick to 1.8.1
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 967aac1171a [SPARK-44956][BUILD] Upgrade Jekyll to 4.3.2 & Webrick to 1.8.1 967aac1171a is described below commit 967aac1171a49c8e98c992512487d77c2b1c4565 Author: panbingkun AuthorDate: Sat Sep 2 08:19:38 2023 -0500 [SPARK-44956][BUILD] Upgrade Jekyll to 4.3.2 & Webrick to 1.8.1 ### What changes were proposed in this pull request? The pr aims to upgrade - Jekyll from 4.2.1 to 4.3.2. - Webrick from 1.7 to 1.8.1. ### Why are the changes needed? 1. The `4.2.1` version was released on Sep 27, 2021, nearly two years ago. 2. Jekyll 4.3.2 was released on `Jan 21, 2023` and includes the fix for a regression bug. - https://github.com/jekyll/jekyll/releases/tag/v4.3.2 - https://github.com/jekyll/jekyll/releases/tag/v4.3.1 - https://github.com/jekyll/jekyll/releases/tag/v4.3.0 Fix regression in Convertible module from v4.2.0 (https://github.com/jekyll/jekyll/pull/8786) - https://github.com/jekyll/jekyll/releases/tag/v4.2.2 3. The newest webrick versions include some bug fixes. https://github.com/ruby/webrick/releases/tag/v1.8.1 https://github.com/ruby/webrick/releases/tag/v1.8.0 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Pass GA. - Manually tested. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42669 from panbingkun/SPARK-44956. 
Authored-by: panbingkun Signed-off-by: Sean Owen --- docs/Gemfile | 4 ++-- docs/Gemfile.lock | 62 +-- 2 files changed, 35 insertions(+), 31 deletions(-) diff --git a/docs/Gemfile b/docs/Gemfile index 6c352012964..6c676037116 100644 --- a/docs/Gemfile +++ b/docs/Gemfile @@ -18,7 +18,7 @@ source "https://rubygems.org"; gem "ffi", "1.15.5" -gem "jekyll", "4.2.1" +gem "jekyll", "4.3.2" gem "rouge", "3.26.0" gem "jekyll-redirect-from", "0.16.0" -gem "webrick", "1.7" +gem "webrick", "1.8.1" diff --git a/docs/Gemfile.lock b/docs/Gemfile.lock index 6654e6c47c6..eda31f85747 100644 --- a/docs/Gemfile.lock +++ b/docs/Gemfile.lock @@ -1,74 +1,78 @@ GEM remote: https://rubygems.org/ specs: -addressable (2.8.0) - public_suffix (>= 2.0.2, < 5.0) +addressable (2.8.5) + public_suffix (>= 2.0.2, < 6.0) colorator (1.1.0) -concurrent-ruby (1.1.9) -em-websocket (0.5.2) +concurrent-ruby (1.2.2) +em-websocket (0.5.3) eventmachine (>= 0.12.9) - http_parser.rb (~> 0.6.0) + http_parser.rb (~> 0) eventmachine (1.2.7) ffi (1.15.5) forwardable-extended (2.6.0) -http_parser.rb (0.6.0) -i18n (1.8.11) +google-protobuf (3.24.2) +http_parser.rb (0.8.0) +i18n (1.14.1) concurrent-ruby (~> 1.0) -jekyll (4.2.1) +jekyll (4.3.2) addressable (~> 2.4) colorator (~> 1.0) em-websocket (~> 0.5) i18n (~> 1.0) - jekyll-sass-converter (~> 2.0) + jekyll-sass-converter (>= 2.0, < 4.0) jekyll-watch (~> 2.0) - kramdown (~> 2.3) + kramdown (~> 2.3, >= 2.3.1) kramdown-parser-gfm (~> 1.0) liquid (~> 4.0) - mercenary (~> 0.4.0) + mercenary (>= 0.3.6, < 0.5) pathutil (~> 0.9) - rouge (~> 3.0) + rouge (>= 3.0, < 5.0) safe_yaml (~> 1.0) - terminal-table (~> 2.0) + terminal-table (>= 1.8, < 4.0) + webrick (~> 1.7) jekyll-redirect-from (0.16.0) jekyll (>= 3.3, < 5.0) -jekyll-sass-converter (2.1.0) - sassc (> 2.0.1, < 3.0) +jekyll-sass-converter (3.0.0) + sass-embedded (~> 1.54) jekyll-watch (2.2.1) listen (~> 3.0) -kramdown (2.3.1) +kramdown (2.4.0) rexml kramdown-parser-gfm (1.1.0) kramdown (~> 2.0) -liquid (4.0.3) -listen 
(3.7.0) +liquid (4.0.4) +listen (3.8.0) rb-fsevent (~> 0.10, >= 0.10.3) rb-inotify (~> 0.9, >= 0.9.10) mercenary (0.4.0) pathutil (0.16.2) forwardable-extended (~> 2.6) -public_suffix (4.0.6) -rb-fsevent (0.11.0) +public_suffix (5.0.3) +rake (13.0.6) +rb-fsevent (0.11.2) rb-inotify (0.10.1) ffi (~> 1.0) -rexml (3.2.5) +rexml (3.2.6) rouge (3.26.0) safe_yaml (1.0.5) -sassc (2.4.0) - ffi (~> 1.9) -terminal-table (2.0.0) - unicode-display_width (~> 1.1, >= 1.1.1) -
[spark] branch master updated: [SPARK-45043][BUILD] Upgrade `scalafmt` to 3.7.13
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 82d54fc8924 [SPARK-45043][BUILD] Upgrade `scalafmt` to 3.7.13 82d54fc8924 is described below commit 82d54fc8924618777992ee9a4d939b1fb336f20d Author: panbingkun AuthorDate: Sat Sep 2 08:18:43 2023 -0500 [SPARK-45043][BUILD] Upgrade `scalafmt` to 3.7.13 ### What changes were proposed in this pull request? The pr aims to upgrade `scalafmt` from 3.7.5 to 3.7.13. ### Why are the changes needed? 1. The newest versions include some bug fixes, e.g.: - FormatWriter: accumulate align shift correctly (https://github.com/scalameta/scalafmt/pull/3615) - Indents: ignore fewerBraces if indentation is 1 (https://github.com/scalameta/scalafmt/pull/3592) - RemoveScala3OptionalBraces: handle infix on rbrace (https://github.com/scalameta/scalafmt/pull/3576) 2. The full release notes: https://github.com/scalameta/scalafmt/releases/tag/v3.7.13 https://github.com/scalameta/scalafmt/releases/tag/v3.7.12 https://github.com/scalameta/scalafmt/releases/tag/v3.7.11 https://github.com/scalameta/scalafmt/releases/tag/v3.7.10 https://github.com/scalameta/scalafmt/releases/tag/v3.7.9 https://github.com/scalameta/scalafmt/releases/tag/v3.7.8 https://github.com/scalameta/scalafmt/releases/tag/v3.7.7 https://github.com/scalameta/scalafmt/releases/tag/v3.7.6 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42764 from panbingkun/SPARK-45043. 
Authored-by: panbingkun Signed-off-by: Sean Owen --- dev/.scalafmt.conf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dev/.scalafmt.conf b/dev/.scalafmt.conf index c3b26002a76..721dec28990 100644 --- a/dev/.scalafmt.conf +++ b/dev/.scalafmt.conf @@ -32,4 +32,4 @@ fileOverride { runner.dialect = scala213 } } -version = 3.7.5 +version = 3.7.13 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-44782][INFRA] Adjust PR template to Generative Tooling Guidance recommendations
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2e2f5e9c28b [SPARK-44782][INFRA] Adjust PR template to Generative Tooling Guidance recommendations 2e2f5e9c28b is described below commit 2e2f5e9c28b4e88171949006937c094304581738 Author: zero323 AuthorDate: Fri Aug 18 21:13:36 2023 -0500 [SPARK-44782][INFRA] Adjust PR template to Generative Tooling Guidance recommendations ### What changes were proposed in this pull request? This PR adds _Was this patch authored or co-authored using generative AI tooling?_ section to the PR template. ### Why are the changes needed? To reflect recommendations of the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual inspection. Closes #42469 from zero323/SPARK-44782. Authored-by: zero323 Signed-off-by: Sean Owen --- .github/PULL_REQUEST_TEMPLATE | 9 + 1 file changed, 9 insertions(+) diff --git a/.github/PULL_REQUEST_TEMPLATE b/.github/PULL_REQUEST_TEMPLATE index 1548696a3ca..a80bf21312a 100644 --- a/.github/PULL_REQUEST_TEMPLATE +++ b/.github/PULL_REQUEST_TEMPLATE @@ -47,3 +47,12 @@ If it was tested in a way different from regular unit tests, please clarify how If tests were not added, please describe why they were not added and/or why it was difficult to add. If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks. --> + + +### Was this patch authored or co-authored using generative AI tooling? + - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.3 updated: [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 7e7c41bf100 [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again 7e7c41bf100 is described below commit 7e7c41bf1007ca05ffc3d818d34d75570d234a6d Author: Kent Yao AuthorDate: Fri Aug 18 10:02:43 2023 -0500 [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again ### What changes were proposed in this pull request? This PR creates an alternative to the assign_issue function in jira.client.JIRA. The original one has an issue that it will search users again and only choose the assignee from 20 candidates. If it's unmatched, it picks the head blindly. For example, ```python >>> assignee = asf_jira.user("yao") >>> "SPARK-44801" 'SPARK-44801' >>> asf_jira.assign_issue(issue.key, assignee.name) Traceback (most recent call last): File "", line 1, in NameError: name 'issue' is not defined >>> asf_jira.assign_issue("SPARK-44801", assignee.name) Traceback (most recent call last): File "", line 1, in File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 123, in wrapper result = func(*arg_list, **kwargs) ^ File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 1891, in assign_issue self._session.put(url, data=json.dumps(payload)) File "/Users/hzyaoqin/python/lib/python3.11/site-packages/requests/sessions.py", line 649, in put return self.request("PUT", url, data=data, **kwargs) ^ File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", line 246, in request elif raise_on_error(response, **processed_kwargs): File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", line 71, in raise_on_error raise JIRAError( jira.exceptions.JIRAError: JiraError HTTP 400 url: 
https://issues.apache.org/jira/rest/api/latest/issue/SPARK-44801/assignee response text = {"errorMessages":[],"errors":{"assignee":"User 'airhot' cannot be assigned issues."}} ``` The Jira userid 'yao' fails to return my JIRA profile as a candidate(20 in total) to match. So, 'airhot' from the head replaces me as an assignee. ### Why are the changes needed? bugfix for merge_spark_pr ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? test locally ```python >>> def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) -> bool: ... """Assign an issue to a user. ... ... Args: ... issue (Union[int, str]): the issue ID or key to assign ... assignee (str): the user to assign the issue to. None will set it to unassigned. -1 will set it to Automatic. ... ... Returns: ... bool ... """ ... url = getattr(client, "_get_latest_url")(f"issue/{issue}/assignee") ... payload = {"name": assignee} ... getattr(client, "_session").put(url, data=json.dumps(payload)) ... return True ... >>> >>> assign_issue(asf_jira, "SPARK-44801", "yao") True ``` Closes #42496 from yaooqinn/SPARK-44813. Authored-by: Kent Yao Signed-off-by: Sean Owen (cherry picked from commit 00255bc63b1a3bbe80bedc639b88d4a8e3f88f72) Signed-off-by: Sean Owen --- dev/merge_spark_pr.py | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py index e21a39a6881..8555abe9bd0 100755 --- a/dev/merge_spark_pr.py +++ b/dev/merge_spark_pr.py @@ -372,7 +372,7 @@ def choose_jira_assignee(issue, asf_jira): except BaseException: # assume it's a user id, and try to assign (might fail, we just prompt again) assignee = asf_jira.user(raw_assignee) -asf_jira.assign_issue(issue.key, assignee.name) +assign_issue(issue.key, assignee.name) return assignee except KeyboardInterrupt: raise @@ -381,6 +381,19 @@ def choose_jira_as
[spark] branch branch-3.4 updated: [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 3c5e57d886b [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again 3c5e57d886b is described below commit 3c5e57d886b81808370353781bfce2b2ce20a473 Author: Kent Yao AuthorDate: Fri Aug 18 10:02:43 2023 -0500 [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again ### What changes were proposed in this pull request? This PR creates an alternative to the assign_issue function in jira.client.JIRA. The original one has an issue that it will search users again and only choose the assignee from 20 candidates. If it's unmatched, it picks the head blindly. For example, ```python >>> assignee = asf_jira.user("yao") >>> "SPARK-44801" 'SPARK-44801' >>> asf_jira.assign_issue(issue.key, assignee.name) Traceback (most recent call last): File "", line 1, in NameError: name 'issue' is not defined >>> asf_jira.assign_issue("SPARK-44801", assignee.name) Traceback (most recent call last): File "", line 1, in File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 123, in wrapper result = func(*arg_list, **kwargs) ^ File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 1891, in assign_issue self._session.put(url, data=json.dumps(payload)) File "/Users/hzyaoqin/python/lib/python3.11/site-packages/requests/sessions.py", line 649, in put return self.request("PUT", url, data=data, **kwargs) ^ File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", line 246, in request elif raise_on_error(response, **processed_kwargs): File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", line 71, in raise_on_error raise JIRAError( jira.exceptions.JIRAError: JiraError HTTP 400 url: 
https://issues.apache.org/jira/rest/api/latest/issue/SPARK-44801/assignee response text = {"errorMessages":[],"errors":{"assignee":"User 'airhot' cannot be assigned issues."}} ``` The Jira userid 'yao' fails to return my JIRA profile as a candidate(20 in total) to match. So, 'airhot' from the head replaces me as an assignee. ### Why are the changes needed? bugfix for merge_spark_pr ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? test locally ```python >>> def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) -> bool: ... """Assign an issue to a user. ... ... Args: ... issue (Union[int, str]): the issue ID or key to assign ... assignee (str): the user to assign the issue to. None will set it to unassigned. -1 will set it to Automatic. ... ... Returns: ... bool ... """ ... url = getattr(client, "_get_latest_url")(f"issue/{issue}/assignee") ... payload = {"name": assignee} ... getattr(client, "_session").put(url, data=json.dumps(payload)) ... return True ... >>> >>> assign_issue(asf_jira, "SPARK-44801", "yao") True ``` Closes #42496 from yaooqinn/SPARK-44813. Authored-by: Kent Yao Signed-off-by: Sean Owen (cherry picked from commit 00255bc63b1a3bbe80bedc639b88d4a8e3f88f72) Signed-off-by: Sean Owen --- dev/merge_spark_pr.py | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py index 1621432c01c..8a5b6ebe8ef 100755 --- a/dev/merge_spark_pr.py +++ b/dev/merge_spark_pr.py @@ -372,7 +372,7 @@ def choose_jira_assignee(issue, asf_jira): except BaseException: # assume it's a user id, and try to assign (might fail, we just prompt again) assignee = asf_jira.user(raw_assignee) -asf_jira.assign_issue(issue.key, assignee.name) +assign_issue(issue.key, assignee.name) return assignee except KeyboardInterrupt: raise @@ -381,6 +381,19 @@ def choose_jira_as
[spark] branch branch-3.5 updated: [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new f7dd0a95727 [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again f7dd0a95727 is described below commit f7dd0a95727259ff4b7a2f849798f8a93cf78b69 Author: Kent Yao AuthorDate: Fri Aug 18 10:02:43 2023 -0500 [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again ### What changes were proposed in this pull request? This PR creates an alternative to the assign_issue function in jira.client.JIRA. The original one has an issue that it will search users again and only choose the assignee from 20 candidates. If it's unmatched, it picks the head blindly. For example, ```python >>> assignee = asf_jira.user("yao") >>> "SPARK-44801" 'SPARK-44801' >>> asf_jira.assign_issue(issue.key, assignee.name) Traceback (most recent call last): File "", line 1, in NameError: name 'issue' is not defined >>> asf_jira.assign_issue("SPARK-44801", assignee.name) Traceback (most recent call last): File "", line 1, in File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 123, in wrapper result = func(*arg_list, **kwargs) ^ File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 1891, in assign_issue self._session.put(url, data=json.dumps(payload)) File "/Users/hzyaoqin/python/lib/python3.11/site-packages/requests/sessions.py", line 649, in put return self.request("PUT", url, data=data, **kwargs) ^ File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", line 246, in request elif raise_on_error(response, **processed_kwargs): File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", line 71, in raise_on_error raise JIRAError( jira.exceptions.JIRAError: JiraError HTTP 400 url: 
https://issues.apache.org/jira/rest/api/latest/issue/SPARK-44801/assignee response text = {"errorMessages":[],"errors":{"assignee":"User 'airhot' cannot be assigned issues."}} ``` The Jira userid 'yao' fails to return my JIRA profile as a candidate(20 in total) to match. So, 'airhot' from the head replaces me as an assignee. ### Why are the changes needed? bugfix for merge_spark_pr ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? test locally ```python >>> def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) -> bool: ... """Assign an issue to a user. ... ... Args: ... issue (Union[int, str]): the issue ID or key to assign ... assignee (str): the user to assign the issue to. None will set it to unassigned. -1 will set it to Automatic. ... ... Returns: ... bool ... """ ... url = getattr(client, "_get_latest_url")(f"issue/{issue}/assignee") ... payload = {"name": assignee} ... getattr(client, "_session").put(url, data=json.dumps(payload)) ... return True ... >>> >>> assign_issue(asf_jira, "SPARK-44801", "yao") True ``` Closes #42496 from yaooqinn/SPARK-44813. Authored-by: Kent Yao Signed-off-by: Sean Owen (cherry picked from commit 00255bc63b1a3bbe80bedc639b88d4a8e3f88f72) Signed-off-by: Sean Owen --- dev/merge_spark_pr.py | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py index bc51b8af2eb..37488557fea 100755 --- a/dev/merge_spark_pr.py +++ b/dev/merge_spark_pr.py @@ -373,7 +373,7 @@ def choose_jira_assignee(issue, asf_jira): except BaseException: # assume it's a user id, and try to assign (might fail, we just prompt again) assignee = asf_jira.user(raw_assignee) -asf_jira.assign_issue(issue.key, assignee.name) +assign_issue(issue.key, assignee.name) return assignee except KeyboardInterrupt: raise @@ -382,6 +382,19 @@ def choose_jira_as
[spark] branch master updated: [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 00255bc63b1 [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again 00255bc63b1 is described below commit 00255bc63b1a3bbe80bedc639b88d4a8e3f88f72 Author: Kent Yao AuthorDate: Fri Aug 18 10:02:43 2023 -0500 [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again ### What changes were proposed in this pull request? This PR creates an alternative to the assign_issue function in jira.client.JIRA. The original one has an issue that it will search users again and only choose the assignee from 20 candidates. If it's unmatched, it picks the head blindly. For example, ```python >>> assignee = asf_jira.user("yao") >>> "SPARK-44801" 'SPARK-44801' >>> asf_jira.assign_issue(issue.key, assignee.name) Traceback (most recent call last): File "", line 1, in NameError: name 'issue' is not defined >>> asf_jira.assign_issue("SPARK-44801", assignee.name) Traceback (most recent call last): File "", line 1, in File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 123, in wrapper result = func(*arg_list, **kwargs) ^ File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 1891, in assign_issue self._session.put(url, data=json.dumps(payload)) File "/Users/hzyaoqin/python/lib/python3.11/site-packages/requests/sessions.py", line 649, in put return self.request("PUT", url, data=data, **kwargs) ^ File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", line 246, in request elif raise_on_error(response, **processed_kwargs): File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", line 71, in raise_on_error raise JIRAError( jira.exceptions.JIRAError: JiraError HTTP 400 url: 
https://issues.apache.org/jira/rest/api/latest/issue/SPARK-44801/assignee response text = {"errorMessages":[],"errors":{"assignee":"User 'airhot' cannot be assigned issues."}} ``` The Jira userid 'yao' fails to return my JIRA profile as a candidate(20 in total) to match. So, 'airhot' from the head replaces me as an assignee. ### Why are the changes needed? bugfix for merge_spark_pr ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? test locally ```python >>> def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) -> bool: ... """Assign an issue to a user. ... ... Args: ... issue (Union[int, str]): the issue ID or key to assign ... assignee (str): the user to assign the issue to. None will set it to unassigned. -1 will set it to Automatic. ... ... Returns: ... bool ... """ ... url = getattr(client, "_get_latest_url")(f"issue/{issue}/assignee") ... payload = {"name": assignee} ... getattr(client, "_session").put(url, data=json.dumps(payload)) ... return True ... >>> >>> assign_issue(asf_jira, "SPARK-44801", "yao") True ``` Closes #42496 from yaooqinn/SPARK-44813. Authored-by: Kent Yao Signed-off-by: Sean Owen --- dev/merge_spark_pr.py | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py index 27d0afe80ed..213798e5a1a 100755 --- a/dev/merge_spark_pr.py +++ b/dev/merge_spark_pr.py @@ -394,7 +394,7 @@ def choose_jira_assignee(issue, asf_jira): except BaseException: # assume it's a user id, and try to assign (might fail, we just prompt again) assignee = asf_jira.user(raw_assignee) -asf_jira.assign_issue(issue.key, assignee.name) +assign_issue(issue.key, assignee.name) return assignee except KeyboardInterrupt: raise @@ -403,6 +403,19 @@ def choose_jira_assignee(issue, asf_jira): print("Error assigning JIRA, try again (or leave blank and fix man
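The fix above replaces `jira.client.JIRA.assign_issue` with a direct REST `PUT`, because the library re-searches users and chooses the assignee from only 20 candidates. The behavior can be exercised without a live JIRA by substituting a stub client for the real `jira.client.JIRA` instance (the stub below is an illustration only):

```python
import json

class _RecordingSession:
    """Stands in for the client's HTTP session; records PUT calls."""
    def __init__(self, sink):
        self.sink = sink

    def put(self, url, data=None):
        self.sink.append((url, data))

class StubJiraClient:
    """Minimal stand-in for jira.client.JIRA, exposing only the two
    private members the workaround relies on."""
    def __init__(self):
        self.puts = []
        self._session = _RecordingSession(self.puts)

    def _get_latest_url(self, path):
        return f"https://issues.apache.org/jira/rest/api/latest/{path}"

def assign_issue(client, issue, assignee) -> bool:
    """Assign an issue by PUTting directly to the assignee endpoint,
    bypassing the client's lossy 20-candidate user search."""
    url = getattr(client, "_get_latest_url")(f"issue/{issue}/assignee")
    payload = {"name": assignee}
    getattr(client, "_session").put(url, data=json.dumps(payload))
    return True
```

Against a real client, `assign_issue(asf_jira, "SPARK-44801", "yao")` issues the same request shown in the commit's local test; the `getattr` calls keep the access to private members explicit.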
[spark-website] branch asf-site updated: Add note on generative tooling to developer tools
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new fc89ca1ed2 Add note on generative tooling to developer tools fc89ca1ed2 is described below commit fc89ca1ed20551c66dc31ebbe28664d12689bd13 Author: zero323 AuthorDate: Mon Aug 14 21:04:28 2023 -0500 Add note on generative tooling to developer tools This PR adds notes on generative tooling and a link to the relevant ASF policy. As requested in comments to https://github.com/apache/spark/pull/42469 Author: zero323 Closes #472 from zero323/SPARK-44782-generative-tooling-notes. --- developer-tools.md| 9 + site/developer-tools.html | 9 + 2 files changed, 18 insertions(+) diff --git a/developer-tools.md b/developer-tools.md index e0a1844ae7..73e708116e 100644 --- a/developer-tools.md +++ b/developer-tools.md @@ -549,3 +549,12 @@ When running Spark tests through SBT, add `javaOptions in Test += "-agentpath:/p to `SparkBuild.scala` to launch the tests with the YourKit profiler agent enabled. The platform-specific paths to the profiler agents are listed in the https://www.yourkit.com/docs/java/help/agent.jsp";>YourKit documentation. + +Generative tooling usage + +In general, the ASF allows contributions co-authored using generative AI tools. However, there are several considerations when you submit a patch containing generated content. + +Foremost, you are required to disclose the usage of such a tool. Furthermore, you are responsible for ensuring that the terms and conditions of the tool in question are +compatible with usage in an Open Source project and inclusion of the generated content doesn't pose a risk of copyright violation. + +Please refer to https://www.apache.org/legal/generative-tooling.html";>The ASF Generative Tooling Guidance for details and developments. 
diff --git a/site/developer-tools.html b/site/developer-tools.html index a43786ff91..de94619481 100644 --- a/site/developer-tools.html +++ b/site/developer-tools.html @@ -657,6 +657,15 @@ to SparkBuild.scala to The platform-specific paths to the profiler agents are listed in the https://www.yourkit.com/docs/java/help/agent.jsp";>YourKit documentation. +Generative tooling usage + +In general, the ASF allows contributions co-authored using generative AI tools. However, there are several considerations when you submit a patch containing generated content. + +Foremost, you are required to disclose the usage of such a tool. Furthermore, you are responsible for ensuring that the terms and conditions of the tool in question are +compatible with usage in an Open Source project and inclusion of the generated content doesn’t pose a risk of copyright violation. + +Please refer to https://www.apache.org/legal/generative-tooling.html";>The ASF Generative Tooling Guidance for details and developments. + - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark-website] branch asf-site updated: Added IOMETE to powered by Spark docs
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 61c79d6c34 Added IOMETE to powered by Spark docs 61c79d6c34 is described below commit 61c79d6c34c151586a2bb02be1d0c4d86627ce31 Author: Fuad Musayev AuthorDate: Thu Aug 10 21:27:35 2023 -0500 Added IOMETE to powered by Spark docs Added IOMETE Data Lakehouse platform to Powered by Spark docs. Author: Fuad Musayev Closes #471 from fmusayev/powered-by-iomete. --- powered-by.md| 1 + site/powered-by.html | 1 + 2 files changed, 2 insertions(+) diff --git a/powered-by.md b/powered-by.md index 048108882b..8b2cfa4df1 100644 --- a/powered-by.md +++ b/powered-by.md @@ -131,6 +131,7 @@ and external data sources, driving holistic and actionable insights. - http://www.infoobjects.com";>InfoObjects - Award winning Big Data consulting company with focus on Spark and Hadoop - http://en.inspur.com";>Inspur +- https://iomete.com";>IOMETE - IOMETE offers a modern Cloud-Prem Data Lakehouse platform, extending cloud-like experience to on-premise and private clouds. Utilizing Apache Spark as the query engine, we enable running Spark Jobs and ML applications on AWS, Azure, GCP, or On-Prem. Discover more at https://iomete.com";>IOMETE. - http://www.sehir.edu.tr/en/";>Istanbul Sehir University - http://www.kenshoo.com/";>Kenshoo - Digital marketing solutions and predictive media optimization diff --git a/site/powered-by.html b/site/powered-by.html index de8eb55ce2..aa07b10347 100644 --- a/site/powered-by.html +++ b/site/powered-by.html @@ -319,6 +319,7 @@ environments or on bare-metal infrastructures. http://en.inspur.com";>Inspur + https://iomete.com";>IOMETE - IOMETE offers a modern Cloud-Prem Data Lakehouse platform, extending cloud-like experience to on-premise and private clouds. 
Utilizing Apache Spark as the query engine, we enable running Spark Jobs and ML applications on AWS, Azure, GCP, or On-Prem. Discover more at https://iomete.com";>IOMETE. http://www.sehir.edu.tr/en/";>Istanbul Sehir University http://www.kenshoo.com/";>Kenshoo - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-44650][CORE] `spark.executor.defaultJavaOptions` Check illegal java options
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 41a2a7daeee [SPARK-44650][CORE] `spark.executor.defaultJavaOptions` Check illegal java options 41a2a7daeee is described below commit 41a2a7daeee0a25d39f30364a694becf54ab37e7 Author: sychen AuthorDate: Sun Aug 6 08:24:40 2023 -0500 [SPARK-44650][CORE] `spark.executor.defaultJavaOptions` Check illegal java options ### What changes were proposed in this pull request? ### Why are the changes needed? Command ```bash ./bin/spark-shell --conf spark.executor.extraJavaOptions='-Dspark.foo=bar' ``` Error ``` spark.executor.extraJavaOptions is not allowed to set Spark options (was '-Dspark.foo=bar'). Set them directly on a SparkConf or in a properties file when using ./bin/spark-submit. ``` Command ```bash ./bin/spark-shell --conf spark.executor.defaultJavaOptions='-Dspark.foo=bar' ``` Start up normally. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? local test & add UT ``` ./bin/spark-shell --conf spark.executor.defaultJavaOptions='-Dspark.foo=bar' ``` ``` spark.executor.defaultJavaOptions is not allowed to set Spark options (was '-Dspark.foo=bar'). Set them directly on a SparkConf or in a properties file when using ./bin/spark-submit. ``` Closes #42313 from cxzl25/SPARK-44650. 
Authored-by: sychen Signed-off-by: Sean Owen --- .../main/scala/org/apache/spark/SparkConf.scala| 25 +++--- .../scala/org/apache/spark/SparkConfSuite.scala| 14 2 files changed, 27 insertions(+), 12 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/SparkConf.scala b/core/src/main/scala/org/apache/spark/SparkConf.scala index 813a14acd19..8c054d24b10 100644 --- a/core/src/main/scala/org/apache/spark/SparkConf.scala +++ b/core/src/main/scala/org/apache/spark/SparkConf.scala @@ -503,8 +503,6 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria logWarning(msg) } -val executorOptsKey = EXECUTOR_JAVA_OPTIONS.key - // Used by Yarn in 1.1 and before sys.props.get("spark.driver.libraryPath").foreach { value => val warning = @@ -518,16 +516,19 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria } // Validate spark.executor.extraJavaOptions -getOption(executorOptsKey).foreach { javaOpts => - if (javaOpts.contains("-Dspark")) { -val msg = s"$executorOptsKey is not allowed to set Spark options (was '$javaOpts'). " + - "Set them directly on a SparkConf or in a properties file when using ./bin/spark-submit." -throw new Exception(msg) - } - if (javaOpts.contains("-Xmx")) { -val msg = s"$executorOptsKey is not allowed to specify max heap memory settings " + - s"(was '$javaOpts'). Use spark.executor.memory instead." -throw new Exception(msg) +Seq(EXECUTOR_JAVA_OPTIONS.key, "spark.executor.defaultJavaOptions").foreach { executorOptsKey => + getOption(executorOptsKey).foreach { javaOpts => +if (javaOpts.contains("-Dspark")) { + val msg = s"$executorOptsKey is not allowed to set Spark options (was '$javaOpts'). " + +"Set them directly on a SparkConf or in a properties file " + +"when using ./bin/spark-submit." + throw new Exception(msg) +} +if (javaOpts.contains("-Xmx")) { + val msg = s"$executorOptsKey is not allowed to specify max heap memory settings " + +s"(was '$javaOpts'). 
Use spark.executor.memory instead." + throw new Exception(msg) +} } } diff --git a/core/src/test/scala/org/apache/spark/SparkConfSuite.scala b/core/src/test/scala/org/apache/spark/SparkConfSuite.scala index 74fd7816221..75e22e1418b 100644 --- a/core/src/test/scala/org/apache/spark/SparkConfSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkConfSuite.scala @@ -498,6 +498,20 @@ class SparkConfSuite extends SparkFunSuite with LocalSparkContext with ResetSyst } } } + + test("SPARK-44650: spark.executor.defaultJavaOptions Check illegal java options") { +val conf = new SparkConf() +conf.validateSettings() +conf.set(EXECUTOR_JAVA_OPTIONS.key, "-Dspark.foo=bar") +intercept[Exception] { + conf.validateSettings() +} +conf.remove(EXECUTOR_JAVA_OPTIONS.key)
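The check that the diff generalizes runs over both executor option keys instead of only `spark.executor.extraJavaOptions`. A hedged restatement of the same validation logic in Python, with a plain dict standing in for the real SparkConf:

```python
def validate_executor_java_options(conf):
    # SPARK-44650: apply the same two checks to both option keys, not just
    # spark.executor.extraJavaOptions.
    for key in ("spark.executor.extraJavaOptions",
                "spark.executor.defaultJavaOptions"):
        java_opts = conf.get(key)
        if not java_opts:
            continue
        if "-Dspark" in java_opts:
            raise ValueError(
                f"{key} is not allowed to set Spark options (was '{java_opts}'). "
                "Set them directly on a SparkConf or in a properties file "
                "when using ./bin/spark-submit.")
        if "-Xmx" in java_opts:
            raise ValueError(
                f"{key} is not allowed to specify max heap memory settings "
                f"(was '{java_opts}'). Use spark.executor.memory instead.")

validate_executor_java_options({})  # no executor java options set: passes
```

With this shape, `-Dspark.foo=bar` is rejected under either key, which is exactly the behavior the new test case asserts.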
[spark] branch branch-3.5 updated: [MINOR][DOC] Fix a typo in ResolveReferencesInUpdate scaladoc
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 1ad71ffc33d [MINOR][DOC] Fix a typo in ResolveReferencesInUpdate scaladoc 1ad71ffc33d is described below commit 1ad71ffc33ddf0861f62e389a5e8ad438f9afb26 Author: Sergii Druzkin <65374769+sdruz...@users.noreply.github.com> AuthorDate: Thu Aug 3 18:52:44 2023 -0500 [MINOR][DOC] Fix a typo in ResolveReferencesInUpdate scaladoc ### What changes were proposed in this pull request? Fixed a typo in the ResolveReferencesInUpdate documentation. ### Why are the changes needed? ### Does this PR introduce any user-facing change? No ### How was this patch tested? CI Closes #42322 from sdruzkin/master. Authored-by: Sergii Druzkin <65374769+sdruz...@users.noreply.github.com> Signed-off-by: Sean Owen (cherry picked from commit 52a9002fa2383bd9b26c77e62e0c6bcd46f8944b) Signed-off-by: Sean Owen --- .../apache/spark/sql/catalyst/analysis/ResolveReferencesInUpdate.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInUpdate.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInUpdate.scala index cebc1e25f92..ead323ce985 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInUpdate.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInUpdate.scala @@ -25,7 +25,7 @@ import org.apache.spark.sql.errors.QueryCompilationErrors /** * A virtual rule to resolve [[UnresolvedAttribute]] in [[UpdateTable]]. It's only used by the real * rule `ResolveReferences`. The column resolution order for [[UpdateTable]] is: - * 1. Resolves the column to `AttributeReference`` with the output of the child plan. This + * 1. 
Resolves the column to `AttributeReference` with the output of the child plan. This *includes metadata columns as well. * 2. Resolves the column to a literal function which is allowed to be invoked without braces, e.g. *`SELECT col, current_date FROM t`. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [MINOR][DOC] Fix a typo in ResolveReferencesInUpdate scaladoc
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 52a9002fa23 [MINOR][DOC] Fix a typo in ResolveReferencesInUpdate scaladoc 52a9002fa23 is described below commit 52a9002fa2383bd9b26c77e62e0c6bcd46f8944b Author: Sergii Druzkin <65374769+sdruz...@users.noreply.github.com> AuthorDate: Thu Aug 3 18:52:44 2023 -0500 [MINOR][DOC] Fix a typo in ResolveReferencesInUpdate scaladoc ### What changes were proposed in this pull request? Fixed a typo in the ResolveReferencesInUpdate documentation. ### Why are the changes needed? ### Does this PR introduce any user-facing change? No ### How was this patch tested? CI Closes #42322 from sdruzkin/master. Authored-by: Sergii Druzkin <65374769+sdruz...@users.noreply.github.com> Signed-off-by: Sean Owen --- .../apache/spark/sql/catalyst/analysis/ResolveReferencesInUpdate.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInUpdate.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInUpdate.scala index cebc1e25f92..ead323ce985 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInUpdate.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInUpdate.scala @@ -25,7 +25,7 @@ import org.apache.spark.sql.errors.QueryCompilationErrors /** * A virtual rule to resolve [[UnresolvedAttribute]] in [[UpdateTable]]. It's only used by the real * rule `ResolveReferences`. The column resolution order for [[UpdateTable]] is: - * 1. Resolves the column to `AttributeReference`` with the output of the child plan. This + * 1. Resolves the column to `AttributeReference` with the output of the child plan. 
This *includes metadata columns as well. * 2. Resolves the column to a literal function which is allowed to be invoked without braces, e.g. *`SELECT col, current_date FROM t`. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.5 updated: [SPARK-44604][BUILD] Upgrade Netty to 4.1.96.Final
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 7b68ccd1cb4 [SPARK-44604][BUILD] Upgrade Netty to 4.1.96.Final 7b68ccd1cb4 is described below commit 7b68ccd1cb48c38052b0458c5192d5ffcfc97409 Author: panbingkun AuthorDate: Tue Aug 1 08:55:27 2023 -0500 [SPARK-44604][BUILD] Upgrade Netty to 4.1.96.Final ### What changes were proposed in this pull request? This PR aims to upgrade Netty from 4.1.93.Final to 4.1.96.Final. ### Why are the changes needed? 1. Netty 4.1.93.Final vs. 4.1.96.Final: https://github.com/netty/netty/compare/netty-4.1.93.Final...netty-4.1.96.Final 2. The newest Netty version fixes a possible security issue ([CVE-2023-34462](https://github.com/netty/netty/security/advisories/GHSA-6mjq-h674-j845)) when using SniHandler. 3. Netty full release notes: https://netty.io/news/2023/07/27/4-1-96-Final.html https://netty.io/news/2023/07/20/4-1-95-Final.html https://netty.io/news/2023/06/19/4-1-94-Final.html ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #42232 from panbingkun/SPARK-44604. 
Authored-by: panbingkun Signed-off-by: Sean Owen (cherry picked from commit 8053d5f16541edb8e17cbc50684abae69187ff5a) Signed-off-by: Sean Owen --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 36 +-- pom.xml | 6 +- 2 files changed, 19 insertions(+), 23 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index beae2232202..566f7c9a3ea 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -183,24 +183,24 @@ metrics-jmx/4.2.19//metrics-jmx-4.2.19.jar metrics-json/4.2.19//metrics-json-4.2.19.jar metrics-jvm/4.2.19//metrics-jvm-4.2.19.jar minlog/1.3.0//minlog-1.3.0.jar -netty-all/4.1.93.Final//netty-all-4.1.93.Final.jar -netty-buffer/4.1.93.Final//netty-buffer-4.1.93.Final.jar -netty-codec-http/4.1.93.Final//netty-codec-http-4.1.93.Final.jar -netty-codec-http2/4.1.93.Final//netty-codec-http2-4.1.93.Final.jar -netty-codec-socks/4.1.93.Final//netty-codec-socks-4.1.93.Final.jar -netty-codec/4.1.93.Final//netty-codec-4.1.93.Final.jar -netty-common/4.1.93.Final//netty-common-4.1.93.Final.jar -netty-handler-proxy/4.1.93.Final//netty-handler-proxy-4.1.93.Final.jar -netty-handler/4.1.93.Final//netty-handler-4.1.93.Final.jar -netty-resolver/4.1.93.Final//netty-resolver-4.1.93.Final.jar -netty-transport-classes-epoll/4.1.93.Final//netty-transport-classes-epoll-4.1.93.Final.jar -netty-transport-classes-kqueue/4.1.93.Final//netty-transport-classes-kqueue-4.1.93.Final.jar -netty-transport-native-epoll/4.1.93.Final/linux-aarch_64/netty-transport-native-epoll-4.1.93.Final-linux-aarch_64.jar -netty-transport-native-epoll/4.1.93.Final/linux-x86_64/netty-transport-native-epoll-4.1.93.Final-linux-x86_64.jar -netty-transport-native-kqueue/4.1.93.Final/osx-aarch_64/netty-transport-native-kqueue-4.1.93.Final-osx-aarch_64.jar -netty-transport-native-kqueue/4.1.93.Final/osx-x86_64/netty-transport-native-kqueue-4.1.93.Final-osx-x86_64.jar 
-netty-transport-native-unix-common/4.1.93.Final//netty-transport-native-unix-common-4.1.93.Final.jar -netty-transport/4.1.93.Final//netty-transport-4.1.93.Final.jar +netty-all/4.1.96.Final//netty-all-4.1.96.Final.jar +netty-buffer/4.1.96.Final//netty-buffer-4.1.96.Final.jar +netty-codec-http/4.1.96.Final//netty-codec-http-4.1.96.Final.jar +netty-codec-http2/4.1.96.Final//netty-codec-http2-4.1.96.Final.jar +netty-codec-socks/4.1.96.Final//netty-codec-socks-4.1.96.Final.jar +netty-codec/4.1.96.Final//netty-codec-4.1.96.Final.jar +netty-common/4.1.96.Final//netty-common-4.1.96.Final.jar +netty-handler-proxy/4.1.96.Final//netty-handler-proxy-4.1.96.Final.jar +netty-handler/4.1.96.Final//netty-handler-4.1.96.Final.jar +netty-resolver/4.1.96.Final//netty-resolver-4.1.96.Final.jar +netty-transport-classes-epoll/4.1.96.Final//netty-transport-classes-epoll-4.1.96.Final.jar +netty-transport-classes-kqueue/4.1.96.Final//netty-transport-classes-kqueue-4.1.96.Final.jar +netty-transport-native-epoll/4.1.96.Final/linux-aarch_64/netty-transport-native-epoll-4.1.96.Final-linux-aarch_64.jar +netty-transport-native-epoll/4.1.96.Final/linux-x86_64/netty-transport-native-epoll-4.1.96.Final-linux-x86_64.jar +netty-transport-native-kqueue/4.1.96.Final/osx-aarch_64/netty-transport-native-kqueue-4.1.96.Final-osx-aarch_64.jar +netty-transport-native-kqueue/4.1.96.Final/osx-x86_64/netty-transport-native-kqueue-4.1.96.Final-osx-x86_64.jar +netty-transport-native-unix-common/4.1.96.Final//netty-transport-native-unix-common-4.1.96.Final.jar +netty-transport/4.1.96.Final//netty-transport-4.1.96
[spark] branch master updated: [SPARK-44604][BUILD] Upgrade Netty to 4.1.96.Final
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8053d5f1654 [SPARK-44604][BUILD] Upgrade Netty to 4.1.96.Final 8053d5f1654 is described below commit 8053d5f16541edb8e17cbc50684abae69187ff5a Author: panbingkun AuthorDate: Tue Aug 1 08:55:27 2023 -0500 [SPARK-44604][BUILD] Upgrade Netty to 4.1.96.Final ### What changes were proposed in this pull request? This PR aims to upgrade Netty from 4.1.93.Final to 4.1.96.Final. ### Why are the changes needed? 1. Netty 4.1.93.Final vs. 4.1.96.Final: https://github.com/netty/netty/compare/netty-4.1.93.Final...netty-4.1.96.Final 2. The newest Netty version fixes a possible security issue ([CVE-2023-34462](https://github.com/netty/netty/security/advisories/GHSA-6mjq-h674-j845)) when using SniHandler. 3. Netty full release notes: https://netty.io/news/2023/07/27/4-1-96-Final.html https://netty.io/news/2023/07/20/4-1-95-Final.html https://netty.io/news/2023/06/19/4-1-94-Final.html ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #42232 from panbingkun/SPARK-44604. 
Authored-by: panbingkun Signed-off-by: Sean Owen --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 36 +-- pom.xml | 6 +- 2 files changed, 19 insertions(+), 23 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 3b54ef43f6a..52a1d00f204 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -183,24 +183,24 @@ metrics-jmx/4.2.19//metrics-jmx-4.2.19.jar metrics-json/4.2.19//metrics-json-4.2.19.jar metrics-jvm/4.2.19//metrics-jvm-4.2.19.jar minlog/1.3.0//minlog-1.3.0.jar -netty-all/4.1.93.Final//netty-all-4.1.93.Final.jar -netty-buffer/4.1.93.Final//netty-buffer-4.1.93.Final.jar -netty-codec-http/4.1.93.Final//netty-codec-http-4.1.93.Final.jar -netty-codec-http2/4.1.93.Final//netty-codec-http2-4.1.93.Final.jar -netty-codec-socks/4.1.93.Final//netty-codec-socks-4.1.93.Final.jar -netty-codec/4.1.93.Final//netty-codec-4.1.93.Final.jar -netty-common/4.1.93.Final//netty-common-4.1.93.Final.jar -netty-handler-proxy/4.1.93.Final//netty-handler-proxy-4.1.93.Final.jar -netty-handler/4.1.93.Final//netty-handler-4.1.93.Final.jar -netty-resolver/4.1.93.Final//netty-resolver-4.1.93.Final.jar -netty-transport-classes-epoll/4.1.93.Final//netty-transport-classes-epoll-4.1.93.Final.jar -netty-transport-classes-kqueue/4.1.93.Final//netty-transport-classes-kqueue-4.1.93.Final.jar -netty-transport-native-epoll/4.1.93.Final/linux-aarch_64/netty-transport-native-epoll-4.1.93.Final-linux-aarch_64.jar -netty-transport-native-epoll/4.1.93.Final/linux-x86_64/netty-transport-native-epoll-4.1.93.Final-linux-x86_64.jar -netty-transport-native-kqueue/4.1.93.Final/osx-aarch_64/netty-transport-native-kqueue-4.1.93.Final-osx-aarch_64.jar -netty-transport-native-kqueue/4.1.93.Final/osx-x86_64/netty-transport-native-kqueue-4.1.93.Final-osx-x86_64.jar -netty-transport-native-unix-common/4.1.93.Final//netty-transport-native-unix-common-4.1.93.Final.jar 
-netty-transport/4.1.93.Final//netty-transport-4.1.93.Final.jar +netty-all/4.1.96.Final//netty-all-4.1.96.Final.jar +netty-buffer/4.1.96.Final//netty-buffer-4.1.96.Final.jar +netty-codec-http/4.1.96.Final//netty-codec-http-4.1.96.Final.jar +netty-codec-http2/4.1.96.Final//netty-codec-http2-4.1.96.Final.jar +netty-codec-socks/4.1.96.Final//netty-codec-socks-4.1.96.Final.jar +netty-codec/4.1.96.Final//netty-codec-4.1.96.Final.jar +netty-common/4.1.96.Final//netty-common-4.1.96.Final.jar +netty-handler-proxy/4.1.96.Final//netty-handler-proxy-4.1.96.Final.jar +netty-handler/4.1.96.Final//netty-handler-4.1.96.Final.jar +netty-resolver/4.1.96.Final//netty-resolver-4.1.96.Final.jar +netty-transport-classes-epoll/4.1.96.Final//netty-transport-classes-epoll-4.1.96.Final.jar +netty-transport-classes-kqueue/4.1.96.Final//netty-transport-classes-kqueue-4.1.96.Final.jar +netty-transport-native-epoll/4.1.96.Final/linux-aarch_64/netty-transport-native-epoll-4.1.96.Final-linux-aarch_64.jar +netty-transport-native-epoll/4.1.96.Final/linux-x86_64/netty-transport-native-epoll-4.1.96.Final-linux-x86_64.jar +netty-transport-native-kqueue/4.1.96.Final/osx-aarch_64/netty-transport-native-kqueue-4.1.96.Final-osx-aarch_64.jar +netty-transport-native-kqueue/4.1.96.Final/osx-x86_64/netty-transport-native-kqueue-4.1.96.Final-osx-x86_64.jar +netty-transport-native-unix-common/4.1.96.Final//netty-transport-native-unix-common-4.1.96.Final.jar +netty-transport/4.1.96.Final//netty-transport-4.1.96.Final.jar objenesis/3.3//objenesis-3.3.jar okhttp/3.12.12//okhttp-3.12.12.jar okio/1.15.0//okio-1.15.0.jar
[spark] branch branch-3.5 updated: [SPARK-44542][CORE] Eagerly load SparkExitCode class in exception handler
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 47224b39f6c [SPARK-44542][CORE] Eagerly load SparkExitCode class in exception handler 47224b39f6c is described below commit 47224b39f6c937cadf5946870a4dc8d0dabdfa40 Author: Xianjin AuthorDate: Sun Jul 30 22:12:39 2023 -0500 [SPARK-44542][CORE] Eagerly load SparkExitCode class in exception handler ### What changes were proposed in this pull request? 1. Eagerly load the SparkExitCode class in the SparkUncaughtExceptionHandler ### Why are the changes needed? In some extreme cases, it's possible for SparkUncaughtExceptionHandler's exit/halt process function calls to throw an exception if the SparkExitCode class is not loaded earlier; see the corresponding JIRA: [SPARK-44542](https://issues.apache.org/jira/browse/SPARK-44542) for more details. By eagerly loading the SparkExitCode class, we can make sure that at least halt/exit will work properly. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No logic change, hence no new UTs. Closes #42195 from advancedxy/SPARK-44542. 
Authored-by: Xianjin Signed-off-by: Sean Owen (cherry picked from commit 32498b390db99c9451b14c643456437a023c0d93) Signed-off-by: Sean Owen --- .../scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala | 6 ++ 1 file changed, 6 insertions(+) diff --git a/core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala b/core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala index e7712875536..b24129eb369 100644 --- a/core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala +++ b/core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala @@ -28,6 +28,12 @@ import org.apache.spark.internal.Logging private[spark] class SparkUncaughtExceptionHandler(val exitOnUncaughtException: Boolean = true) extends Thread.UncaughtExceptionHandler with Logging { + locally { +// eagerly load SparkExitCode class, so the System.exit and runtime.halt have a chance to be +// executed when the disk containing Spark jars is corrupted. See SPARK-44542 for more details. +val _ = SparkExitCode.OOM + } + override def uncaughtException(thread: Thread, exception: Throwable): Unit = { try { // Make it explicit that uncaught exceptions are thrown when container is shutting down. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-44542][CORE] Eagerly load SparkExitCode class in exception handler
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 32498b390db [SPARK-44542][CORE] Eagerly load SparkExitCode class in exception handler 32498b390db is described below commit 32498b390db99c9451b14c643456437a023c0d93 Author: Xianjin AuthorDate: Sun Jul 30 22:12:39 2023 -0500 [SPARK-44542][CORE] Eagerly load SparkExitCode class in exception handler ### What changes were proposed in this pull request? 1. Eagerly load the SparkExitCode class in the SparkUncaughtExceptionHandler ### Why are the changes needed? In some extreme cases, it's possible for SparkUncaughtExceptionHandler's exit/halt process function calls to throw an exception if the SparkExitCode class is not loaded earlier; see the corresponding JIRA: [SPARK-44542](https://issues.apache.org/jira/browse/SPARK-44542) for more details. By eagerly loading the SparkExitCode class, we can make sure that at least halt/exit will work properly. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No logic change, hence no new UTs. Closes #42195 from advancedxy/SPARK-44542. 
Authored-by: Xianjin Signed-off-by: Sean Owen --- .../scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala | 6 ++ 1 file changed, 6 insertions(+) diff --git a/core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala b/core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala index e7712875536..b24129eb369 100644 --- a/core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala +++ b/core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala @@ -28,6 +28,12 @@ import org.apache.spark.internal.Logging private[spark] class SparkUncaughtExceptionHandler(val exitOnUncaughtException: Boolean = true) extends Thread.UncaughtExceptionHandler with Logging { + locally { +// eagerly load SparkExitCode class, so the System.exit and runtime.halt have a chance to be +// executed when the disk containing Spark jars is corrupted. See SPARK-44542 for more details. +val _ = SparkExitCode.OOM + } + override def uncaughtException(thread: Thread, exception: Throwable): Unit = { try { // Make it explicit that uncaught exceptions are thrown when container is shutting down. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
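The pattern in this fix is to resolve everything the crash handler will need at construction time rather than at crash time. A hedged Python analogue of the same idea (the function names and the exit code 50 are illustrative assumptions, not taken from this commit):

```python
import sys

def install_uncaught_handler(exit_fn, exit_code=50):
    # Capture the exit function and code NOW, at install time -- the same
    # reason the patch eagerly references SparkExitCode.OOM in the handler's
    # constructor. If loading new code later becomes impossible (e.g. the
    # disk holding the jars is corrupted), the exit path is already resolved.
    def hook(exc_type, exc, tb):
        exit_fn(exit_code)
    sys.excepthook = hook
    return hook

recorded = []
hook = install_uncaught_handler(recorded.append)
hook(RuntimeError, RuntimeError("boom"), None)  # recorded becomes [50]
```

The key point is that nothing inside `hook` triggers a fresh lookup or class load; everything it touches is bound before the first exception can occur.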
[spark] branch branch-3.4 updated: [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new f19a953b647 [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk f19a953b647 is described below commit f19a953b6471673f89d689bea20e0d53026f7b5b Author: Guilhem Vuillier <101632595+guilhem-de...@users.noreply.github.com> AuthorDate: Fri Jul 28 17:29:47 2023 -0500 [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk ### What changes were proposed in this pull request? This PR fixes the condition to raise the following warning in MLLib's RankingMetrics ndcgAk function: "# of ground truth set and # of relevance value set should be equal, check input data" The logic for raising warnings is faulty at the moment: it raises a warning if the `rel` input is empty and `lab.size` and `rel.size` are not equal. The logic should be to raise a warning if the `rel` input is **not empty** and `lab.size` and `rel.size` are not equal. This warning was added in the following PR: https://github.com/apache/spark/pull/36843 ### Why are the changes needed? With the current logic, RankingMetrics will: - raise an incorrect warning when a user is using it in the "binary" mode (i.e. no relevance values in the input) - not raise a warning (which could be necessary) when the user is using it in the "non-binary" mode (i.e. with relevance values in the input) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? No change made to the test suite for RankingMetrics: https://github.com/uch/spark/blob/a172172329cc78b50f716924f2a344517deb71fc/mllib/src/test/scala/org/apache/spark/mllib/evaluation/RankingMetricsSuite.scala Closes #42207 from guilhem-depop/patch-1. 
Authored-by: Guilhem Vuillier <101632595+guilhem-de...@users.noreply.github.com> Signed-off-by: Sean Owen (cherry picked from commit 72af2c0fbc6673a5e49f1fd6693fe2c90141a84f) Signed-off-by: Sean Owen --- .../scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala index 37e57736574..a3316d8a8fa 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala @@ -140,6 +140,9 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") (predictionAndLabels: RDD[_ <: * and the NDCG is obtained by dividing the DCG value on the ground truth set. In the current * implementation, the relevance value is binary if the relevance value is empty. + * If the relevance value is not empty but its size doesn't match the ground truth set size, + * a log warning is generated. + * * If a query has an empty ground truth set, zero will be used as ndcg together with * a log warning. * @@ -157,7 +160,7 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") (predictionAndLabels: RDD[_ <: val useBinary = rel.isEmpty val labSet = lab.toSet val relMap = Utils.toMap(lab, rel) - if (useBinary && lab.size != rel.size) { + if (!useBinary && lab.size != rel.size) { logWarning( "# of ground truth set and # of relevance value set should be equal, " + "check input data") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.5 updated: [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new d0fa5a75d17 [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk d0fa5a75d17 is described below commit d0fa5a75d17335e60aefbb554adb9b3fce1f97ff Author: Guilhem Vuillier <101632595+guilhem-de...@users.noreply.github.com> AuthorDate: Fri Jul 28 17:29:47 2023 -0500 [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk ### What changes were proposed in this pull request? This PR fixes the condition to raise the following warning in MLLib's RankingMetrics ndcgAk function: "# of ground truth set and # of relevance value set should be equal, check input data" The logic for raising warnings is faulty at the moment: it raises a warning if the `rel` input is empty and `lab.size` and `rel.size` are not equal. The logic should be to raise a warning if the `rel` input is **not empty** and `lab.size` and `rel.size` are not equal. This warning was added in the following PR: https://github.com/apache/spark/pull/36843 ### Why are the changes needed? With the current logic, RankingMetrics will: - raise an incorrect warning when a user is using it in the "binary" mode (i.e. no relevance values in the input) - not raise a warning (that could be necessary) when the user is using it in the "non-binary" mode (i.e. with relevance values in the input) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? No change was made to the test suite for RankingMetrics: https://github.com/uch/spark/blob/a172172329cc78b50f716924f2a344517deb71fc/mllib/src/test/scala/org/apache/spark/mllib/evaluation/RankingMetricsSuite.scala Closes #42207 from guilhem-depop/patch-1. 
Authored-by: Guilhem Vuillier <101632595+guilhem-de...@users.noreply.github.com> Signed-off-by: Sean Owen (cherry picked from commit 72af2c0fbc6673a5e49f1fd6693fe2c90141a84f) Signed-off-by: Sean Owen --- .../scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala index 37e57736574..a3316d8a8fa 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala @@ -140,6 +140,9 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") (predictionAndLabels: RDD[_ <: * and the NDCG is obtained by dividing the DCG value on the ground truth set. In the current * implementation, the relevance value is binary if the relevance value is empty. + * If the relevance value is not empty but its size doesn't match the ground truth set size, + * a log warning is generated. + * * If a query has an empty ground truth set, zero will be used as ndcg together with * a log warning. * @@ -157,7 +160,7 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") (predictionAndLabels: RDD[_ <: val useBinary = rel.isEmpty val labSet = lab.toSet val relMap = Utils.toMap(lab, rel) - if (useBinary && lab.size != rel.size) { + if (!useBinary && lab.size != rel.size) { logWarning( "# of ground truth set and # of relevance value set should be equal, " + "check input data") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 72af2c0fbc6 [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk 72af2c0fbc6 is described below commit 72af2c0fbc6673a5e49f1fd6693fe2c90141a84f Author: Guilhem Vuillier <101632595+guilhem-de...@users.noreply.github.com> AuthorDate: Fri Jul 28 17:29:47 2023 -0500 [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk ### What changes were proposed in this pull request? This PR fixes the condition to raise the following warning in MLLib's RankingMetrics ndcgAk function: "# of ground truth set and # of relevance value set should be equal, check input data" The logic for raising warnings is faulty at the moment: it raises a warning if the `rel` input is empty and `lab.size` and `rel.size` are not equal. The logic should be to raise a warning if the `rel` input is **not empty** and `lab.size` and `rel.size` are not equal. This warning was added in the following PR: https://github.com/apache/spark/pull/36843 ### Why are the changes needed? With the current logic, RankingMetrics will: - raise an incorrect warning when a user is using it in the "binary" mode (i.e. no relevance values in the input) - not raise a warning (that could be necessary) when the user is using it in the "non-binary" mode (i.e. with relevance values in the input) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? No change was made to the test suite for RankingMetrics: https://github.com/uch/spark/blob/a172172329cc78b50f716924f2a344517deb71fc/mllib/src/test/scala/org/apache/spark/mllib/evaluation/RankingMetricsSuite.scala Closes #42207 from guilhem-depop/patch-1. 
Authored-by: Guilhem Vuillier <101632595+guilhem-de...@users.noreply.github.com> Signed-off-by: Sean Owen --- .../scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala index 37e57736574..a3316d8a8fa 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala @@ -140,6 +140,9 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") (predictionAndLabels: RDD[_ <: * and the NDCG is obtained by dividing the DCG value on the ground truth set. In the current * implementation, the relevance value is binary if the relevance value is empty. + * If the relevance value is not empty but its size doesn't match the ground truth set size, + * a log warning is generated. + * * If a query has an empty ground truth set, zero will be used as ndcg together with * a log warning. * @@ -157,7 +160,7 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") (predictionAndLabels: RDD[_ <: val useBinary = rel.isEmpty val labSet = lab.toSet val relMap = Utils.toMap(lab, rel) - if (useBinary && lab.size != rel.size) { + if (!useBinary && lab.size != rel.size) { logWarning( "# of ground truth set and # of relevance value set should be equal, " + "check input data") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
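The one-character fix above (`useBinary` → `!useBinary`) is easy to verify in isolation. A minimal sketch of the corrected predicate, outside Spark and with hypothetical `lab`/`rel` inputs:

```scala
// Sketch of the corrected warning predicate from ndcgAt (standalone, no Spark).
// A warning is warranted only in "non-binary" mode: relevance values were
// supplied (rel is non-empty) but their count does not match the ground truth.
def shouldWarn(lab: Seq[String], rel: Seq[Double]): Boolean = {
  val useBinary = rel.isEmpty          // binary mode: no relevance values given
  // Fixed condition: the old `useBinary && lab.size != rel.size` fired for
  // every non-empty ground truth set in binary mode (rel.size is always 0 there).
  !useBinary && lab.size != rel.size
}

// Binary mode: no relevance values supplied -> never warn.
val binaryMode = shouldWarn(Seq("a", "b"), Seq.empty)        // false
// Non-binary mode with mismatched sizes -> warn.
val mismatch   = shouldWarn(Seq("a", "b"), Seq(1.0))         // true
// Non-binary mode, sizes match -> no warning.
val matched    = shouldWarn(Seq("a", "b"), Seq(1.0, 0.5))    // false
```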
[spark] branch master updated: [MINOR][DOCS] fix: some minor typos
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 921fb289f00 [MINOR][DOCS] fix: some minor typos 921fb289f00 is described below commit 921fb289f003317d89120faa6937e4abd359195c Author: Eric Blanco AuthorDate: Thu Jul 27 08:53:54 2023 -0500 [MINOR][DOCS] fix: some minor typos ### What changes were proposed in this pull request? Change `the the` to `the` ### Why are the changes needed? To fix the typo ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Closes #42188 from ejblanco/docs/spark-typos. Authored-by: Eric Blanco Signed-off-by: Sean Owen --- .../spark/sql/connect/service/SparkConnectStreamingQueryCache.scala | 2 +- .../org/apache/spark/ui/static/vis-timeline-graph2d.min.js.map | 2 +- dev/connect-jvm-client-mima-check | 2 +- .../main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala| 2 +- .../scala/org/apache/spark/sql/catalyst/expressions/WindowTime.scala| 2 +- sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 6 files changed, 6 insertions(+), 6 deletions(-) diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamingQueryCache.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamingQueryCache.scala index 133686df018..87004242da9 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamingQueryCache.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamingQueryCache.scala @@ -84,7 +84,7 @@ private[connect] class SparkConnectStreamingQueryCache( /** * Returns [[StreamingQuery]] if it is cached and session matches the cached query. 
It ensures - * the the session associated with it matches the session passed into the call. If the query is + * the session associated with it matches the session passed into the call. If the query is * inactive (i.e. it has a cache expiry time set), this access extends its expiry time. So if a * client keeps accessing a query, it stays in the cache. */ diff --git a/core/src/main/resources/org/apache/spark/ui/static/vis-timeline-graph2d.min.js.map b/core/src/main/resources/org/apache/spark/ui/static/vis-timeline-graph2d.min.js.map index 95fdc523cf4..250b375e545 100644 --- a/core/src/main/resources/org/apache/spark/ui/static/vis-timeline-graph2d.min.js.map +++ b/core/src/main/resources/org/apache/spark/ui/static/vis-timeline-graph2d.min.js.map @@ -1 +1 @@ -{"version":3,"file":"vis-timeline-graph2d.min.js","sources":["../../node_modules/moment/locale/de.js","../../node_modules/moment/moment.js","../../node_modules/moment/locale/es.js","../../node_modules/moment/locale/fr.js","../../node_modules/moment/locale/it.js","../../node_modules/moment/locale/ja.js","../../node_modules/moment/locale/nl.js","../../node_modules/moment/locale/pl.js","../../node_modules/moment/locale/ru.js","../../node_modules/moment/locale/uk.js","../../node_modules/core [...] \ No newline at end of file +{"version":3,"file":"vis-timeline-graph2d.min.js","sources":["../../node_modules/moment/locale/de.js","../../node_modules/moment/moment.js","../../node_modules/moment/locale/es.js","../../node_modules/moment/locale/fr.js","../../node_modules/moment/locale/it.js","../../node_modules/moment/locale/ja.js","../../node_modules/moment/locale/nl.js","../../node_modules/moment/locale/pl.js","../../node_modules/moment/locale/ru.js","../../node_modules/moment/locale/uk.js","../../node_modules/core [...] 
\ No newline at end of file diff --git a/dev/connect-jvm-client-mima-check b/dev/connect-jvm-client-mima-check index ac4b95935b9..6a29cbf08ce 100755 --- a/dev/connect-jvm-client-mima-check +++ b/dev/connect-jvm-client-mima-check @@ -52,7 +52,7 @@ echo "finish connect-client-jvm module mima check ..." RESULT_SIZE=$(wc -l .connect-mima-check-result | awk '{print $1}') -# The the file has no content if check passed. +# The file has no content if check passed. if [[ $RESULT_SIZE -eq "0" ]]; then ERRORS="" else diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 3ece74a4d18..92e550ea941 100644 ---
[spark] branch branch-3.5 updated: [SPARK-44457][CONNECT][TESTS] Add `truncatedTo(ChronoUnit.MICROS)` to make `ArrowEncoderSuite` in Java 17 daily test GA task pass
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 17fc3632f23 [SPARK-44457][CONNECT][TESTS] Add `truncatedTo(ChronoUnit.MICROS)` to make `ArrowEncoderSuite` in Java 17 daily test GA task pass 17fc3632f23 is described below commit 17fc3632f2344101f8318457e3f9d5f133913997 Author: yangjie01 AuthorDate: Wed Jul 26 19:17:40 2023 -0500 [SPARK-44457][CONNECT][TESTS] Add `truncatedTo(ChronoUnit.MICROS)` to make `ArrowEncoderSuite` in Java 17 daily test GA task pass ### What changes were proposed in this pull request? Similar to SPARK-42770 | https://github.com/apache/spark/pull/40395, this PR calls `truncatedTo(ChronoUnit.MICROS)` on `Instant.now()` and `LocalDateTime.now()` to ensure microsecond accuracy is used in any environment. ### Why are the changes needed? Make the Java 17 daily test GA task run successfully. The Java 17 daily test GA task currently fails: https://github.com/apache/spark/actions/runs/5570003581/jobs/10173767006 ``` [info] - nullable fields *** FAILED *** (169 milliseconds) [info] NullableData(null, JANUARY, E1, null, 1.00, 2.00, null, 4, PT0S, null, 2023-07-16, 2023-07-16, null, 2023-07-16T23:01:54.059339Z, 2023-07-16T23:01:54.059359) did not equal NullableData(null, JANUARY, E1, null, 1.00, 2.00, null, 4, PT0S, null, 2023-07-16, 2023-07-16, null, 2023-07-16T23:01:54.059339538Z, 2023-07-16T23:01:54.059359638) (ArrowEncoderSuite.scala:194) [info] Analysis: [info] NullableData(instant: 2023-07-16T23:01:54.059339Z -> 2023-07-16T23:01:54.059339538Z, localDateTime: 2023-07-16T23:01:54.059359 -> 2023-07-16T23:01:54.059359638) [info] org.scalatest.exceptions.TestFailedException: ... 
[info] - lenient field serialization - timestamp/instant *** FAILED *** (26 milliseconds) [info] 2023-07-16T23:01:55.112838Z did not equal 2023-07-16T23:01:55.112838568Z (ArrowEncoderSuite.scala:194) [info] org.scalatest.exceptions.TestFailedException: ... ``` ### Does this PR introduce _any_ user-facing change? No, this change is test-only. ### How was this patch tested? - Passes GitHub Actions - GitHub Actions test with Java 17 passed: https://github.com/LuciferYang/spark/actions/runs/5647253889/job/15297009685 https://github.com/apache/spark/assets/1475305/27a4350a-9475-45e3-b39f-b0b1e8f14e92 Closes #42039 from LuciferYang/ArrowEncoderSuite-Java17. Authored-by: yangjie01 Signed-off-by: Sean Owen (cherry picked from commit da359259b138864a52ea98a4e19c55e593a5a8fa) Signed-off-by: Sean Owen --- .../spark/sql/connect/client/arrow/ArrowEncoderSuite.scala| 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/arrow/ArrowEncoderSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/arrow/ArrowEncoderSuite.scala index 3f8ac1cb8d1..5c035a613fe 100644 --- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/arrow/ArrowEncoderSuite.scala +++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/arrow/ArrowEncoderSuite.scala @@ -18,6 +18,7 @@ package org.apache.spark.sql.connect.client.arrow import java.math.BigInteger import java.time.{Duration, Period, ZoneOffset} +import java.time.temporal.ChronoUnit import java.util import java.util.{Collections, Objects} @@ -361,8 +362,10 @@ class ArrowEncoderSuite extends ConnectFunSuite with BeforeAndAfterAll { test("nullable fields") { val encoder = ScalaReflection.encoderFor[NullableData] -val instant = java.time.Instant.now() -val now = java.time.LocalDateTime.now() +// SPARK-44457: Similar to SPARK-42770, calling `truncatedTo(ChronoUnit.MICROS)` +// on 
`Instant.now()` and `LocalDateTime.now()` to ensure microsecond accuracy is used. +val instant = java.time.Instant.now().truncatedTo(ChronoUnit.MICROS) +val now = java.time.LocalDateTime.now().truncatedTo(ChronoUnit.MICROS) val today = java.time.LocalDate.now() roundTripAndCheckIdentical(encoder) { () => val maybeNull = MaybeNull(3) @@ -602,7 +605,9 @@ class ArrowEncoderSuite extends ConnectFunSuite with BeforeAndAfterAll { } test("lenient field serialization - timestamp/instant") { -val base = java.time.Instant.now() +// SPARK-44457: Similar to SPARK-42770, calling `truncatedTo(ChronoUnit.MICROS)` +// on `Instant.now()` to ensure microsecond accuracy is used. +val base = java.time.Instant.now().truncatedTo(ChronoUnit.MICROS) val instants = () => Itera
[spark] branch master updated: [SPARK-44457][CONNECT][TESTS] Add `truncatedTo(ChronoUnit.MICROS)` to make `ArrowEncoderSuite` in Java 17 daily test GA task pass
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new da359259b13 [SPARK-44457][CONNECT][TESTS] Add `truncatedTo(ChronoUnit.MICROS)` to make `ArrowEncoderSuite` in Java 17 daily test GA task pass da359259b13 is described below commit da359259b138864a52ea98a4e19c55e593a5a8fa Author: yangjie01 AuthorDate: Wed Jul 26 19:17:40 2023 -0500 [SPARK-44457][CONNECT][TESTS] Add `truncatedTo(ChronoUnit.MICROS)` to make `ArrowEncoderSuite` in Java 17 daily test GA task pass ### What changes were proposed in this pull request? Similar to SPARK-42770 | https://github.com/apache/spark/pull/40395, this PR calls `truncatedTo(ChronoUnit.MICROS)` on `Instant.now()` and `LocalDateTime.now()` to ensure microsecond accuracy is used in any environment. ### Why are the changes needed? Make the Java 17 daily test GA task run successfully. The Java 17 daily test GA task currently fails: https://github.com/apache/spark/actions/runs/5570003581/jobs/10173767006 ``` [info] - nullable fields *** FAILED *** (169 milliseconds) [info] NullableData(null, JANUARY, E1, null, 1.00, 2.00, null, 4, PT0S, null, 2023-07-16, 2023-07-16, null, 2023-07-16T23:01:54.059339Z, 2023-07-16T23:01:54.059359) did not equal NullableData(null, JANUARY, E1, null, 1.00, 2.00, null, 4, PT0S, null, 2023-07-16, 2023-07-16, null, 2023-07-16T23:01:54.059339538Z, 2023-07-16T23:01:54.059359638) (ArrowEncoderSuite.scala:194) [info] Analysis: [info] NullableData(instant: 2023-07-16T23:01:54.059339Z -> 2023-07-16T23:01:54.059339538Z, localDateTime: 2023-07-16T23:01:54.059359 -> 2023-07-16T23:01:54.059359638) [info] org.scalatest.exceptions.TestFailedException: ... 
[info] - lenient field serialization - timestamp/instant *** FAILED *** (26 milliseconds) [info] 2023-07-16T23:01:55.112838Z did not equal 2023-07-16T23:01:55.112838568Z (ArrowEncoderSuite.scala:194) [info] org.scalatest.exceptions.TestFailedException: ... ``` ### Does this PR introduce _any_ user-facing change? No, this change is test-only. ### How was this patch tested? - Passes GitHub Actions - GitHub Actions test with Java 17 passed: https://github.com/LuciferYang/spark/actions/runs/5647253889/job/15297009685 https://github.com/apache/spark/assets/1475305/27a4350a-9475-45e3-b39f-b0b1e8f14e92 Closes #42039 from LuciferYang/ArrowEncoderSuite-Java17. Authored-by: yangjie01 Signed-off-by: Sean Owen --- .../spark/sql/connect/client/arrow/ArrowEncoderSuite.scala| 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/arrow/ArrowEncoderSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/arrow/ArrowEncoderSuite.scala index 3f8ac1cb8d1..5c035a613fe 100644 --- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/arrow/ArrowEncoderSuite.scala +++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/arrow/ArrowEncoderSuite.scala @@ -18,6 +18,7 @@ package org.apache.spark.sql.connect.client.arrow import java.math.BigInteger import java.time.{Duration, Period, ZoneOffset} +import java.time.temporal.ChronoUnit import java.util import java.util.{Collections, Objects} @@ -361,8 +362,10 @@ class ArrowEncoderSuite extends ConnectFunSuite with BeforeAndAfterAll { test("nullable fields") { val encoder = ScalaReflection.encoderFor[NullableData] -val instant = java.time.Instant.now() -val now = java.time.LocalDateTime.now() +// SPARK-44457: Similar to SPARK-42770, calling `truncatedTo(ChronoUnit.MICROS)` +// on `Instant.now()` and `LocalDateTime.now()` to ensure microsecond accuracy is used. 
+val instant = java.time.Instant.now().truncatedTo(ChronoUnit.MICROS) +val now = java.time.LocalDateTime.now().truncatedTo(ChronoUnit.MICROS) val today = java.time.LocalDate.now() roundTripAndCheckIdentical(encoder) { () => val maybeNull = MaybeNull(3) @@ -602,7 +605,9 @@ class ArrowEncoderSuite extends ConnectFunSuite with BeforeAndAfterAll { } test("lenient field serialization - timestamp/instant") { -val base = java.time.Instant.now() +// SPARK-44457: Similar to SPARK-42770, calling `truncatedTo(ChronoUnit.MICROS)` +// on `Instant.now()` to ensure microsecond accuracy is used. +val base = java.time.Instant.now().truncatedTo(ChronoUnit.MICROS) val instants = () => Iterator.tabulate(10)(i => base.plusSeconds(i * i * 60)) val timestamps = () => instants().map(java.sql.T
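The failures above come from Java 9+/17 clocks returning sub-microsecond precision while Arrow timestamps carry only microseconds, so the round trip drops the extra nanos and equality fails. The effect of the truncation can be sketched on a plain JVM, without Spark Connect:

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

// Modern JVM clocks can return nanosecond-resolution Instants; Arrow timestamps
// carry only microseconds, so anything below that is lost on a round trip.
val raw = Instant.now()
val micros = raw.truncatedTo(ChronoUnit.MICROS)

// After truncation the nano-of-second field is an exact multiple of 1000:
// there is no sub-microsecond component left to be dropped by the encoder.
val exact = micros.getNano % 1000 == 0

// Simulating the microsecond-precision round trip: truncating again is a no-op,
// so the value compares equal after encoding/decoding.
val stable = micros.truncatedTo(ChronoUnit.MICROS) == micros
```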
[spark] branch master updated: [SPARK-44522][BUILD] Upgrade `scala-xml` to 2.2.0
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 43b753a3530 [SPARK-44522][BUILD] Upgrade `scala-xml` to 2.2.0 43b753a3530 is described below commit 43b753a3530bcfdad415765e1348136d70d8125d Author: yangjie01 AuthorDate: Wed Jul 26 19:11:00 2023 -0500 [SPARK-44522][BUILD] Upgrade `scala-xml` to 2.2.0 ### What changes were proposed in this pull request? This PR aims to upgrade `scala-xml` from 2.1.0 to 2.2.0. ### Why are the changes needed? The new version brings some bug fixes, such as: - https://github.com/scala/scala-xml/pull/651 - https://github.com/scala/scala-xml/pull/677 The full release notes are as follows: - https://github.com/scala/scala-xml/releases/tag/v2.2.0 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Passes GitHub Actions - Checked Scala 2.13, all Scala tests passed: https://github.com/LuciferYang/spark/runs/15278359785 Closes #42119 from LuciferYang/scala-xml-220. 
Authored-by: yangjie01 Signed-off-by: Sean Owen --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 168b0b34787..3b54ef43f6a 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -229,7 +229,7 @@ scala-compiler/2.12.18//scala-compiler-2.12.18.jar scala-library/2.12.18//scala-library-2.12.18.jar scala-parser-combinators_2.12/2.3.0//scala-parser-combinators_2.12-2.3.0.jar scala-reflect/2.12.18//scala-reflect-2.12.18.jar -scala-xml_2.12/2.1.0//scala-xml_2.12-2.1.0.jar +scala-xml_2.12/2.2.0//scala-xml_2.12-2.2.0.jar shims/0.9.45//shims-0.9.45.jar slf4j-api/2.0.7//slf4j-api-2.0.7.jar snakeyaml-engine/2.6//snakeyaml-engine-2.6.jar diff --git a/pom.xml b/pom.xml index 5711dba04b9..2e9d1d2d8f3 100644 --- a/pom.xml +++ b/pom.xml @@ -1089,7 +1089,7 @@ org.scala-lang.modules scala-xml_${scala.binary.version} -2.1.0 +2.2.0 org.scala-lang - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-44441][BUILD] Upgrade `bcprov-jdk15on` and `bcpkix-jdk15on` to 1.70
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new aec34451297 [SPARK-44441][BUILD] Upgrade `bcprov-jdk15on` and `bcpkix-jdk15on` to 1.70 aec34451297 is described below commit aec3445129789c5b1d768333bacf3f3e680d73a0 Author: yangjie01 AuthorDate: Sat Jul 15 12:17:07 2023 -0500 [SPARK-44441][BUILD] Upgrade `bcprov-jdk15on` and `bcpkix-jdk15on` to 1.70 ### What changes were proposed in this pull request? This PR aims to upgrade `bcprov-jdk15on` and `bcpkix-jdk15on` from 1.60 to 1.70. ### Why are the changes needed? The new version fixed [CVE-2020-15522](https://github.com/bcgit/bc-java/wiki/CVE-2020-15522). The release notes are as follows: - https://www.bouncycastle.org/releasenotes.html#r1rv70 ### Does this PR introduce _any_ user-facing change? No, this just upgrades a test dependency. ### How was this patch tested? Passes GitHub Actions Closes #42015 from LuciferYang/SPARK-44441. Authored-by: yangjie01 Signed-off-by: Sean Owen --- pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pom.xml b/pom.xml index eac34643fc9..3c2107b1b00 100644 --- a/pom.xml +++ b/pom.xml @@ -214,7 +214,7 @@ 3.1.0 1.1.0 1.5.0 -1.60 +1.70 1.9.0
[spark] branch master updated: [MINOR][SS][DOCS] Fix typos in the Scaladoc and make the semantic of getCurrentWatermarkMs explicit
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e5e2b914de6 [MINOR][SS][DOCS] Fix typos in the Scaladoc and make the semantic of getCurrentWatermarkMs explicit e5e2b914de6 is described below commit e5e2b914de6a498ae191bdb0d02308c5b6f13f15 Author: bartosz25 AuthorDate: Sat Jul 15 08:31:31 2023 -0500 [MINOR][SS][DOCS] Fix typos in the Scaladoc and make the semantic of getCurrentWatermarkMs explicit ### What changes were proposed in this pull request? Improve the code comments: 1. Rate micro-batch data source Scaladoc parameters aren't consistent with the options actually supported by this data source. 2. `getCurrentWatermarkMs` has special semantics for the 1st micro-batch, when the watermark is not set yet. IMO, it should return `Option[Long]`, hence `None` instead of `0` for the first micro-batch, but since it's a breaking change, I preferred to add a note on that instead. ### Why are the changes needed? 1. Avoid confusion while using the classes and methods. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? The tests weren't added because the change is only at the Scaladoc level. I affirm that the contribution is my original work and that I license the work to the project under the project's open source license. Closes #41988 from bartosz25/comments_fixes. 
Authored-by: bartosz25 Signed-off-by: Sean Owen --- .../sql/execution/streaming/sources/RatePerMicroBatchProvider.scala | 4 ++-- .../src/main/scala/org/apache/spark/sql/streaming/GroupState.scala | 5 + 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/RatePerMicroBatchProvider.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/RatePerMicroBatchProvider.scala index ccf8b0a7b92..41878a6a549 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/RatePerMicroBatchProvider.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/RatePerMicroBatchProvider.scala @@ -34,11 +34,11 @@ import org.apache.spark.sql.util.CaseInsensitiveStringMap * with 0L. * * This source supports the following options: - * - `rowsPerMicroBatch` (e.g. 100): How many rows should be generated per micro-batch. + * - `rowsPerBatch` (e.g. 100): How many rows should be generated per micro-batch. * - `numPartitions` (e.g. 10, default: Spark's default parallelism): The partition number for the *generated rows. * - `startTimestamp` (e.g. 1000, default: 0): starting value of generated time - * - `advanceMillisPerMicroBatch` (e.g. 1000, default: 1000): the amount of time being advanced in + * - `advanceMillisPerBatch` (e.g. 1000, default: 1000): the amount of time being advanced in *generated time on each micro-batch. 
* * Unlike `rate` data source, this data source provides a consistent set of input rows per diff --git a/sql/core/src/main/scala/org/apache/spark/sql/streaming/GroupState.scala b/sql/core/src/main/scala/org/apache/spark/sql/streaming/GroupState.scala index 2c8f1db74f8..f08a2fd3cc5 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/streaming/GroupState.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/streaming/GroupState.scala @@ -315,6 +315,11 @@ trait GroupState[S] extends LogicalGroupState[S] { * * @note In a streaming query, this can be called only when watermark is set before calling * `[map/flatMap]GroupsWithState`. In a batch query, this method always returns -1. + * @note The watermark gets propagated in the end of each query. As a result, this method will + * return 0 (1970-01-01T00:00:00) for the first micro-batch. If you use this value + * as a part of the timestamp set in the `setTimeoutTimestamp`, it may lead to the + * state expiring immediately in the next micro-batch, once the watermark gets the + * real value from your data. */ @throws[UnsupportedOperationException]( "if watermark has not been set before in [map|flatMap]GroupsWithState") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
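The pitfall the new `getCurrentWatermarkMs` note warns about can be shown with plain arithmetic (the millisecond values below are illustrative, not from the source):

```scala
// First micro-batch: getCurrentWatermarkMs() returns 0 (1970-01-01T00:00:00),
// because the watermark has not been propagated yet.
val firstBatchWatermarkMs = 0L
val timeoutDurationMs = 60 * 60 * 1000L // e.g. expire state one hour past the watermark

// setTimeoutTimestamp(watermark + duration) on the first batch therefore fixes
// the timeout at 1970-01-01T01:00:00 ...
val timeoutTimestampMs = firstBatchWatermarkMs + timeoutDurationMs

// ... so once the next micro-batch derives a real watermark from the data,
// that timeout is already far in the past and the state expires immediately.
val realWatermarkMs = 1689379200000L // hypothetical event-time watermark (mid-2023)
val expiresImmediately = timeoutTimestampMs < realWatermarkMs // true
```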
[spark] branch master updated: [SPARK-43389][SQL] Added a null check for lineSep option
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9f07e4a747b [SPARK-43389][SQL] Added a null check for lineSep option 9f07e4a747b is described below commit 9f07e4a747b0e2a62b954db3c9be425c924da47a Author: Gurpreet Singh AuthorDate: Thu Jul 13 18:17:45 2023 -0500 [SPARK-43389][SQL] Added a null check for lineSep option ### What changes were proposed in this pull request? ### Why are the changes needed? - `spark.read.csv` throws `NullPointerException` when lineSep is set to None - More details about the issue here: https://issues.apache.org/jira/browse/SPARK-43389 ### Does this PR introduce _any_ user-facing change? ~~Users now should be able to explicitly set `lineSep` as `None` without getting an exception~~ After some discussion, it was decided to add a `require` check for `null` instead of letting it through. ### How was this patch tested? Tested the changes with a python script that explicitly sets `lineSep` to `None` ```python from pyspark.sql import SparkSession # Create a SparkSession spark = SparkSession.builder.appName("HelloWorld").getOrCreate() # Read CSV into a DataFrame df = spark.read.csv("/tmp/hello.csv", header=True, inferSchema=True, lineSep=None) # Also tested the following case when options are passed before invoking .csv #df = spark.read.option("lineSep", None).csv("/Users/gdhuper/Documents/tmp/hello.csv", header=True, inferSchema=True) # Show the DataFrame df.show() # Stop the SparkSession spark.stop() ``` Closes #41904 from gdhuper/gdhuper/SPARK-43389. 
Authored-by: Gurpreet Singh Signed-off-by: Sean Owen --- .../src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala| 1 + .../org/apache/spark/sql/execution/datasources/text/TextOptions.scala| 1 + 2 files changed, 2 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala index 2b6b60fdf76..f4ad1f2f2e5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala @@ -254,6 +254,7 @@ class CSVOptions( * A string between two consecutive JSON records. */ val lineSeparator: Option[String] = parameters.get(LINE_SEP).map { sep => +require(sep != null, "'lineSep' cannot be a null value.") require(sep.nonEmpty, "'lineSep' cannot be an empty string.") // Intentionally allow it up to 2 for Window's CRLF although multiple // characters have an issue with quotes. This is intentionally undocumented. diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala index f26f05cbe1c..468d58974ed 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala @@ -45,6 +45,7 @@ class TextOptions(@transient private val parameters: CaseInsensitiveMap[String]) val encoding: Option[String] = parameters.get(ENCODING) val lineSeparator: Option[String] = parameters.get(LINE_SEP).map { lineSep => +require(lineSep != null, s"'$LINE_SEP' cannot be a null value.") require(lineSep.nonEmpty, s"'$LINE_SEP' cannot be an empty string.") lineSep - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
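The added guard follows the existing `require`-based validation in the options classes. A standalone sketch of the pattern (a hypothetical helper, not the actual `CSVOptions`/`TextOptions` code):

```scala
// Hypothetical stand-in for the lineSep option parsing. A null map value
// previously slipped past the nonEmpty check and surfaced later as a
// NullPointerException; checking for null first fails fast with a clear message.
def parseLineSeparator(parameters: Map[String, String]): Option[String] =
  parameters.get("lineSep").map { sep =>
    require(sep != null, "'lineSep' cannot be a null value.")
    require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
    sep
  }

val absent = parseLineSeparator(Map.empty)                // None: option not set
val crlf   = parseLineSeparator(Map("lineSep" -> "\r\n")) // Some("\r\n")
// An explicit null now raises IllegalArgumentException from require,
// instead of an NPE from sep.nonEmpty.
val rejected =
  try { parseLineSeparator(Map("lineSep" -> null)); false }
  catch { case _: IllegalArgumentException => true }
```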
[spark] branch master updated: [SPARK-44332][CORE][WEBUI] Fix the sorting error of Executor ID Column on Executors UI Page
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9717d74d072 [SPARK-44332][CORE][WEBUI] Fix the sorting error of Executor ID Column on Executors UI Page 9717d74d072 is described below commit 9717d74d0726bd177b8d0f0cc2c9b0404f82dafc Author: panbingkun AuthorDate: Mon Jul 10 19:15:56 2023 -0500 [SPARK-44332][CORE][WEBUI] Fix the sorting error of Executor ID Column on Executors UI Page ### What changes were proposed in this pull request? The pr aims to fix the sorting error of `Executor ID` Column on `Executor Page`. ### Why are the changes needed? Fix UI Sort bug. PS: Can be reproduced using: sh bin/spark-shell --master "local-cluster[12,1,1024]" - Before patch Before - asc: https://github.com/apache/spark/assets/15246973/83648087-804a-4a62-8f3e-c748f46b95d7";> Before - desc: https://github.com/apache/spark/assets/15246973/b68547f3-af36-4e97-b922-7c3ffa3cbb30";> - After patch After - asc: https://github.com/apache/spark/assets/15246973/9fd40fc7-9b72-4a08-8e16-a89d9625a1a0";> After - desc: https://github.com/apache/spark/assets/15246973/11921083-30cc-46e9-a9f6-1fe9aecde1a7";> ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Pass GA. - Manually test. Closes #41887 from panbingkun/align_executor_id. 
Authored-by: panbingkun Signed-off-by: Sean Owen --- .../org/apache/spark/ui/static/executorspage.js| 31 +++--- 1 file changed, 28 insertions(+), 3 deletions(-) diff --git a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js index 520efbd6def..38dc446eaac 100644 --- a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js +++ b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js @@ -96,6 +96,32 @@ jQuery.extend(jQuery.fn.dataTableExt.oSort, { } }); +jQuery.extend( jQuery.fn.dataTableExt.oSort, { + "executor-id-asc": function ( a, b ) { +if ($.isNumeric(a) && $.isNumeric(b)) { + return parseFloat(a) - parseFloat(b); +} else if (!$.isNumeric(a) && $.isNumeric(b)) { + return -1; +} else if ($.isNumeric(a) && !$.isNumeric(b)) { + return 1; +} else { + return a.localeCompare(b); +} + }, + + "executor-id-desc": function ( a, b ) { +if ($.isNumeric(a) && $.isNumeric(b)) { + return parseFloat(b) - parseFloat(a); +} else if (!$.isNumeric(a) && $.isNumeric(b)) { + return 1; +} else if ($.isNumeric(a) && !$.isNumeric(b)) { + return -1; +} else { + return b.localeCompare(a); +} + } +}); + $(document).ajaxStop($.unblockUI); $(document).ajaxStart(function () { $.blockUI({message: 'Loading Executors Page...'}); @@ -403,9 +429,8 @@ $(document).ready(function () { "data": response, "columns": [ { - data: function (row, type) { -return type !== 'display' ? (isNaN(row.id) ? 0 : row.id ) : row.id; - } + data: "id", + type: "executor-id" }, {data: 'hostPort'}, { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
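The comparator added above sorts purely numeric executor IDs numerically, places non-numeric IDs (such as "driver") before every numeric one in ascending order, and falls back to lexicographic comparison among non-numeric IDs. The same ordering can be expressed as a sort key; this Python sketch is illustrative, not the page's actual JavaScript:

```python
def executor_id_sort_key(exec_id):
    """Sort key mirroring the new "executor-id" DataTables comparator:
    numeric IDs compare as numbers, non-numeric IDs sort before all
    numeric ones, and non-numeric IDs compare as strings."""
    try:
        # Numeric IDs: bucket 1, ordered by numeric value.
        return (1, float(exec_id), "")
    except ValueError:
        # Non-numeric IDs (e.g. "driver"): bucket 0, ordered as strings.
        return (0, 0.0, exec_id)

ids = ["10", "driver", "2", "1"]
ordered = sorted(ids, key=executor_id_sort_key)
```

Without the fix, a plain string sort yields the broken order "1", "10", "2"; with it, ascending order is "driver", "1", "2", "10".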
[spark] branch master updated: [SPARK-44350][BUILD] Upgrade sbt to 1.9.2
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f1ec99b10ca [SPARK-44350][BUILD] Upgrade sbt to 1.9.2 f1ec99b10ca is described below commit f1ec99b10caf85e95aec2ed4f1e0b55cc0bd6f11 Author: panbingkun AuthorDate: Mon Jul 10 13:29:33 2023 -0500 [SPARK-44350][BUILD] Upgrade sbt to 1.9.2 ### What changes were proposed in this pull request? The pr aims to upgrade sbt from 1.9.1 to 1.9.2. ### Why are the changes needed? 1.The new version brings bug fixed: - Let ++ fall back to a bincompat Scala version by eed3si9n in https://github.com/sbt/sbt/pull/7328 2.v1.9.1 VS v1.9.2 https://github.com/sbt/sbt/compare/v1.9.1...v1.9.2 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #41916 from panbingkun/upgrade_sbt_192. Authored-by: panbingkun Signed-off-by: Sean Owen --- dev/appveyor-install-dependencies.ps1 | 2 +- project/build.properties | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/appveyor-install-dependencies.ps1 b/dev/appveyor-install-dependencies.ps1 index 6848d3af43d..3737382eb86 100644 --- a/dev/appveyor-install-dependencies.ps1 +++ b/dev/appveyor-install-dependencies.ps1 @@ -97,7 +97,7 @@ if (!(Test-Path $tools)) { # == SBT Push-Location $tools -$sbtVer = "1.9.1" +$sbtVer = "1.9.2" Start-FileDownload "https://github.com/sbt/sbt/releases/download/v$sbtVer/sbt-$sbtVer.zip"; "sbt.zip" # extract diff --git a/project/build.properties b/project/build.properties index f27c9c4c8cc..3eb34b94744 100644 --- a/project/build.properties +++ b/project/build.properties @@ -15,4 +15,4 @@ # limitations under the License. # # Please update the version in appveyor-install-dependencies.ps1 together. 
-sbt.version=1.9.1 +sbt.version=1.9.2
[spark] branch master updated: [SPARK-44257][BUILD] Update some maven plugins & scalafmt to newest version
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 05f5dccbd34 [SPARK-44257][BUILD] Update some maven plugins & scalafmt to newest version 05f5dccbd34 is described below commit 05f5dccbd34218c7d399228529853bdb1595f3a2 Author: panbingkun AuthorDate: Fri Jun 30 09:14:22 2023 -0500 [SPARK-44257][BUILD] Update some maven plugins & scalafmt to newest version ### What changes were proposed in this pull request? The pr aims to update some maven plugins & scalafmt to newest version, include: - maven-clean-plugin from 3.2.0 to 3.3.1 - maven-shade-plugin from 3.4.1 to 3.5.0 - scalafmt from 3.7.4 to 3.7.5 ### Why are the changes needed? 1.maven-clean-plugin https://github.com/apache/maven-clean-plugin/releases/tag/maven-clean-plugin-3.3.1 2.maven-shade-plugin https://github.com/apache/maven-shade-plugin/releases/tag/maven-shade-plugin-3.5.0 3.scalafmt https://github.com/scalameta/scalafmt/releases/tag/v3.7.5 Router: make sure to indent comments after lambda (https://github.com/scalameta/scalafmt/pull/3556) kitbellew Fix proposed version syntax (https://github.com/scalameta/scalafmt/pull/3555) JD557 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #41803 from panbingkun/SPARK-44257. 
Authored-by: panbingkun Signed-off-by: Sean Owen --- .../src/main/scala/org/apache/spark/sql/Dataset.scala| 16 +++- .../scala/org/apache/spark/sql/catalog/Catalog.scala | 7 +++ .../org/apache/spark/sql/internal/CatalogImpl.scala | 7 +++ dev/.scalafmt.conf | 2 +- pom.xml | 4 ++-- 5 files changed, 16 insertions(+), 20 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala index eba425ce127..b959974dc30 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala @@ -535,7 +535,7 @@ class Dataset[T] private[sql] ( assert(result.schema.size == 1) // scalastyle:off println println(result.toArray.head) -// scalastyle:on println + // scalastyle:on println } } @@ -2214,10 +2214,9 @@ class Dataset[T] private[sql] ( * tied to this Spark application. * * Global temporary view is cross-session. Its lifetime is the lifetime of the Spark - * application, - * i.e. it will be automatically dropped when the application terminates. It's tied to a system - * preserved database `global_temp`, and we must use the qualified name to refer a global temp - * view, e.g. `SELECT * FROM global_temp.view1`. + * application, i.e. it will be automatically dropped when the application terminates. It's tied + * to a system preserved database `global_temp`, and we must use the qualified name to refer a + * global temp view, e.g. `SELECT * FROM global_temp.view1`. * * @throws AnalysisException * if the view name is invalid or already exists @@ -2235,10 +2234,9 @@ class Dataset[T] private[sql] ( * temporary view is tied to this Spark application. * * Global temporary view is cross-session. Its lifetime is the lifetime of the Spark - * application, - * i.e. it will be automatically dropped when the application terminates. 
It's tied to a system - * preserved database `global_temp`, and we must use the qualified name to refer a global temp - * view, e.g. `SELECT * FROM global_temp.view1`. + * application, i.e. it will be automatically dropped when the application terminates. It's tied + * to a system preserved database `global_temp`, and we must use the qualified name to refer a + * global temp view, e.g. `SELECT * FROM global_temp.view1`. * * @group basic * @since 3.4.0 diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala index 268f162cbfa..11c3f4e3d18 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala @@ -543,10 +543,9 @@ abstract class Catalog { * cached before, then it will also be uncached. * * Global temporary view is cross-session. Its lifetime is the lifetime of the Spark - * application, - * i.e. it will be automatically dropped when the application te
[spark] branch master updated: [SPARK-41599] Memory leak in FileSystem.CACHE when submitting apps to secure cluster using InProcessLauncher
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7971e1c6a7c [SPARK-41599] Memory leak in FileSystem.CACHE when submitting apps to secure cluster using InProcessLauncher 7971e1c6a7c is described below commit 7971e1c6a7c074c65829c2bdfad857a33e0a7a5d Author: Xieming LI AuthorDate: Fri Jun 30 08:20:04 2023 -0500 [SPARK-41599] Memory leak in FileSystem.CACHE when submitting apps to secure cluster using InProcessLauncher ### What changes were proposed in this pull request? Using `FileSystem.closeAllForUGI` to close the cache and prevent the memory leak. ### Why are the changes needed? There seems to be a memory leak in FileSystem.CACHE when submitting apps to a secure cluster using InProcessLauncher. For more details, see [SPARK-41599](https://issues.apache.org/jira/browse/SPARK-41599) ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? I have tested the patch with my code, which uses InProcessLauncher. Confirmed that the memory leak issue is mitigated. https://github.com/apache/spark/assets/4378066/cfdef4d3-cb43-464c-bb46-de60f3b91622 It will be very helpful if I can have some feedback, and I will add some test cases if required. Closes #41692 from risyomei/fix-SPARK-41599.
Authored-by: Xieming LI Signed-off-by: Sean Owen --- core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala | 2 ++ .../apache/spark/deploy/security/HadoopDelegationTokenManager.scala | 4 2 files changed, 6 insertions(+) diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala index 8f9477385e7..60253ed5fda 100644 --- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala +++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala @@ -186,6 +186,8 @@ private[spark] class SparkSubmit extends Logging { } else { throw e } + } finally { +FileSystem.closeAllForUGI(proxyUser) } } } else { diff --git a/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala b/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala index 6ce195b6c7a..54a24927ded 100644 --- a/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala +++ b/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala @@ -26,6 +26,7 @@ import java.util.concurrent.{ScheduledExecutorService, TimeUnit} import scala.collection.mutable import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.FileSystem import org.apache.hadoop.security.{Credentials, UserGroupInformation} import org.apache.spark.SparkConf @@ -149,6 +150,9 @@ private[spark] class HadoopDelegationTokenManager( creds.addAll(newTokens) } }) + if(!currentUser.equals(freshUGI)) { +FileSystem.closeAllForUGI(freshUGI) + } } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
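The shape of the fix above is a try/finally cleanup: each proxy-user submission adds entries to a cache keyed by user (Hadoop's FileSystem.CACHE, keyed by UGI), and the patch guarantees those entries are evicted even when the submission throws. A toy Python model of that pattern (class and method names here are illustrative stand-ins, not the real Hadoop API):

```python
class UgiFileSystemCache:
    """Toy stand-in for Hadoop's FileSystem.CACHE, keyed by user (UGI)."""

    def __init__(self):
        self._cache = {}

    def get(self, user, path):
        # Each distinct proxy user accumulates its own cached entries;
        # without cleanup, every submission leaks one set of entries.
        return self._cache.setdefault(user, {}).setdefault(path, object())

    def close_all_for_ugi(self, user):
        # Mirrors FileSystem.closeAllForUGI: evict every entry for a user.
        self._cache.pop(user, None)


def submit_as_proxy_user(cache, proxy_user, action):
    """Run a submission as a proxy user, always evicting that user's
    cached filesystems afterwards -- the essence of the patch."""
    try:
        return action()
    finally:
        cache.close_all_for_ugi(proxy_user)
```

Because the eviction sits in `finally`, the cache stays bounded whether the in-process submission succeeds or fails.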
[spark] branch master updated (6590e7db521 -> a8ea35f7c2f)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 6590e7db521 [SPARK-44158][K8S] Remove unused `spark.kubernetes.executor.lostCheckmaxAttempts` add a8ea35f7c2f [SPARK-39740][UI] Upgrade vis timeline to 7.7.2 to fix CVE-2020-28487 No new revisions were added by this update. Summary of changes: .../org/apache/spark/ui/static/timeline-view.js| 40 ++- .../spark/ui/static/vis-timeline-graph2d.min.css | 3 +- .../ui/static/vis-timeline-graph2d.min.css.map | 1 + .../spark/ui/static/vis-timeline-graph2d.min.js| 57 ++ .../ui/static/vis-timeline-graph2d.min.js.map | 1 + dev/.rat-excludes | 2 + licenses-binary/LICENSE-vis-timeline.txt | 29 +-- licenses/LICENSE-vis-timeline.txt | 29 +-- 8 files changed, 100 insertions(+), 62 deletions(-) create mode 100644 core/src/main/resources/org/apache/spark/ui/static/vis-timeline-graph2d.min.css.map create mode 100644 core/src/main/resources/org/apache/spark/ui/static/vis-timeline-graph2d.min.js.map
[spark] branch master updated: [SPARK-44024][SQL] Change to use `map` when `unzip` only used to extract a single element
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ad6cd60ca74 [SPARK-44024][SQL] Change to use `map` when `unzip` only used to extract a single element ad6cd60ca74 is described below commit ad6cd60ca7408018d8c6259597456e9c2fe8b376 Author: yangjie01 AuthorDate: Sun Jun 18 07:19:56 2023 -0500 [SPARK-44024][SQL] Change to use `map` when `unzip` only used to extract a single element ### What changes were proposed in this pull request? A minor code simplification, use `map` instead of `unzip` when `unzip` only used to extract a single element. ### Why are the changes needed? Code simplification ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions Closes #41548 from LuciferYang/SPARK-44024. Lead-authored-by: yangjie01 Co-authored-by: YangJie Signed-off-by: Sean Owen --- .../scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala | 2 +- .../apache/spark/sql/execution/datasources/v2/CreateIndexExec.scala | 2 +- .../spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index 568e3d30e34..c70dba01808 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -861,7 +861,7 @@ object ColumnPruning extends Rule[LogicalPlan] { val newProjects = e.projections.map { proj => proj.zip(e.output).filter { case (_, a) => newOutput.contains(a) -}.unzip._1 +}.map(_._1) } a.copy(child = Expand(newProjects, newOutput, grandChild)) diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateIndexExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateIndexExec.scala index 20ccf991af6..8dac6737334 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateIndexExec.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateIndexExec.scala @@ -52,7 +52,7 @@ case class CreateIndexExec( } try { table.createIndex( -indexName, columns.unzip._1.toArray, colProperties, propertiesWithIndexType.asJava) +indexName, columns.map(_._1).toArray, colProperties, propertiesWithIndexType.asJava) } catch { case _: IndexAlreadyExistsException if ignoreIfExists => logWarning(s"Index $indexName already exists in table ${table.name}. Ignoring.") diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala index 49a6c7232ec..e58fe7844ab 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala @@ -192,11 +192,11 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper { val groupOutputMap = normalizedGroupingExpr.zipWithIndex.map { case (e, i) => AttributeReference(s"group_col_$i", e.dataType)() -> e } - val groupOutput = groupOutputMap.unzip._1 + val groupOutput = groupOutputMap.map(_._1) val aggOutputMap = finalAggExprs.zipWithIndex.map { case (e, i) => AttributeReference(s"agg_func_$i", e.dataType)() -> e } - val aggOutput = aggOutputMap.unzip._1 + val aggOutput = aggOutputMap.map(_._1) val newOutput = groupOutput ++ aggOutput val groupByExprToOutputOrdinal = mutable.HashMap.empty[Expression, Int] normalizedGroupingExpr.zipWithIndex.foreach { case (expr, ordinal) => - To 
unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
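The simplification in the diff above replaces `pairs.unzip._1` with `pairs.map(_._1)`: `unzip` materialises both component collections and then one is thrown away, while `map` builds only the one that is needed. The same contrast in Python (illustrative data; the pair names echo the diff's generated column names):

```python
pairs = [("group_col_0", "exprA"), ("group_col_1", "exprB")]

# unzip-style: transposes the pairs into BOTH component sequences,
# then discards the second one.
firsts_via_unzip = list(list(zip(*pairs))[0])

# map-style: builds only the sequence that is actually needed.
firsts_via_map = [first for first, _ in pairs]
```

Both yield the same result; the `map` form simply avoids constructing and discarding the unused second collection.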
[spark] branch master updated: [SPARK-43179][FOLLOW-UP] Use the secret ByteBuffer instead of the String
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 45ad044042f [SPARK-43179][FOLLOW-UP] Use the secret ByteBuffer instead of the String 45ad044042f is described below commit 45ad044042f7f376c4c0234807a62179b680edae Author: Chandni Singh AuthorDate: Sun Jun 11 07:59:35 2023 -0500 [SPARK-43179][FOLLOW-UP] Use the secret ByteBuffer instead of the String ### What changes were proposed in this pull request? Introduced a bug with this change: https://github.com/apache/spark/pull/40843. To get the value that is persisted in db, we used to use `mapper.writeValueAsString(ByteBuffer)`. We changed it to `mapper.writeValueAsString(String)`. However, when we load from the db, it still uses `ByteBuffer secret = mapper.readValue(e.getValue(), ByteBuffer.class);` causing exceptions when the shuffle service is unable to recover the apps: ``` ERROR org.apache.spark.network.server.TransportRequestHandler: Error while invoking RpcHandler#receive() on RPC id 5764589675121231159 java.lang.RuntimeException: javax.security.sasl.SaslException: DIGEST-MD5: digest response format violation. Mismatched response. at org.sparkproject.guava.base.Throwables.propagate(Throwables.java:160) at org.apache.spark.network.sasl.SparkSaslServer.response(SparkSaslServer.java:121) at org.apache.spark.network.sasl.SaslRpcHandler.doAuthChallenge(Sas [...] ``` ### Why are the changes needed? It fixes the bug that was introduced with SPARK-43179 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? The existing UTs in the `YarnShuffleServiceSuite` were using empty password which masked the issue. Changed it to use a non-empty password. Closes #41502 from otterc/SPARK-43179-followup. 
Authored-by: Chandni Singh Signed-off-by: Sean Owen --- .../spark/network/yarn/YarnShuffleService.java | 4 +++- .../network/yarn/YarnShuffleServiceSuite.scala | 25 +- 2 files changed, 18 insertions(+), 11 deletions(-) diff --git a/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java b/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java index 578c1a19c40..b34ebf6e29b 100644 --- a/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java +++ b/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java @@ -440,7 +440,9 @@ public class YarnShuffleService extends AuxiliaryService { if (db != null && AppsWithRecoveryDisabled.isRecoveryEnabledForApp(appId)) { AppId fullId = new AppId(appId); byte[] key = dbAppKey(fullId); - byte[] value = mapper.writeValueAsString(shuffleSecret).getBytes(StandardCharsets.UTF_8); + ByteBuffer dbVal = metaInfo != null ? + JavaUtils.stringToBytes(shuffleSecret) : appServiceData; + byte[] value = mapper.writeValueAsString(dbVal).getBytes(StandardCharsets.UTF_8); db.put(key, value); } secretManager.registerApp(appId, shuffleSecret); diff --git a/resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala b/resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala index 3e78262a765..552cc98311e 100644 --- a/resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala +++ b/resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala @@ -71,6 +71,8 @@ abstract class YarnShuffleServiceSuite extends SparkFunSuite with Matchers { private[yarn] val SORT_MANAGER_WITH_MERGE_SHUFFLE_META_WithNoAttemptID = "org.apache.spark.shuffle.sort.SortShuffleManager:{\"mergeDir\": \"merge_manager\"}" private val DUMMY_BLOCK_DATA = "dummyBlockData".getBytes(StandardCharsets.UTF_8) + 
private val DUMMY_PASSWORD = "dummyPassword" + private val EMPTY_PASSWORD = "" private var recoveryLocalDir: File = _ protected var tempDir: File = _ @@ -191,7 +193,8 @@ abstract class YarnShuffleServiceSuite extends SparkFunSuite with Matchers { val app3Data = makeAppInfo("user", app3Id) s1.initializeApplication(app3Data) val app4Id = ApplicationId.newInstance(0, 4) -val app4Data = makeAppInfo("user", app4Id) +val app4Data = makeAppInfo("user", app4Id, metadataStorageDisabled = false, +authEnabled = true, DUMMY_PASSWORD) s1.initializeApplication(app4Data) val execStateFile = s1.registeredExecutorFile @@ -1038,15 +1041,15 @@ abstract class YarnShuffleServiceSuite exte
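The regression described above is a write/read asymmetry: the value was persisted in one serialised shape (a plain String) while recovery still deserialised the old shape (a ByteBuffer), so restarted shuffle services failed SASL authentication. A heavily simplified Python model of the invariant being restored (this toy uses JSON over a dict; the real code uses Jackson against a LevelDB-backed store):

```python
import json


def save_secret(db, app_id, secret):
    """Write side: persist the secret in one concrete serialised shape."""
    db[app_id] = json.dumps({"secret": secret})


def load_secret(db, app_id):
    """Read side: MUST deserialise exactly the shape that was written.
    The bug was writing one shape while recovery still read another,
    which surfaced as 'Mismatched response' SASL failures on restart."""
    return json.loads(db[app_id])["secret"]
```

A round trip through the same pair of functions recovers the original secret; mixing serialisation shapes between the two sides is what broke app recovery.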
[spark] branch master updated (595ad30e625 -> 8ae95724721)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 595ad30e625 [SPARK-43911][SQL] Use toSet to deduplicate the iterator data to prevent the creation of large Array add 8ae95724721 [SPARK-43955][BUILD] Upgrade `scalafmt` from 3.7.3 to 3.7.4 No new revisions were added by this update. Summary of changes: dev/.scalafmt.conf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)