(spark-website) branch asf-site updated: Fix typo in downloads.md
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new e73fcd924f  Fix typo in downloads.md
e73fcd924f is described below

commit e73fcd924f6e30a292053c85d52b1eba2c074d90
Author: asikeero <60272147+asike...@users.noreply.github.com>
AuthorDate: Sat Sep 14 19:25:00 2024 -0500

    Fix typo in downloads.md

    There seems to have been a small typo in the Docker section of downloads.

    Author: asikeero <60272147+asike...@users.noreply.github.com>
    Author: Eero Asikainen <60272147+asike...@users.noreply.github.com>

    Closes #554 from asikeero/patch-1.
---
 downloads.md        | 2 +-
 site/downloads.html | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/downloads.md b/downloads.md
index cc89ec8382..0d1af9d5c8 100644
--- a/downloads.md
+++ b/downloads.md
@@ -45,7 +45,7 @@ Spark artifacts are [hosted in Maven Central](https://search.maven.org/search?q=

 Spark docker images are available from Dockerhub under the accounts of both [The Apache Software Foundation](https://hub.docker.com/r/apache/spark/) and [Official Images](https://hub.docker.com/_/spark).
-Note that, these images contain non-ASF software and may be subject to different license terms. Please check their [Dockerfiles](https://github.com/apache/spark-docker) to verify whether to verify whether they are compatible with your deployment.
+Note that, these images contain non-ASF software and may be subject to different license terms. Please check their [Dockerfiles](https://github.com/apache/spark-docker) to verify whether they are compatible with your deployment.

 ### Release notes for stable releases

diff --git a/site/downloads.html b/site/downloads.html
index ddb50cd9bd..5541878a5c 100644
--- a/site/downloads.html
+++ b/site/downloads.html
@@ -198,7 +198,7 @@ version: 3.5.2

 Spark docker images are available from Dockerhub under the accounts of both <a href="https://hub.docker.com/r/apache/spark/">The Apache Software Foundation</a> and <a href="https://hub.docker.com/_/spark">Official Images</a>.
-Note that, these images contain non-ASF software and may be subject to different license terms. Please check their <a href="https://github.com/apache/spark-docker">Dockerfiles</a> to verify whether to verify whether they are compatible with your deployment.
+Note that, these images contain non-ASF software and may be subject to different license terms. Please check their <a href="https://github.com/apache/spark-docker">Dockerfiles</a> to verify whether they are compatible with your deployment.

 Release notes for stable releases

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
(spark-website) branch asf-site updated: add dataflint to third party projects page
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 55f231b067  add dataflint to third party projects page
55f231b067 is described below

commit 55f231b067c1e5e44fc1ead737a0bfd37a7d0327
Author: menishmueli
AuthorDate: Sat Sep 14 19:11:52 2024 -0500

    add dataflint to third party projects page

    Added DataFlint (https://github.com/dataflint/spark) to the third party projects page.

    Generated site HTML with `bundle exec jekyll build` and tested it with `bundle exec jekyll serve`.

    Author: menishmueli

    Closes #538 from menishmueli/asf-site.
---
 site/third-party-projects.html | 1 +
 third-party-projects.md        | 1 +
 2 files changed, 2 insertions(+)

diff --git a/site/third-party-projects.html b/site/third-party-projects.html
index cbb07d2506..24f92c5639 100644
--- a/site/third-party-projects.html
+++ b/site/third-party-projects.html
@@ -227,6 +227,7 @@ transforming, and analyzing genomic data using Apache Spark

 <a href="https://www.datamechanics.co/delight">Data Mechanics Delight</a> - Delight is a free, hosted, cross-platform Spark UI alternative backed by an open-source Spark agent. It features new metrics and visualizations to simplify Spark monitoring and performance tuning.
+<a href="https://github.com/dataflint/spark">DataFlint</a> - DataFlint is A Spark UI replacement installed via an open-source library, which updates in real-time and alerts on performance issues

 Additional language bindings

diff --git a/third-party-projects.md b/third-party-projects.md
index e83ff1eadf..7d2f3feb26 100644
--- a/third-party-projects.md
+++ b/third-party-projects.md
@@ -70,6 +70,7 @@ transforming, and analyzing genomic data using Apache Spark

 Performance, monitoring, and debugging tools for Spark

 - <a href="https://www.datamechanics.co/delight">Data Mechanics Delight</a> - Delight is a free, hosted, cross-platform Spark UI alternative backed by an open-source Spark agent. It features new metrics and visualizations to simplify Spark monitoring and performance tuning.
+- <a href="https://github.com/dataflint/spark">DataFlint</a> - DataFlint is A Spark UI replacement installed via an open-source library, which updates in real-time and alerts on performance issues

 Additional language bindings
(spark-website) branch asf-site updated: Update rexml per Github security warning
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 6e602593f3  Update rexml per Github security warning
6e602593f3 is described below

commit 6e602593f3e6bd49151bb8eaa7da4faa427a751d
Author: Sean Owen
AuthorDate: Tue Jul 30 10:57:38 2024 -0500

    Update rexml per Github security warning

    Author: Sean Owen

    Closes #540 from srowen/rexml.
---
 Gemfile.lock | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Gemfile.lock b/Gemfile.lock
index f4dedba223..db9f15953f 100644
--- a/Gemfile.lock
+++ b/Gemfile.lock
@@ -48,11 +48,13 @@ GEM
     rb-fsevent (0.11.2)
     rb-inotify (0.10.1)
       ffi (~> 1.0)
-    rexml (3.2.6)
+    rexml (3.3.2)
+      strscan
     rouge (3.26.0)
     safe_yaml (1.0.5)
     sassc (2.4.0)
       ffi (~> 1.9)
+    strscan (3.1.0)
     terminal-table (2.0.0)
       unicode-display_width (~> 1.1, >= 1.1.1)
     unicode-display_width (1.8.0)
(spark-website) branch asf-site updated: Patch 1
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new a3693ee235  Patch 1
a3693ee235 is described below

commit a3693ee2358fde320fc9000be3a9fbc84e1df959
Author: Stefan Krawczyk
AuthorDate: Wed Jun 5 07:27:23 2024 -0500

    Patch 1

    This adds [Hamilton](https://github.com/DAGWorks-Inc/hamilton) to the list of libraries with integrations. Hamilton has PySpark support (e.g. [examples](https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/spark)) and this specific functionality is utilized by several enterprises in production.

    Author: Stefan Krawczyk

    Closes #520 from skrawcz/patch-1.
---
 site/third-party-projects.html | 1 +
 third-party-projects.md        | 1 +
 2 files changed, 2 insertions(+)

diff --git a/site/third-party-projects.html b/site/third-party-projects.html
index f5e2dd2873..2629d07d38 100644
--- a/site/third-party-projects.html
+++ b/site/third-party-projects.html
@@ -153,6 +153,7 @@
 <a href="https://github.com/awslabs/python-deequ">python-deequ</a> - Measures data quality in large datasets
 <a href="https://github.com/datahub-project/datahub">datahub</a> - Metadata platform for the modern data stack
 <a href="https://github.com/dbt-labs/dbt-spark">dbt-spark</a> - Enables dbt to work with Apache Spark
+<a href="https://github.com/DAGWorks-Inc/hamilton">Hamilton</a> - Enables one to declaratively describe PySpark transformations that helps keep code testable, modular, and logically visualizable.

 Connectors

diff --git a/third-party-projects.md b/third-party-projects.md
index e8b4b16c85..ed7e7b3353 100644
--- a/third-party-projects.md
+++ b/third-party-projects.md
@@ -18,6 +18,7 @@ This page tracks external software projects that supplement Apache Spark and add
 - [python-deequ](https://github.com/awslabs/python-deequ) - Measures data quality in large datasets
 - [datahub](https://github.com/datahub-project/datahub) - Metadata platform for the modern data stack
 - [dbt-spark](https://github.com/dbt-labs/dbt-spark) - Enables dbt to work with Apache Spark
+- [Hamilton](https://github.com/DAGWorks-Inc/hamilton) - Enables one to declaratively describe PySpark transformations that helps keep code testable, modular, and logically visualizable.

 ## Connectors
(spark) branch master updated: [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 9d4d41c43f1c  [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer
9d4d41c43f1c is described below

commit 9d4d41c43f1cb4cf724e0e27c1762df8bbdf2a54
Author: beliefer
AuthorDate: Sat Feb 3 09:06:38 2024 -0600

    [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer

    ### What changes were proposed in this pull request?
    This PR proposes to make the document of `spark.sql.adaptive.coalescePartitions.parallelismFirst` clearer.

    ### Why are the changes needed?
    The default value of `spark.sql.adaptive.coalescePartitions.parallelismFirst` is true, but the document contains the words `recommended to set this config to false and respect the configured target size`. This is very confusing.

    ### Does this PR introduce _any_ user-facing change?
    Yes. The document is clearer.

    ### How was this patch tested?
    N/A

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #44787 from beliefer/SPARK-46760.
Authored-by: beliefer
Signed-off-by: Sean Owen
---
 docs/sql-performance-tuning.md                                 | 2 +-
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/sql-performance-tuning.md b/docs/sql-performance-tuning.md
index 1dbe1bb7e1a2..25c22d660562 100644
--- a/docs/sql-performance-tuning.md
+++ b/docs/sql-performance-tuning.md
@@ -267,7 +267,7 @@ This feature coalesces the post shuffle partitions based on the map output stati
   spark.sql.adaptive.coalescePartitions.parallelismFirst
   true
-  When true, Spark ignores the target size specified by spark.sql.adaptive.advisoryPartitionSizeInBytes (default 64MB) when coalescing contiguous shuffle partitions, and only respect the minimum partition size specified by spark.sql.adaptive.coalescePartitions.minPartitionSize (default 1MB), to maximize the parallelism. This is to avoid performance regression when enabling adaptive query execution. It's recommended to set this config to false and respect th [...]
+  When true, Spark ignores the target size specified by spark.sql.adaptive.advisoryPartitionSizeInBytes (default 64MB) when coalescing contiguous shuffle partitions, and only respect the minimum partition size specified by spark.sql.adaptive.coalescePartitions.minPartitionSize (default 1MB), to maximize the parallelism. This is to avoid performance regressions when enabling adaptive query execution. It's recommended to set this config to true on a busy clus [...]
   3.2.0

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index d88cbed6b27d..1bff0ff1a350 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -713,8 +713,9 @@ object SQLConf {
         "shuffle partitions, but adaptively calculate the target size according to the default " +
         "parallelism of the Spark cluster. The calculated size is usually smaller than the " +
         "configured target size. This is to maximize the parallelism and avoid performance " +
-        "regression when enabling adaptive query execution. It's recommended to set this config " +
-        "to false and respect the configured target size.")
+        "regressions when enabling adaptive query execution. It's recommended to set this " +
+        "config to true on a busy cluster to make resource utilization more efficient (not many " +
+        "small tasks).")
       .version("3.2.0")
       .booleanConf
       .createWithDefault(true)
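The coalescing policy that the improved documentation describes can be sketched in a few lines: merge contiguous shuffle partitions greedily up to a threshold, where `parallelismFirst=true` means the threshold is the small `minPartitionSize` (more, smaller partitions) rather than the larger advisory target size. This is a deliberately simplified Python model of the documented behavior, not Spark's actual AQE implementation; all names are illustrative.

```python
def coalesce_partitions(sizes, target_size, min_size, parallelism_first):
    """Greedily merge contiguous shuffle partition sizes.

    Simplified model of spark.sql.adaptive.coalescePartitions.parallelismFirst:
    when True, the advisory target size is ignored and only min_size bounds
    each coalesced partition, maximizing parallelism.
    """
    threshold = min_size if parallelism_first else target_size
    merged, current = [], 0
    for size in sizes:
        current += size
        if current >= threshold:  # close the current coalesced partition
            merged.append(current)
            current = 0
    if current:  # leftover tail partition
        merged.append(current)
    return merged

sizes = [10, 10, 10, 10, 10, 10]  # shuffle partition sizes, e.g. in MB
# parallelism first: each 10MB chunk already meets min_size -> 6 partitions
print(coalesce_partitions(sizes, target_size=30, min_size=10, parallelism_first=True))
# target-size first: merge up to the 30MB advisory size -> 2 partitions
print(coalesce_partitions(sizes, target_size=30, min_size=10, parallelism_first=False))
```

On a busy cluster the first mode yields fewer idle cores at the cost of more, smaller tasks, which is the trade-off the reworded documentation now spells out.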
(spark) branch master updated: [SPARK-45110][BUILD] Upgrade rocksdbjni to 8.8.1
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 1870de0b329a  [SPARK-45110][BUILD] Upgrade rocksdbjni to 8.8.1
1870de0b329a is described below

commit 1870de0b329ac5ef35a331a653b4debd85eaa684
Author: panbingkun
AuthorDate: Thu Feb 1 06:37:00 2024 -0600

    [SPARK-45110][BUILD] Upgrade rocksdbjni to 8.8.1

    ### What changes were proposed in this pull request?
    The PR aims to upgrade rocksdbjni from `8.3.2` to `8.8.1`.
    Why version `8.8.1`? Because so far, `32` tests have been conducted based on version `8.6.7` or `8.8.1`, and no core issues have been found. Later versions have not been rigorously validated.

    ### Why are the changes needed?
    1. The full release notes:
    - https://github.com/facebook/rocksdb/releases/tag/v8.8.1
    - https://github.com/facebook/rocksdb/releases/tag/v8.7.3
    - https://github.com/facebook/rocksdb/releases/tag/v8.6.7
    - https://github.com/facebook/rocksdb/releases/tag/v8.5.4
    - https://github.com/facebook/rocksdb/releases/tag/v8.5.3
    - https://github.com/facebook/rocksdb/releases/tag/v8.4.4
    - https://github.com/facebook/rocksdb/releases/tag/v8.3.3

    2. Bug fixes, e.g.:
    - Fixed a bug where compaction reads under non-direct IO still fall back to RocksDB internal prefetching after the file system's prefetching returns a non-OK status other than Status::NotSupported()
    - Fixed a bug with atomic_flush=true that can cause the DB to get stuck after a flush fails
    - Fixed a bug where, if there is an error reading from offset 0 of a file from L1+ and the file is not the first file in the sorted run, data can be lost in compaction and read/scan can return incorrect results
    - Fixed a bug where an iterator may return incorrect results for DeleteRange() users if there was an error reading from a file

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    - Pass GA.
    - Manually test:
      ./build/mvn clean install -pl core -am -Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest -fn

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #43924 from panbingkun/upgrade_rocksdbjni.

Lead-authored-by: panbingkun
Co-authored-by: panbingkun
Signed-off-by: Sean Owen
---
 dev/deps/spark-deps-hadoop-3-hive-2.3              |  2 +-
 pom.xml                                            |  2 +-
 ...StoreBasicOperationsBenchmark-jdk21-results.txt | 70 ++---
 .../StateStoreBasicOperationsBenchmark-results.txt | 72 +++---
 4 files changed, 73 insertions(+), 73 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index fcb3350e5de2..e02733883642 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -239,7 +239,7 @@ parquet-jackson/1.13.1//parquet-jackson-1.13.1.jar
 pickle/1.3//pickle-1.3.jar
 py4j/0.10.9.7//py4j-0.10.9.7.jar
 remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar
-rocksdbjni/8.3.2//rocksdbjni-8.3.2.jar
+rocksdbjni/8.8.1//rocksdbjni-8.8.1.jar
 scala-collection-compat_2.13/2.7.0//scala-collection-compat_2.13-2.7.0.jar
 scala-compiler/2.13.12//scala-compiler-2.13.12.jar
 scala-library/2.13.12//scala-library-2.13.12.jar

diff --git a/pom.xml b/pom.xml
index 6e118bb27f5a..2fc14a4cdede 100644
--- a/pom.xml
+++ b/pom.xml
@@ -677,7 +677,7 @@
       org.rocksdb
       rocksdbjni
-      8.3.2
+      8.8.1

       ${leveldbjni.group}

diff --git a/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt b/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt
index f92ae8668e16..c0d710873aed 100644
--- a/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt
+++ b/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt
@@ -6,33 +6,33 @@ OpenJDK 64-Bit Server VM 21.0.1+12-LTS on Linux 5.15.0-1053-azure
 AMD EPYC 7763 64-Core Processor
 putting 1 rows (1 rows to overwrite - rate 100):  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
 --------------------------------------------------------------------------------------------------------------------------
-In-memory                                                     5             6          0        1.8        541.4       1.0X
-RocksDB (trackTotalNumberOfRows: true)                       40            41          2        0.2       4023.4       0.1X
-RocksDB (trackTotalNumberOfRows: false)                      15            15          1        0.7       1452.5       0.4X
+In-m
(spark) branch master updated: [SPARK-46473][SQL] Reuse `getPartitionedFile` method
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 223afea9960c  [SPARK-46473][SQL] Reuse `getPartitionedFile` method
223afea9960c is described below

commit 223afea9960c7ef1a4c8654e043e860f6c248185
Author: huangxiaoping <1754789...@qq.com>
AuthorDate: Wed Jan 31 22:59:20 2024 -0600

    [SPARK-46473][SQL] Reuse `getPartitionedFile` method

    ### What changes were proposed in this pull request?
    Reuse `getPartitionedFile` method to reduce redundant code.

    ### Why are the changes needed?
    Reduce redundant code.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #44437 from huangxiaopingRD/SPARK-46473.

Authored-by: huangxiaoping <1754789...@qq.com>
Signed-off-by: Sean Owen
---
 .../apache/spark/sql/execution/DataSourceScanExec.scala  |  2 +-
 .../apache/spark/sql/execution/PartitionedFileUtil.scala | 14 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
index b3b2b0eab055..2622eadaefb3 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
@@ -645,7 +645,7 @@ case class FileSourceScanExec(
       logInfo(s"Planning with ${bucketSpec.numBuckets} buckets")
       val filesGroupedToBuckets = selectedPartitions.flatMap { p =>
-        p.files.map(f => PartitionedFileUtil.getPartitionedFile(f, p.values))
+        p.files.map(f => PartitionedFileUtil.getPartitionedFile(f, p.values, 0, f.getLen))
       }.groupBy { f =>
         BucketingUtils
           .getBucketId(f.toPath.getName)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/PartitionedFileUtil.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/PartitionedFileUtil.scala
index b31369b6768e..997859058de1 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/PartitionedFileUtil.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/PartitionedFileUtil.scala
@@ -33,20 +33,20 @@ object PartitionedFileUtil {
       (0L until file.getLen by maxSplitBytes).map { offset =>
         val remaining = file.getLen - offset
         val size = if (remaining > maxSplitBytes) maxSplitBytes else remaining
-        val hosts = getBlockHosts(getBlockLocations(file.fileStatus), offset, size)
-        PartitionedFile(partitionValues, SparkPath.fromPath(file.getPath), offset, size, hosts,
-          file.getModificationTime, file.getLen, file.metadata)
+        getPartitionedFile(file, partitionValues, offset, size)
       }
     } else {
       Seq(getPartitionedFile(file, partitionValues, 0, file.getLen))
     }
   }

   def getPartitionedFile(
       file: FileStatusWithMetadata,
-      partitionValues: InternalRow): PartitionedFile = {
-    val hosts = getBlockHosts(getBlockLocations(file.fileStatus), 0, file.getLen)
-    PartitionedFile(partitionValues, SparkPath.fromPath(file.getPath), 0, file.getLen, hosts,
+      partitionValues: InternalRow,
+      start: Long,
+      length: Long): PartitionedFile = {
+    val hosts = getBlockHosts(getBlockLocations(file.fileStatus), start, length)
+    PartitionedFile(partitionValues, SparkPath.fromPath(file.getPath), start, length, hosts,
       file.getModificationTime, file.getLen, file.metadata)
   }
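The refactor above routes both branches of `splitFiles` through a single `getPartitionedFile(file, partitionValues, start, length)` helper: the splitable branch emits one range per `maxSplitBytes` step, the non-splitable branch emits one whole-file range. A minimal Python sketch of that shape, with hypothetical names standing in for the Scala types, just to show the deduplicated control flow:

```python
from dataclasses import dataclass

@dataclass
class PartitionedFile:
    path: str
    start: int
    length: int

def get_partitioned_file(path, start, length):
    # Single shared constructor, analogous to the refactored
    # PartitionedFileUtil.getPartitionedFile(file, partitionValues, start, length).
    return PartitionedFile(path, start, length)

def split_file(path, file_len, max_split_bytes, is_splitable):
    """Mirror of the splitFiles logic: chop a splitable file into
    max_split_bytes-sized ranges, or emit one whole-file range, with both
    branches going through the same helper (illustrative sketch, not Spark's API)."""
    if is_splitable:
        return [
            get_partitioned_file(path, offset, min(max_split_bytes, file_len - offset))
            for offset in range(0, file_len, max_split_bytes)
        ]
    return [get_partitioned_file(path, 0, file_len)]

splits = split_file("part-0000.parquet", file_len=250, max_split_bytes=100, is_splitable=True)
print([(s.start, s.length) for s in splits])  # [(0, 100), (100, 100), (200, 50)]
```

Collapsing the duplicated `PartitionedFile(...)` construction into one parameterized helper is exactly the redundancy the commit removes.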
(spark) branch master updated: [SPARK-46929][CORE][CONNECT][SS] Use ThreadUtils.shutdown to close thread pools
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 262ed5bcab0b  [SPARK-46929][CORE][CONNECT][SS] Use ThreadUtils.shutdown to close thread pools
262ed5bcab0b is described below

commit 262ed5bcab0ba750b089b0693dbb1a59ef6fd11f
Author: beliefer
AuthorDate: Wed Jan 31 09:52:19 2024 -0600

    [SPARK-46929][CORE][CONNECT][SS] Use ThreadUtils.shutdown to close thread pools

    ### What changes were proposed in this pull request?
    This PR proposes using `ThreadUtils.shutdown` to close thread pools.

    ### Why are the changes needed?
    `ThreadUtils` provides a `shutdown` method that wraps the common logic for shutting down thread pools. We should use `ThreadUtils.shutdown` to close thread pools instead of repeating that logic at each call site.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    GA

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #44962 from beliefer/SPARK-46929.
Authored-by: beliefer Signed-off-by: Sean Owen --- .../sql/connect/service/SparkConnectExecutionManager.scala | 5 +++-- .../sql/connect/service/SparkConnectSessionManager.scala| 5 +++-- .../connect/service/SparkConnectStreamingQueryCache.scala | 9 +++-- .../scala/org/apache/spark/ExecutorAllocationManager.scala | 4 ++-- .../org/apache/spark/status/ElementTrackingStore.scala | 6 ++ .../main/scala/org/apache/spark/streaming/Checkpoint.scala | 12 +--- .../org/apache/spark/streaming/scheduler/JobScheduler.scala | 13 ++--- 7 files changed, 24 insertions(+), 30 deletions(-) diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectExecutionManager.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectExecutionManager.scala index c90f53ac07df..85fb150b3171 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectExecutionManager.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectExecutionManager.scala @@ -21,6 +21,7 @@ import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit} import javax.annotation.concurrent.GuardedBy import scala.collection.mutable +import scala.concurrent.duration.FiniteDuration import scala.jdk.CollectionConverters._ import scala.util.control.NonFatal @@ -30,6 +31,7 @@ import org.apache.spark.{SparkEnv, SparkSQLException} import org.apache.spark.connect.proto import org.apache.spark.internal.Logging import org.apache.spark.sql.connect.config.Connect.{CONNECT_EXECUTE_MANAGER_ABANDONED_TOMBSTONES_SIZE, CONNECT_EXECUTE_MANAGER_DETACHED_TIMEOUT, CONNECT_EXECUTE_MANAGER_MAINTENANCE_INTERVAL} +import org.apache.spark.util.ThreadUtils // Unique key identifying execution by combination of user, session and operation id case class ExecuteKey(userId: String, sessionId: String, operationId: String) @@ -167,8 +169,7 @@ private[connect] class 
SparkConnectExecutionManager() extends Logging { private[connect] def shutdown(): Unit = executionsLock.synchronized { scheduledExecutor.foreach { executor => - executor.shutdown() - executor.awaitTermination(1, TimeUnit.MINUTES) + ThreadUtils.shutdown(executor, FiniteDuration(1, TimeUnit.MINUTES)) } scheduledExecutor = None // note: this does not cleanly shut down the executions, but the server is shutting down. diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectSessionManager.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectSessionManager.scala index ef14cd305d40..4da728b95a33 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectSessionManager.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectSessionManager.scala @@ -22,6 +22,7 @@ import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit} import javax.annotation.concurrent.GuardedBy import scala.collection.mutable +import scala.concurrent.duration.FiniteDuration import scala.jdk.CollectionConverters._ import scala.util.control.NonFatal @@ -31,6 +32,7 @@ import org.apache.spark.{SparkEnv, SparkSQLException} import org.apache.spark.internal.Logging import org.apache.spark.sql.SparkSession import org.apache.spark.sql.connect.config.Connect.{CONNECT_SESSION_MANAGER_CLOSED_SESSIONS_TOMBSTONES_SIZE, CONNECT_SESSION_MANAGER_DEFAULT_SESSION_TIMEOUT, CONNECT_SESSION_MANAGER_MAINTENANCE_INTERVAL} +import org.apache.spark.util.ThreadUtils /** * Global tracke
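The pattern being deduplicated above — shut a pool down and wait a bounded time, in one shared helper rather than a repeated `shutdown()`/`awaitTermination()` pair at every call site — can be illustrated with a Python analogue. This is only a sketch of the idea (Spark's `ThreadUtils.shutdown` is Scala and takes a `FiniteDuration`); `cancel_futures` requires Python 3.9+.

```python
from concurrent.futures import ThreadPoolExecutor

def shutdown(executor: ThreadPoolExecutor, cancel_pending: bool = False) -> None:
    """One shared shutdown helper, mirroring the ThreadUtils.shutdown pattern
    the commit switches to: callers no longer repeat the two-step
    shutdown-then-await dance themselves (illustrative Python analogue)."""
    # wait=True blocks until running (and, unless cancelled, queued) tasks finish.
    executor.shutdown(wait=True, cancel_futures=cancel_pending)

pool = ThreadPoolExecutor(max_workers=2)
futures = [pool.submit(lambda x: x * x, i) for i in range(4)]
shutdown(pool)  # all submitted work completes before the pool is released
print(sorted(f.result() for f in futures))  # [0, 1, 4, 9]
```

Centralizing this in one helper also gives a single place to tune the termination timeout later, which is the maintainability argument the commit message makes.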
(spark) branch master updated: [SPARK-46400][CORE][SQL] When there are corrupted files in the local maven repo, skip this cache and try again
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new f2a471e9cc75  [SPARK-46400][CORE][SQL] When there are corrupted files in the local maven repo, skip this cache and try again
f2a471e9cc75 is described below

commit f2a471e9cc752f3826232eedc9025fd156a85965
Author: panbingkun
AuthorDate: Wed Jan 31 09:46:07 2024 -0600

    [SPARK-46400][CORE][SQL] When there are corrupted files in the local maven repo, skip this cache and try again

    ### What changes were proposed in this pull request?
    The PR aims to:
    - fix a potential bug (see https://github.com/apache/spark/pull/44208) and improve the user experience.
    - make the code more compliant with standards.

    ### Why are the changes needed?
    We use the local maven repo as the first-level cache in Ivy. The original intention was to reduce the time required to parse and obtain the jar, but when there are corrupted files in the local maven repo, the mechanism above is interrupted outright and the error message is very unfriendly, which greatly confuses the user. In keeping with the original intention, we should simply skip the cache in such situations.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Manually test.

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #44343 from panbingkun/SPARK-46400.
Authored-by: panbingkun Signed-off-by: Sean Owen --- .../scala/org/apache/spark/util/MavenUtils.scala | 147 +++-- .../sql/hive/client/IsolatedClientLoader.scala | 4 + 2 files changed, 112 insertions(+), 39 deletions(-) diff --git a/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala b/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala index 2d7fba6f07d5..65530b7fa473 100644 --- a/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala +++ b/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala @@ -27,7 +27,7 @@ import org.apache.ivy.Ivy import org.apache.ivy.core.LogOptions import org.apache.ivy.core.module.descriptor.{Artifact, DefaultDependencyDescriptor, DefaultExcludeRule, DefaultModuleDescriptor, ExcludeRule} import org.apache.ivy.core.module.id.{ArtifactId, ModuleId, ModuleRevisionId} -import org.apache.ivy.core.report.ResolveReport +import org.apache.ivy.core.report.{DownloadStatus, ResolveReport} import org.apache.ivy.core.resolve.ResolveOptions import org.apache.ivy.core.retrieve.RetrieveOptions import org.apache.ivy.core.settings.IvySettings @@ -43,8 +43,8 @@ import org.apache.spark.util.ArrayImplicits._ private[spark] object MavenUtils extends Logging { val JAR_IVY_SETTING_PATH_KEY: String = "spark.jars.ivySettings" -// // Exposed for testing -// var printStream = SparkSubmit.printStream + // Exposed for testing + // var printStream = SparkSubmit.printStream // Exposed for testing. // These components are used to make the default exclusion rules for Spark dependencies. @@ -113,7 +113,7 @@ private[spark] object MavenUtils extends Logging { splits(2) != null && splits(2).trim.nonEmpty, s"The version cannot be null or " + s"be whitespace. 
The version provided is: ${splits(2)}") - new MavenCoordinate(splits(0), splits(1), splits(2)) + MavenCoordinate(splits(0), splits(1), splits(2)) }.toImmutableArraySeq } @@ -128,24 +128,30 @@ private[spark] object MavenUtils extends Logging { } /** - * Extracts maven coordinates from a comma-delimited string + * Create a ChainResolver used by Ivy to search for and resolve dependencies. * * @param defaultIvyUserDir * The default user path for Ivy + * @param useLocalM2AsCache + * Whether to use the local maven repo as a cache * @return * A ChainResolver used by Ivy to search for and resolve dependencies. */ - private[util] def createRepoResolvers(defaultIvyUserDir: File): ChainResolver = { + private[util] def createRepoResolvers( + defaultIvyUserDir: File, + useLocalM2AsCache: Boolean = true): ChainResolver = { // We need a chain resolver if we want to check multiple repositories val cr = new ChainResolver cr.setName("spark-list") -val localM2 = new IBiblioResolver -localM2.setM2compatible(true) -localM2.setRoot(m2Path.toURI.toString) -localM2.setUsepoms(true) -localM2.setName("local-m2-cache") -cr.add(localM2) +if (useLocalM2AsCache) { + val localM2 = new IBiblioResolver + localM2.setM2compatible(true) + localM2.setRoot(m2Path.toURI.toString) + localM2.setUsepoms(true) + localM2.setName("local-m2-cache") + cr.add(l
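The retry strategy the commit describes — consult the local maven repo as a first-level cache, but fall back to a cache-free resolution when the cached file turns out to be corrupted — can be sketched as follows. Everything here is illustrative (dict-backed "repos", a toy corruption check based on the zip magic bytes that jar files start with); Spark's real code drives Ivy chain resolvers and a `useLocalM2AsCache` flag instead.

```python
def resolve(coordinate, fetch_from_cache, fetch_from_remote, is_corrupted):
    """Two-level resolution with the local cache as the first level.

    If the cached copy is missing or corrupted, skip the cache entirely and
    resolve again from the remote repository instead of failing outright --
    the same fallback the commit adds around the local m2 cache (sketch only).
    """
    cached = fetch_from_cache(coordinate)
    if cached is not None and not is_corrupted(cached):
        return cached
    # Corrupted or absent cache entry: bypass the cache and try again.
    return fetch_from_remote(coordinate)

cache = {"org.foo:bar:1.0": b"\x00garbage"}        # simulated corrupted cached jar
remote = {"org.foo:bar:1.0": b"PK\x03\x04payload"}  # simulated intact remote jar
artifact = resolve(
    "org.foo:bar:1.0",
    fetch_from_cache=cache.get,
    fetch_from_remote=remote.get,
    is_corrupted=lambda data: not data.startswith(b"PK"),  # jars are zip archives
)
print(artifact == b"PK\x03\x04payload")  # True
```

The point of the pattern is that a poisoned cache degrades to a slower-but-correct path with a clear outcome, rather than aborting resolution with a confusing error.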
(spark) branch master updated: [SPARK-45522][BUILD][CORE][SQL][UI] Migrate from Jetty 9 to Jetty 10
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 6c19bf6b48e7  [SPARK-45522][BUILD][CORE][SQL][UI] Migrate from Jetty 9 to Jetty 10
6c19bf6b48e7 is described below

commit 6c19bf6b48e7e2ab9937dc2d91ea23dd83abae64
Author: HiuFung Kwok
AuthorDate: Wed Jan 31 09:42:16 2024 -0600

    [SPARK-45522][BUILD][CORE][SQL][UI] Migrate from Jetty 9 to Jetty 10

    ### What changes were proposed in this pull request?
    This is an upgrade ticket to bump the Jetty version from 9 to 10. This PR aims to bring incremental Jetty upgrades to Spark, as Jetty 9 support has already reached EOL.

    ### Why are the changes needed?
    Jetty 9 is already beyond EOL, which means we won't receive any further security fixes for it in Spark.

    ### Does this PR introduce _any_ user-facing change?
    No. The SNI host check now defaults to true on embedded Jetty, so it is set back to false to maintain backward compatibility. The redirect behaviour changed for a trailing /, but modern browsers pick up the 302 status code and redirect accordingly, so there is no impact at the user level.

    ### How was this patch tested?
    JUnit test case.

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #43765 from HiuKwok/ft-hf-SPARK-45522-jetty-upgradte.
Lead-authored-by: HiuFung Kwok Co-authored-by: HiuFung Kwok <37996731+hiuk...@users.noreply.github.com> Signed-off-by: Sean Owen --- LICENSE-binary | 1 - core/pom.xml | 8 +--- .../main/scala/org/apache/spark/SSLOptions.scala | 2 +- .../main/scala/org/apache/spark/TestUtils.scala| 13 + .../scala/org/apache/spark/ui/JettyUtils.scala | 13 ++--- .../test/scala/org/apache/spark/ui/UISuite.scala | 22 +- dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++-- dev/test-dependencies.sh | 2 +- pom.xml| 8 +--- .../service/cli/thrift/ThriftHttpCLIService.java | 12 ++-- 10 files changed, 52 insertions(+), 33 deletions(-) diff --git a/LICENSE-binary b/LICENSE-binary index c6f291f11088..2073d85246b6 100644 --- a/LICENSE-binary +++ b/LICENSE-binary @@ -368,7 +368,6 @@ xerces:xercesImpl org.codehaus.jackson:jackson-jaxrs org.codehaus.jackson:jackson-xc org.eclipse.jetty:jetty-client -org.eclipse.jetty:jetty-continuation org.eclipse.jetty:jetty-http org.eclipse.jetty:jetty-io org.eclipse.jetty:jetty-jndi diff --git a/core/pom.xml b/core/pom.xml index c093213bd6b9..f780551fb555 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -146,11 +146,6 @@ jetty-http compile - - org.eclipse.jetty - jetty-continuation - compile - org.eclipse.jetty jetty-servlet @@ -538,7 +533,7 @@ true true - guava,protobuf-java,jetty-io,jetty-servlet,jetty-servlets,jetty-continuation,jetty-http,jetty-plus,jetty-util,jetty-server,jetty-security,jetty-proxy,jetty-client + guava,protobuf-java,jetty-io,jetty-servlet,jetty-servlets,jetty-http,jetty-plus,jetty-util,jetty-server,jetty-security,jetty-proxy,jetty-client true @@ -558,7 +553,6 @@ org.eclipse.jetty:jetty-http org.eclipse.jetty:jetty-proxy org.eclipse.jetty:jetty-client - org.eclipse.jetty:jetty-continuation org.eclipse.jetty:jetty-servlet org.eclipse.jetty:jetty-servlets org.eclipse.jetty:jetty-plus diff --git a/core/src/main/scala/org/apache/spark/SSLOptions.scala b/core/src/main/scala/org/apache/spark/SSLOptions.scala index 26108d885e4c..ce058cec2686 100644 --- 
a/core/src/main/scala/org/apache/spark/SSLOptions.scala +++ b/core/src/main/scala/org/apache/spark/SSLOptions.scala @@ -87,7 +87,7 @@ private[spark] case class SSLOptions( /** * Creates a Jetty SSL context factory according to the SSL settings represented by this object. */ - def createJettySslContextFactory(): Option[SslContextFactory] = { + def createJettySslContextFactoryServer(): Option[SslContextFactory.Server] = { if (enabled) { val sslContextFactory = new SslContextFactory.Server() diff --git a/core/src/main/scala/org/apache/spark/TestUtils.scala b/core/src/main/scala/org/apache/spark/TestUtils.scala index e85f98ff55c5..5e3078d7292b 100644 --- a/core/src/main/scala/org/apache/spark/TestUtils.scala +++ b/core/src/main/scala/org/apache/spark/TestUtils.scala @@ -252,6 +252,19 @@ private[spark] object TestUt
(spark) branch master updated: [MINOR][SQL] Use `DecimalType.MINIMUM_ADJUSTED_SCALE` instead of magic number `6` in `Divide` class
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0c7770f4de56 [MINOR][SQL] Use `DecimalType.MINIMUM_ADJUSTED_SCALE` instead of magic number `6` in `Divide` class 0c7770f4de56 is described below commit 0c7770f4de560ad74e93b0902ab7a6be52c655be Author: longfei.jiang <1251489...@qq.com> AuthorDate: Wed Jan 31 09:40:07 2024 -0600 [MINOR][SQL] Use `DecimalType.MINIMUM_ADJUSTED_SCALE` instead of magic number `6` in `Divide` class ### What changes were proposed in this pull request? Replace the magic value `6` with the constant `DecimalType.MINIMUM_ADJUSTED_SCALE` ### Why are the changes needed? Magic values are less self-documenting than named constants. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? The existing UT `ArithmeticExpressionSuite#"SPARK-45786: Decimal multiply, divide, remainder, quot"` provides coverage ### Was this patch authored or co-authored using generative AI tooling? No Closes #44941 from jlfsdtc/magic_value. 
Authored-by: longfei.jiang <1251489...@qq.com> Signed-off-by: Sean Owen --- .../scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala index a0fb17cec812..9f1b42ad84d3 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala @@ -810,7 +810,7 @@ case class Divide( DecimalType.adjustPrecisionScale(prec, scale) } else { var intDig = min(DecimalType.MAX_SCALE, p1 - s1 + s2) - var decDig = min(DecimalType.MAX_SCALE, max(6, s1 + p2 + 1)) + var decDig = min(DecimalType.MAX_SCALE, max(DecimalType.MINIMUM_ADJUSTED_SCALE, s1 + p2 + 1)) val diff = (intDig + decDig) - DecimalType.MAX_SCALE if (diff > 0) { decDig -= diff / 2 + 1 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
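The renamed constant sits inside the rule that sizes the result type of a decimal division. As a rough illustration only (not Spark's API; the constant names mirror `org.apache.spark.sql.types.DecimalType`, and the post-overflow adjustment is assumed from the truncated diff), the branch shown above computes the result precision and scale like this:

```python
# Sketch of the decimal-division result-type rule from the diff above, with
# the magic number 6 named as MINIMUM_ADJUSTED_SCALE. Illustration only.
MAX_SCALE = 38
MINIMUM_ADJUSTED_SCALE = 6

def divide_result_type(p1, s1, p2, s2):
    """Return (precision, scale) for decimal(p1, s1) / decimal(p2, s2)."""
    int_dig = min(MAX_SCALE, p1 - s1 + s2)
    dec_dig = min(MAX_SCALE, max(MINIMUM_ADJUSTED_SCALE, s1 + p2 + 1))
    diff = (int_dig + dec_dig) - MAX_SCALE
    if diff > 0:
        # Result would overflow MAX_SCALE: give up some fractional digits.
        dec_dig -= diff // 2 + 1
        int_dig = MAX_SCALE - dec_dig
    return (int_dig + dec_dig, dec_dig)

# decimal(10, 2) / decimal(10, 2): 10 integer digits, 13 fractional digits
print(divide_result_type(10, 2, 10, 2))  # -> (23, 13)
```

The `max(MINIMUM_ADJUSTED_SCALE, ...)` term guarantees a division result always keeps at least 6 digits after the decimal point, which is the invariant the named constant documents.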
(spark) branch master updated: [SPARK-46100][CORE][PYTHON] Reduce stack depth by replace (string|array).size with (string|array).length
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7b58fffdeeb [SPARK-46100][CORE][PYTHON] Reduce stack depth by replace (string|array).size with (string|array).length 7b58fffdeeb is described below commit 7b58fffdeeb70524e18ad80ea0aa53e2ac910e2a Author: Jiaan Geng AuthorDate: Sat Nov 25 14:38:34 2023 -0600 [SPARK-46100][CORE][PYTHON] Reduce stack depth by replace (string|array).size with (string|array).length ### What changes were proposed in this pull request? There are many calls to `[string|array].size`. Since `size` simply delegates to the underlying `length`, each call adds an extra stack frame; we should call `[string|array].length` directly. We also get the compiler warning `Replace .size with .length on arrays and strings`. This PR improves only the core module. ### Why are the changes needed? Reduce stack depth by replacing (string|array).size with (string|array).length ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? Existing test cases. ### Was this patch authored or co-authored using generative AI tooling? 'No'. Closes #44011 from beliefer/SPARK-46100. 
Authored-by: Jiaan Geng Signed-off-by: Sean Owen --- .../org/apache/spark/api/python/PythonRunner.scala | 2 +- .../apache/spark/deploy/master/ui/MasterPage.scala | 4 +- .../apache/spark/executor/ExecutorMetrics.scala| 2 +- .../org/apache/spark/resource/ResourceUtils.scala | 2 +- .../apache/spark/scheduler/TaskDescription.scala | 2 +- .../apache/spark/scheduler/TaskSchedulerImpl.scala | 4 +- .../org/apache/spark/ui/ConsoleProgressBar.scala | 2 +- .../org/apache/spark/util/HadoopFSUtils.scala | 2 +- .../util/io/ChunkedByteBufferFileRegion.scala | 2 +- .../scala/org/apache/spark/CheckpointSuite.scala | 16 ++--- .../scala/org/apache/spark/DistributedSuite.scala | 16 ++--- .../test/scala/org/apache/spark/FileSuite.scala| 2 +- .../org/apache/spark/MapOutputTrackerSuite.scala | 4 +- .../scala/org/apache/spark/PartitioningSuite.scala | 4 +- .../test/scala/org/apache/spark/ShuffleSuite.scala | 2 +- .../spark/deploy/DecommissionWorkerSuite.scala | 2 +- .../org/apache/spark/deploy/SparkSubmitSuite.scala | 4 +- .../deploy/StandaloneDynamicAllocationSuite.scala | 22 +++--- .../spark/deploy/client/AppClientSuite.scala | 6 +- .../deploy/history/FsHistoryProviderSuite.scala| 20 +++--- .../deploy/rest/StandaloneRestSubmitSuite.scala| 2 +- .../input/WholeTextFileRecordReaderSuite.scala | 4 +- .../internal/plugin/PluginContainerSuite.scala | 4 +- .../apache/spark/rdd/AsyncRDDActionsSuite.scala| 2 +- .../apache/spark/rdd/LocalCheckpointSuite.scala| 2 +- .../apache/spark/rdd/PairRDDFunctionsSuite.scala | 44 ++-- .../scala/org/apache/spark/rdd/PipedRDDSuite.scala | 10 +-- .../test/scala/org/apache/spark/rdd/RDDSuite.scala | 80 +++--- .../scala/org/apache/spark/rdd/SortingSuite.scala | 6 +- .../apache/spark/rdd/ZippedPartitionsSuite.scala | 4 +- .../spark/resource/ResourceProfileSuite.scala | 2 +- .../apache/spark/resource/ResourceUtilsSuite.scala | 6 +- .../apache/spark/scheduler/AQEShuffledRDD.scala| 2 +- .../CoarseGrainedSchedulerBackendSuite.scala | 2 +- 
.../apache/spark/scheduler/DAGSchedulerSuite.scala | 32 - .../apache/spark/scheduler/MapStatusSuite.scala| 2 +- .../scheduler/OutputCommitCoordinatorSuite.scala | 8 +-- .../spark/scheduler/TaskSchedulerImplSuite.scala | 12 ++-- .../spark/scheduler/TaskSetManagerSuite.scala | 4 +- .../KryoSerializerDistributedSuite.scala | 2 +- .../sort/IndexShuffleBlockResolverSuite.scala | 2 +- .../org/apache/spark/storage/DiskStoreSuite.scala | 2 +- .../org/apache/spark/util/FileAppenderSuite.scala | 4 +- .../spark/util/collection/SizeTrackerSuite.scala | 2 +- 44 files changed, 180 insertions(+), 180 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala index d6363182606..e6d5a750ea3 100644 --- a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala +++ b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala @@ -378,7 +378,7 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( resources.foreach { case (k, v) => PythonRDD.writeUTF(k, dataOut) PythonRDD.writeUTF(v.name, dataOut) - dataOut.writeInt(v.addresses.size) + dataOut.writeInt(v.addresses.length) v.addresses.foreach { case addr => PythonRDD.writeUTF(addr, dataOut)
(spark) branch master updated: [SPARK-45687][CORE][SQL][ML][MLLIB][KUBERNETES][EXAMPLES][CONNECT][STRUCTURED STREAMING] Fix `Passing an explicit array value to a Scala varargs method is deprecated`
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 605aa0c299c [SPARK-45687][CORE][SQL][ML][MLLIB][KUBERNETES][EXAMPLES][CONNECT][STRUCTURED STREAMING] Fix `Passing an explicit array value to a Scala varargs method is deprecated` 605aa0c299c is described below commit 605aa0c299c1d88f8a31ba888ac8e6b6203be6c5 Author: Tengfei Huang AuthorDate: Fri Nov 10 08:10:20 2023 -0600 [SPARK-45687][CORE][SQL][ML][MLLIB][KUBERNETES][EXAMPLES][CONNECT][STRUCTURED STREAMING] Fix `Passing an explicit array value to a Scala varargs method is deprecated` ### What changes were proposed in this pull request? Fix the deprecated behavior below: `Passing an explicit array value to a Scala varargs method is deprecated (since 2.13.0) and will result in a defensive copy; Use the more efficient non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call` For all of these use cases, we don't need to make a copy of the array, so we explicitly use `ArraySeq.unsafeWrapArray` to do the conversion. ### Why are the changes needed? Eliminate compile warnings and stop using deprecated Scala APIs. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Passes GA. Fixed all the warnings with the build: `mvn clean package -DskipTests -Pspark-ganglia-lgpl -Pkinesis-asl -Pdocker-integration-tests -Pyarn -Pkubernetes -Pkubernetes-integration-tests -Phive-thriftserver -Phadoop-cloud` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43642 from ivoson/SPARK-45687. 
Authored-by: Tengfei Huang Signed-off-by: Sean Owen --- .../scala/org/apache/spark/sql/KeyValueGroupedDataset.scala | 9 ++--- .../test/scala/org/apache/spark/sql/ColumnTestSuite.scala| 3 ++- .../apache/spark/sql/UserDefinedFunctionE2ETestSuite.scala | 5 - .../spark/sql/connect/planner/SparkConnectPlanner.scala | 3 ++- .../main/scala/org/apache/spark/api/python/PythonRDD.scala | 3 ++- core/src/main/scala/org/apache/spark/executor/Executor.scala | 3 ++- core/src/main/scala/org/apache/spark/rdd/RDD.scala | 3 ++- .../scala/org/apache/spark/examples/graphx/Analytics.scala | 4 ++-- .../scala/org/apache/spark/ml/classification/OneVsRest.scala | 3 ++- .../scala/org/apache/spark/ml/feature/FeatureHasher.scala| 4 +++- .../src/main/scala/org/apache/spark/ml/feature/Imputer.scala | 8 +--- .../main/scala/org/apache/spark/ml/feature/Interaction.scala | 4 +++- .../main/scala/org/apache/spark/ml/feature/RFormula.scala| 6 -- .../scala/org/apache/spark/ml/feature/VectorAssembler.scala | 5 +++-- mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala | 3 ++- .../src/main/scala/org/apache/spark/ml/fpm/PrefixSpan.scala | 3 ++- .../src/main/scala/org/apache/spark/ml/r/KSTestWrapper.scala | 3 ++- .../apache/spark/ml/regression/DecisionTreeRegressor.scala | 3 ++- .../src/main/scala/org/apache/spark/ml/tree/treeModels.scala | 3 ++- .../src/main/scala/org/apache/spark/mllib/util/MLUtils.scala | 12 .../scala/org/apache/spark/ml/feature/ImputerSuite.scala | 12 .../apache/spark/ml/source/image/ImageFileFormatSuite.scala | 3 ++- .../apache/spark/ml/stat/KolmogorovSmirnovTestSuite.scala| 3 ++- mllib/src/test/scala/org/apache/spark/ml/util/MLTest.scala | 6 -- .../deploy/k8s/features/DriverCommandFeatureStepSuite.scala | 2 +- .../apache/spark/sql/catalyst/expressions/generators.scala | 8 ++-- .../sql/catalyst/expressions/UnsafeRowConverterSuite.scala | 4 +++- .../scala/org/apache/spark/sql/DataFrameStatFunctions.scala | 3 ++- .../scala/org/apache/spark/sql/KeyValueGroupedDataset.scala | 8 ++-- 
.../spark/sql/execution/datasources/jdbc/JDBCRDD.scala | 2 +- .../org/apache/spark/sql/execution/stat/StatFunctions.scala | 3 ++- .../apache/spark/sql/execution/streaming/OffsetSeqLog.scala | 3 ++- .../streaming/continuous/ContinuousRateStreamSource.scala| 3 ++- .../src/test/scala/org/apache/spark/sql/DataFrameSuite.scala | 3 ++- .../src/test/scala/org/apache/spark/sql/DatasetSuite.scala | 6 -- .../src/test/scala/org/apache/spark/sql/GenTPCDSData.scala | 3 ++- .../test/scala/org/apache/spark/sql/ParametersSuite.scala| 9 + .../spark/sql/connector/SimpleWritableDataSource.scala | 4 +++- .../sql/execution/datasources/FileMetadataStructSuite.scala | 3 ++- .../spark/sql/execution/datasources/csv/CSVBenchmark.scala | 7 --- .../scala/org/apache/spark/sql/streaming/StreamSuite.scala | 2 +- .../org/apache/spark/sql/streaming/StreamingQuerySuite.scala | 3 ++- .../org/apache/spark/sql/hive/thriftserver/CliSuite.scala
(spark) branch master updated: [SPARK-45368][SQL] Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 102daf9d149 [SPARK-45368][SQL] Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal 102daf9d149 is described below commit 102daf9d1490d12b812be4432c77ce102e82c3bb Author: tangjiafu AuthorDate: Tue Oct 31 08:42:46 2023 -0500 [SPARK-45368][SQL] Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal ### What changes were proposed in this pull request? Remove the Scala 2.12 compatibility logic for DoubleType, FloatType and Decimal ### Why are the changes needed? Drop Scala 2.12 and make Scala 2.13 the default https://issues.apache.org/jira/browse/SPARK-45368 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tested by CI ### Was this patch authored or co-authored using generative AI tooling? No Closes #43456 from laglangyue/f_SPARK-45368_scala12_dataType. 
Lead-authored-by: tangjiafu Co-authored-by: laglangyue Signed-off-by: Sean Owen --- sql/api/src/main/scala/org/apache/spark/sql/types/Decimal.scala| 4 +--- sql/api/src/main/scala/org/apache/spark/sql/types/DoubleType.scala | 5 + sql/api/src/main/scala/org/apache/spark/sql/types/FloatType.scala | 5 + 3 files changed, 3 insertions(+), 11 deletions(-) diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/Decimal.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/Decimal.scala index afe73635a68..3ce0508951f 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/types/Decimal.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/types/Decimal.scala @@ -681,9 +681,7 @@ object Decimal { override def toLong(x: Decimal): Long = x.toLong override def fromInt(x: Int): Decimal = new Decimal().set(x) override def compare(x: Decimal, y: Decimal): Int = x.compare(y) -// Added from Scala 2.13; don't override to work in 2.12 -// TODO revisit once Scala 2.12 support is dropped -def parseString(str: String): Option[Decimal] = Try(Decimal(str)).toOption +override def parseString(str: String): Option[Decimal] = Try(Decimal(str)).toOption } /** A [[scala.math.Fractional]] evidence parameter for Decimals. 
*/ diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/DoubleType.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/DoubleType.scala index d18c7b98af2..bc0ed725cf2 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/types/DoubleType.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/types/DoubleType.scala @@ -42,8 +42,6 @@ class DoubleType private() extends FractionalType { @Stable case object DoubleType extends DoubleType { - // Traits below copied from Scala 2.12; not present in 2.13 - // TODO: SPARK-30011 revisit once Scala 2.12 support is dropped trait DoubleIsConflicted extends Numeric[Double] { def plus(x: Double, y: Double): Double = x + y def minus(x: Double, y: Double): Double = x - y @@ -56,8 +54,7 @@ case object DoubleType extends DoubleType { def toDouble(x: Double): Double = x // logic in Numeric base trait mishandles abs(-0.0) override def abs(x: Double): Double = math.abs(x) -// Added from Scala 2.13; don't override to work in 2.12 -def parseString(str: String): Option[Double] = +override def parseString(str: String): Option[Double] = Try(java.lang.Double.parseDouble(str)).toOption } diff --git a/sql/api/src/main/scala/org/apache/spark/sql/types/FloatType.scala b/sql/api/src/main/scala/org/apache/spark/sql/types/FloatType.scala index 978384eebfe..8b54f830d48 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/types/FloatType.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/types/FloatType.scala @@ -43,8 +43,6 @@ class FloatType private() extends FractionalType { @Stable case object FloatType extends FloatType { - // Traits below copied from Scala 2.12; not present in 2.13 - // TODO: SPARK-30011 revisit once Scala 2.12 support is dropped trait FloatIsConflicted extends Numeric[Float] { def plus(x: Float, y: Float): Float = x + y def minus(x: Float, y: Float): Float = x - y @@ -57,8 +55,7 @@ case object FloatType extends FloatType { def toDouble(x: Float): Double = x.toDouble // logic in Numeric base trait 
mishandles abs(-0.0f) override def abs(x: Float): Float = math.abs(x) -// Added from Scala 2.13; don't override to work in 2.12 -def parseString(str: String): Option[Float] = +override def parseString(str: String): Option[Float] = Try(java.lang.Float.parseFloat(str)).toOption } - To unsubscribe, e-mail: c
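All three `parseString` overrides above follow the same `Try(parse(...)).toOption` pattern: attempt the conversion and return an empty option instead of throwing. A minimal Python analogue of that pattern (hypothetical helper name, not part of Spark or PySpark):

```python
# Illustration of the Try(...).toOption pattern used by the parseString
# overrides above: attempt a parse, return None instead of raising.
from typing import Optional

def parse_string_float(s: str) -> Optional[float]:
    try:
        return float(s)
    except (TypeError, ValueError):
        return None

print(parse_string_float("3.5"))   # -> 3.5
print(parse_string_float("oops"))  # -> None
```

With Scala 2.12 dropped, the methods can carry `override` because `Numeric.parseString` exists in the 2.13 standard library; the compatibility shims no longer need to avoid the keyword.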
(spark) branch master updated: [SPARK-45605][CORE][SQL][SS][CONNECT][MLLIB][GRAPHX][DSTREAM][PROTOBUF][EXAMPLES] Replace `s.c.MapOps.mapValues` with `s.c.MapOps.view.mapValues`
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 89ca8b6065e [SPARK-45605][CORE][SQL][SS][CONNECT][MLLIB][GRAPHX][DSTREAM][PROTOBUF][EXAMPLES] Replace `s.c.MapOps.mapValues` with `s.c.MapOps.view.mapValues` 89ca8b6065e is described below commit 89ca8b6065e9f690a492c778262080741d50d94d Author: yangjie01 AuthorDate: Sun Oct 29 09:19:30 2023 -0500 [SPARK-45605][CORE][SQL][SS][CONNECT][MLLIB][GRAPHX][DSTREAM][PROTOBUF][EXAMPLES] Replace `s.c.MapOps.mapValues` with `s.c.MapOps.view.mapValues` ### What changes were proposed in this pull request? This PR replaces `s.c.MapOps.mapValues` with `s.c.MapOps.view.mapValues` because `s.c.MapOps.mapValues` has been marked as deprecated since Scala 2.13.0: https://github.com/scala/scala/blob/bf45e199e96383b96a6955520d7d2524c78e6e12/src/library/scala/collection/Map.scala#L256-L262 ```scala @deprecated("Use .view.mapValues(f). A future version will include a strict version of this method (for now, .view.mapValues(f).toMap).", "2.13.0") def mapValues[W](f: V => W): MapView[K, W] = new MapView.MapValues(this, f) ``` ### Why are the changes needed? Clean up deprecated API usage. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Passes GitHub Actions - Packaged the client, manually tested `DFSReadWriteTest/MiniReadWriteTest/PowerIterationClusteringExample`. ### Was this patch authored or co-authored using generative AI tooling? No Closes #43448 from LuciferYang/SPARK-45605. 
Lead-authored-by: yangjie01 Co-authored-by: YangJie Signed-off-by: Sean Owen --- .../spark/util/sketch/CountMinSketchSuite.scala| 2 +- .../org/apache/spark/sql/avro/AvroUtils.scala | 1 + .../scala/org/apache/spark/sql/SparkSession.scala | 2 +- .../spark/sql/ClientDataFrameStatSuite.scala | 2 +- .../org/apache/spark/sql/connect/dsl/package.scala | 2 +- .../sql/connect/planner/SparkConnectPlanner.scala | 13 ++ .../sql/kafka010/KafkaMicroBatchSourceSuite.scala | 3 ++- .../apache/spark/sql/kafka010/KafkaTestUtils.scala | 2 +- .../streaming/kafka010/ConsumerStrategy.scala | 6 ++--- .../kafka010/DirectKafkaInputDStream.scala | 2 +- .../kafka010/DirectKafkaStreamSuite.scala | 2 +- .../spark/streaming/kafka010/KafkaTestUtils.scala | 2 +- .../spark/streaming/kinesis/KinesisTestUtils.scala | 2 +- .../kinesis/KPLBasedKinesisTestUtils.scala | 2 +- .../kinesis/KinesisBackedBlockRDDSuite.scala | 4 +-- .../spark/sql/protobuf/utils/ProtobufUtils.scala | 1 + .../org/apache/spark/api/java/JavaPairRDD.scala| 4 +-- .../apache/spark/api/java/JavaSparkContext.scala | 2 +- .../spark/api/python/PythonWorkerFactory.scala | 2 +- .../apache/spark/scheduler/InputFormatInfo.scala | 2 +- .../apache/spark/scheduler/TaskSchedulerImpl.scala | 2 +- .../cluster/CoarseGrainedSchedulerBackend.scala| 2 +- ...plicationEnvironmentInfoWrapperSerializer.scala | 5 ++-- .../ExecutorSummaryWrapperSerializer.scala | 3 ++- .../status/protobuf/JobDataWrapperSerializer.scala | 2 +- .../protobuf/StageDataWrapperSerializer.scala | 6 ++--- .../org/apache/spark/SparkThrowableSuite.scala | 2 +- .../apache/spark/rdd/PairRDDFunctionsSuite.scala | 2 +- .../test/scala/org/apache/spark/rdd/RDDSuite.scala | 1 + .../scheduler/ExecutorResourceInfoSuite.scala | 1 + .../BlockManagerDecommissionIntegrationSuite.scala | 2 +- .../storage/ShuffleBlockFetcherIteratorSuite.scala | 2 +- .../util/collection/ExternalSorterSuite.scala | 2 +- .../apache/spark/examples/DFSReadWriteTest.scala | 1 + 
.../apache/spark/examples/MiniReadWriteTest.scala | 1 + .../mllib/PowerIterationClusteringExample.scala| 2 +- .../spark/graphx/lib/ShortestPathsSuite.scala | 2 +- .../spark/ml/evaluation/ClusteringMetrics.scala| 1 + .../apache/spark/ml/feature/VectorIndexer.scala| 2 +- .../org/apache/spark/ml/feature/Word2Vec.scala | 2 +- .../apache/spark/ml/tree/impl/RandomForest.scala | 4 +-- .../spark/mllib/clustering/BisectingKMeans.scala | 2 +- .../mllib/linalg/distributed/BlockMatrix.scala | 4 +-- .../apache/spark/mllib/stat/test/ChiSqTest.scala | 1 + .../apache/spark/ml/recommendation/ALSSuite.scala | 8 +++--- .../apache/spark/mllib/feature/Word2VecSuite.scala | 12 - .../org/apache/spark/sql/types/Metadata.scala | 2 +- .../spark/sql/catalyst/analysis/Analyzer.scala | 3 ++- .../catalyst/catalog/ExternalCatalogUtils.scala| 2 +- .../sql/catalyst/catalog/SessionCatalog.scala | 2 +- .../spark/sql/catalyst/expressions/package.scala | 2 +-
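The deprecation exists because Scala 2.13's `mapValues` silently returns a lazy `MapView` rather than a strict map, so the explicit `.view.mapValues(f)` (plus `.toMap` where strictness is needed) makes the laziness visible at every call site. The strict-vs-lazy distinction can be sketched in Python (illustration only, using a generator as the stand-in for a view):

```python
# Strict vs lazy value mapping, mirroring the distinction behind Scala 2.13's
# deprecated mapValues (a lazy MapView) vs .view.mapValues(f).toMap (strict).
calls = []

def f(v):
    calls.append(v)          # record each evaluation
    return v * 10

m = {"a": 1, "b": 2}

strict = {k: f(v) for k, v in m.items()}    # evaluates f eagerly, once per entry
assert calls == [1, 2]

calls.clear()
lazy = ((k, f(v)) for k, v in m.items())    # a "view": nothing evaluated yet
assert calls == []
assert dict(lazy) == {"a": 10, "b": 20}     # forcing the view runs f
assert calls == [1, 2]
```

A lazy view re-runs `f` every time it is traversed, which is why blindly keeping the old `mapValues` call sites could change performance or side-effect behavior.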
(spark) branch master updated: [SPARK-45636][BUILD] Upgrade jersey to 2.41
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4ae99f9320c [SPARK-45636][BUILD] Upgrade jersey to 2.41 4ae99f9320c is described below commit 4ae99f9320ca29193f7c0d6d54d61e5d3fd0b323 Author: YangJie AuthorDate: Sun Oct 29 09:18:07 2023 -0500 [SPARK-45636][BUILD] Upgrade jersey to 2.41 ### What changes were proposed in this pull request? This PR upgrades Jersey from 2.40 to 2.41. ### Why are the changes needed? The new version brings some improvements, like: - https://github.com/eclipse-ee4j/jersey/pull/5350 - https://github.com/eclipse-ee4j/jersey/pull/5365 - https://github.com/eclipse-ee4j/jersey/pull/5436 - https://github.com/eclipse-ee4j/jersey/pull/5296 and some bug fixes, like: - https://github.com/eclipse-ee4j/jersey/pull/5359 - https://github.com/eclipse-ee4j/jersey/pull/5405 - https://github.com/eclipse-ee4j/jersey/pull/5423 - https://github.com/eclipse-ee4j/jersey/pull/5435 - https://github.com/eclipse-ee4j/jersey/pull/5445 The full release notes are here: - https://github.com/eclipse-ee4j/jersey/releases/tag/2.41 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Passes GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #43490 from LuciferYang/SPARK-45636. 
Lead-authored-by: YangJie Co-authored-by: yangjie01 Signed-off-by: Sean Owen --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 12 ++-- pom.xml | 2 +- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index c6fa77c84ca..2bfd94b9d46 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -122,12 +122,12 @@ jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar jcl-over-slf4j/2.0.9//jcl-over-slf4j-2.0.9.jar jdo-api/3.0.1//jdo-api-3.0.1.jar jdom2/2.0.6//jdom2-2.0.6.jar -jersey-client/2.40//jersey-client-2.40.jar -jersey-common/2.40//jersey-common-2.40.jar -jersey-container-servlet-core/2.40//jersey-container-servlet-core-2.40.jar -jersey-container-servlet/2.40//jersey-container-servlet-2.40.jar -jersey-hk2/2.40//jersey-hk2-2.40.jar -jersey-server/2.40//jersey-server-2.40.jar +jersey-client/2.41//jersey-client-2.41.jar +jersey-common/2.41//jersey-common-2.41.jar +jersey-container-servlet-core/2.41//jersey-container-servlet-core-2.41.jar +jersey-container-servlet/2.41//jersey-container-servlet-2.41.jar +jersey-hk2/2.41//jersey-hk2-2.41.jar +jersey-server/2.41//jersey-server-2.41.jar jettison/1.5.4//jettison-1.5.4.jar jetty-util-ajax/9.4.53.v20231009//jetty-util-ajax-9.4.53.v20231009.jar jetty-util/9.4.53.v20231009//jetty-util-9.4.53.v20231009.jar diff --git a/pom.xml b/pom.xml index 6488918326f..71c3044dd42 100644 --- a/pom.xml +++ b/pom.xml @@ -206,7 +206,7 @@ Please don't upgrade the version to 3.0.0+, Because it transitions Jakarta REST API from javax to jakarta package. --> -2.40 +2.41 2.12.5 3.5.2 3.0.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark-website) branch asf-site updated: [SPARK-45706][PYTHON][DOCS] Fix the links for Binder builds for Spark 3.5.0
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 0da360e961 [SPARK-45706][PYTHON][DOCS] Fix the links for Binder builds for Spark 3.5.0 0da360e961 is described below commit 0da360e9615eda835230931f83a2c4d82165050d Author: Hyukjin Kwon AuthorDate: Fri Oct 27 08:20:42 2023 -0500 [SPARK-45706][PYTHON][DOCS] Fix the links for Binder builds for Spark 3.5.0 This PR cherry-picks https://github.com/apache/spark/pull/43553 into Spark 3.5.0 PySpark documentation to recover the live notebooks Author: Hyukjin Kwon Closes #484 from HyukjinKwon/fix-binder-build. --- site/docs/3.5.0/api/python/getting_started/index.html | 8 site/docs/3.5.0/api/python/index.html | 8 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/site/docs/3.5.0/api/python/getting_started/index.html b/site/docs/3.5.0/api/python/getting_started/index.html index b5e4e54b66..0bd7ae9a7c 100644 --- a/site/docs/3.5.0/api/python/getting_started/index.html +++ b/site/docs/3.5.0/api/python/getting_started/index.html @@ -215,9 +215,9 @@ There are more guides shared with other languages such as at https://spark.apache.org/docs/latest/index.html#where-to-go-from-here";>the Spark documentation. 
There are live notebooks where you can try PySpark out without any other step: -https://mybinder.org/v2/gh/apache/spark/ce5ddad9903?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_df.ipynb";>Live Notebook: DataFrame -https://mybinder.org/v2/gh/apache/spark/ce5ddad9903?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_connect.ipynb";>Live Notebook: Spark Connect -https://mybinder.org/v2/gh/apache/spark/ce5ddad9903?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_ps.ipynb";>Live Notebook: pandas API on Spark +https://mybinder.org/v2/gh/apache/spark/270861a3cd6?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_df.ipynb";>Live Notebook: DataFrame +https://mybinder.org/v2/gh/apache/spark/270861a3cd6?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_connect.ipynb";>Live Notebook: Spark Connect +https://mybinder.org/v2/gh/apache/spark/270861a3cd6?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_ps.ipynb";>Live Notebook: pandas API on Spark The list below is the contents of this quickstart page: @@ -313,4 +313,4 @@ Created using http://sphinx-doc.org/";>Sphinx 3.0.4. - \ No newline at end of file + diff --git a/site/docs/3.5.0/api/python/index.html b/site/docs/3.5.0/api/python/index.html index faf6e558a5..1c757dc92b 100644 --- a/site/docs/3.5.0/api/python/index.html +++ b/site/docs/3.5.0/api/python/index.html @@ -183,7 +183,7 @@ PySpark Overview¶ Date: Sep 09, 2023 Version: 3.5.0 Useful links: -https://mybinder.org/v2/gh/apache/spark/ce5ddad9903?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_df.ipynb";>Live Notebook | https://github.com/apache/spark";>GitHub | https://issues.apache.org/jira/projects/SPARK/issues";>Issues | https://github.com/apache/spark/tree/ce5ddad9903/examples/src/main/python";>Examples | [...] 
+https://mybinder.org/v2/gh/apache/spark/270861a3cd6?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_df.ipynb";>Live Notebook | https://github.com/apache/spark";>GitHub | https://issues.apache.org/jira/projects/SPARK/issues";>Issues | https://github.com/apache/spark/tree/270861a3cd6/examples/src/main/python";>Examples | [...] PySpark is the Python API for Apache Spark. It enables you to perform real-time, large-scale data processing in a distributed environment using Python. It also provides a PySpark shell for interactively analyzing your data. @@ -237,7 +237,7 @@ Whether you use Python or SQL, the same underlying execution engine is used so you will always leverage the full power of Spark. Quickstart: DataFrame -https://mybinder.org/v2/gh/apache/spark/ce5ddad9903?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_df.ipynb";>Live Notebook: DataFrame +https://mybinder.org/v2/gh/apache/spark/270861a3cd6?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_df.ipynb";>Live Notebook: DataFrame Spark SQL API Reference Pandas API on Spark @@ -253,7 +253,7 @@ if you are new to Spark or deciding which API to use, we recommend using PySpark (see Spark SQL and DataFrames). Quickstart: Pandas API on Spark -https://mybinder.org/v2/gh/apache/spark/ce5ddad9903?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart_ps.ipynb";>Live Notebook: pandas API on Spark +https://mybinder.org/v2/gh/apache/spark/270861a3cd6?filepath=python%2Fdocs%2
[spark] branch branch-3.4 updated: [SPARK-40154][PYTHON][DOCS] Correct storage level in Dataframe.cache docstring
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new ecdb69f3db3 [SPARK-40154][PYTHON][DOCS] Correct storage level in Dataframe.cache docstring ecdb69f3db3 is described below commit ecdb69f3db3370aa7cf6ae8a52130379e465ca73 Author: Paul Staab AuthorDate: Wed Oct 25 07:36:15 2023 -0500 [SPARK-40154][PYTHON][DOCS] Correct storage level in Dataframe.cache docstring ### What changes were proposed in this pull request? Corrects the docstring of `DataFrame.cache` to give the correct storage level after it changed with Spark 3.0. It seems that the docstring of `DataFrame.persist` was updated, but `cache` was forgotten. ### Why are the changes needed? The docstring claims that `cache` uses serialised storage, but it actually uses deserialised storage. I confirmed that this is still the case with Spark 3.5.0 using the example code from the Jira ticket. ### Does this PR introduce _any_ user-facing change? Yes, the docstring changes. ### How was this patch tested? The GitHub Actions workflow succeeded. ### Was this patch authored or co-authored using generative AI tooling? No Closes #43229 from paulstaab/SPARK-40154.
Authored-by: Paul Staab Signed-off-by: Sean Owen (cherry picked from commit 94607dd001b133a25dc9865f25b3f9e7f5a5daa3) Signed-off-by: Sean Owen --- python/pyspark/sql/dataframe.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py index 518bc9867d7..14426c51439 100644 --- a/python/pyspark/sql/dataframe.py +++ b/python/pyspark/sql/dataframe.py @@ -1404,7 +1404,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin): self.rdd.foreachPartition(f) # type: ignore[arg-type] def cache(self) -> "DataFrame": -"""Persists the :class:`DataFrame` with the default storage level (`MEMORY_AND_DISK`). +"""Persists the :class:`DataFrame` with the default storage level (`MEMORY_AND_DISK_DESER`). .. versionadded:: 1.3.0 @@ -1413,7 +1413,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin): Notes - -The default storage level has changed to `MEMORY_AND_DISK` to match Scala in 2.0. +The default storage level has changed to `MEMORY_AND_DISK_DESER` to match Scala in 3.0. Returns --- - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.5 updated: [SPARK-40154][PYTHON][DOCS] Correct storage level in Dataframe.cache docstring
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 9e4411e2450 [SPARK-40154][PYTHON][DOCS] Correct storage level in Dataframe.cache docstring 9e4411e2450 is described below commit 9e4411e2450d0503933626207b5e03308c30bc72 Author: Paul Staab AuthorDate: Wed Oct 25 07:36:15 2023 -0500 [SPARK-40154][PYTHON][DOCS] Correct storage level in Dataframe.cache docstring ### What changes were proposed in this pull request? Corrects the docstring of `DataFrame.cache` to give the correct storage level after it changed with Spark 3.0. It seems that the docstring of `DataFrame.persist` was updated, but `cache` was forgotten. ### Why are the changes needed? The docstring claims that `cache` uses serialised storage, but it actually uses deserialised storage. I confirmed that this is still the case with Spark 3.5.0 using the example code from the Jira ticket. ### Does this PR introduce _any_ user-facing change? Yes, the docstring changes. ### How was this patch tested? The GitHub Actions workflow succeeded. ### Was this patch authored or co-authored using generative AI tooling? No Closes #43229 from paulstaab/SPARK-40154.
Authored-by: Paul Staab Signed-off-by: Sean Owen (cherry picked from commit 94607dd001b133a25dc9865f25b3f9e7f5a5daa3) Signed-off-by: Sean Owen --- python/pyspark/sql/dataframe.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py index 30ed73d3c47..5707ae2a31f 100644 --- a/python/pyspark/sql/dataframe.py +++ b/python/pyspark/sql/dataframe.py @@ -1485,7 +1485,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin): self.rdd.foreachPartition(f) # type: ignore[arg-type] def cache(self) -> "DataFrame": -"""Persists the :class:`DataFrame` with the default storage level (`MEMORY_AND_DISK`). +"""Persists the :class:`DataFrame` with the default storage level (`MEMORY_AND_DISK_DESER`). .. versionadded:: 1.3.0 @@ -1494,7 +1494,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin): Notes - -The default storage level has changed to `MEMORY_AND_DISK` to match Scala in 2.0. +The default storage level has changed to `MEMORY_AND_DISK_DESER` to match Scala in 3.0. Returns --- - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a073bf38c7d -> 94607dd001b)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from a073bf38c7d [SPARK-45209][CORE][UI] Flame Graph Support For Executor Thread Dump Page add 94607dd001b [SPARK-40154][PYTHON][DOCS] Correct storage level in Dataframe.cache docstring No new revisions were added by this update. Summary of changes: python/pyspark/sql/dataframe.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (2709426f0f6 -> 48e207f4a21)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 2709426f0f6 [SPARK-45541][CORE] Add SSLFactory add 48e207f4a21 [SPARK-45610][BUILD][CORE][SQL][SS][CONNECT][GRAPHX][DSTREAM][ML][MLLIB][K8S][YARN][SHELL][PYTHON][R][AVRO][UI][EXAMPLES] Fix the compilation warning "Auto-application to `()` is deprecated" and turn it into a compilation error No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/avro/AvroSuite.scala | 38 ++--- .../execution/benchmark/AvroReadBenchmark.scala| 10 +- .../execution/benchmark/AvroWriteBenchmark.scala | 4 +- .../org/apache/spark/sql/ClientE2ETestSuite.scala | 12 +- .../spark/sql/DataFrameNaFunctionSuite.scala | 2 +- .../sql/UserDefinedFunctionE2ETestSuite.scala | 2 +- .../spark/sql/connect/client/ArtifactManager.scala | 2 +- .../sql/connect/client/GrpcRetryHandler.scala | 2 +- .../execution/ExecuteResponseObserver.scala| 4 +- .../execution/SparkConnectPlanExecution.scala | 2 +- .../sql/connect/planner/SparkConnectPlanner.scala | 12 +- .../sql/connect/service/ExecuteEventsManager.scala | 2 +- .../sql/connect/service/SparkConnectServer.scala | 2 +- .../connect/planner/SparkConnectServiceSuite.scala | 2 +- .../connect/service/AddArtifactsHandlerSuite.scala | 12 +- .../service/ArtifactStatusesHandlerSuite.scala | 2 +- .../service/FetchErrorDetailsHandlerSuite.scala| 2 +- .../connect/service/InterceptorRegistrySuite.scala | 12 +- .../spark/sql/jdbc/DB2IntegrationSuite.scala | 8 +- .../sql/jdbc/MsSqlServerIntegrationSuite.scala | 6 +- .../spark/sql/jdbc/MySQLIntegrationSuite.scala | 4 +- .../spark/sql/jdbc/OracleIntegrationSuite.scala| 8 +- .../spark/sql/jdbc/PostgresIntegrationSuite.scala | 4 +- .../sql/kafka010/KafkaOffsetReaderConsumer.scala | 2 +- .../sql/kafka010/consumer/KafkaDataConsumer.scala | 2 +- .../sql/kafka010/KafkaContinuousSourceSuite.scala | 16 +- 
.../sql/kafka010/KafkaMicroBatchSourceSuite.scala | 8 +- .../spark/sql/kafka010/KafkaRelationSuite.scala| 36 ++-- .../kafka010/consumer/KafkaDataConsumerSuite.scala | 2 +- .../org/apache/spark/kafka010/KafkaTokenUtil.scala | 4 +- .../kafka010/DirectKafkaInputDStream.scala | 12 +- .../streaming/kafka010/KafkaDataConsumer.scala | 2 +- .../apache/spark/streaming/kafka010/KafkaRDD.scala | 14 +- .../kafka010/DirectKafkaStreamSuite.scala | 14 +- .../kafka010/KafkaDataConsumerSuite.scala | 4 +- .../spark/streaming/kafka010/KafkaRDDSuite.scala | 36 ++-- .../kinesis/KPLBasedKinesisTestUtils.scala | 2 +- .../kinesis/KinesisInputDStreamBuilderSuite.scala | 2 +- .../org/apache/spark/BarrierCoordinator.scala | 2 +- .../org/apache/spark/BarrierTaskContext.scala | 19 ++- .../main/scala/org/apache/spark/Heartbeater.scala | 2 +- .../scala/org/apache/spark/SecurityManager.scala | 4 +- .../main/scala/org/apache/spark/SparkContext.scala | 12 +- .../main/scala/org/apache/spark/TestUtils.scala| 2 +- .../apache/spark/api/java/JavaSparkContext.scala | 2 +- .../org/apache/spark/api/python/PythonRunner.scala | 22 +-- .../scala/org/apache/spark/api/r/BaseRRunner.scala | 2 +- .../org/apache/spark/deploy/JsonProtocol.scala | 2 +- .../apache/spark/deploy/SparkSubmitArguments.scala | 2 +- .../spark/deploy/history/FsHistoryProvider.scala | 2 +- .../apache/spark/deploy/history/HistoryPage.scala | 2 +- .../apache/spark/deploy/worker/CommandUtils.scala | 4 +- .../org/apache/spark/deploy/worker/Worker.scala| 2 +- .../spark/executor/ProcfsMetricsGetter.scala | 4 +- .../apache/spark/input/PortableDataStream.scala| 2 +- .../spark/internal/io/SparkHadoopWriter.scala | 4 +- .../apache/spark/launcher/LauncherBackend.scala| 4 +- .../apache/spark/memory/UnifiedMemoryManager.scala | 2 +- .../org/apache/spark/metrics/MetricsSystem.scala | 4 +- .../org/apache/spark/rdd/AsyncRDDActions.scala | 2 +- .../scala/org/apache/spark/rdd/HadoopRDD.scala | 2 +- .../main/scala/org/apache/spark/rdd/PipedRDD.scala | 4 +- 
.../apache/spark/rdd/ReliableCheckpointRDD.scala | 6 +- .../org/apache/spark/scheduler/DAGScheduler.scala | 16 +- .../spark/scheduler/StatsReportListener.scala | 2 +- .../scala/org/apache/spark/scheduler/Task.scala| 4 +- .../apache/spark/scheduler/TaskSchedulerImpl.scala | 4 +- .../apache/spark/scheduler/TaskSetManager.scala| 4 +- .../cluster/CoarseGrainedSchedulerBackend.scala| 6 +- .../cluster/StandaloneSchedulerBackend.scala | 2 +- .../apache/spark/serializer/KryoSerializer.scala | 10 +- .../apache/spark/status/AppStatusListener.scala| 10 +- .../apache/spark/status/ElementTrackingS
[spark] branch master updated: [SPARK-45484][SQL][FOLLOWUP][DOCS] Update the document of parquet compression codec
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4023ec9bb44 [SPARK-45484][SQL][FOLLOWUP][DOCS] Update the document of parquet compression codec 4023ec9bb44 is described below commit 4023ec9bb4471efee36afcec041c114a4b86a2c8 Author: Jiaan Geng AuthorDate: Sat Oct 21 16:39:13 2023 -0500 [SPARK-45484][SQL][FOLLOWUP][DOCS] Update the document of parquet compression codec ### What changes were proposed in this pull request? This PR follows up https://github.com/apache/spark/pull/43310 to update the document of parquet compression codec. ### Why are the changes needed? Update the document of parquet compression codec. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? 'No'. Closes #43464 from beliefer/SPARK-45484_followup. Authored-by: Jiaan Geng Signed-off-by: Sean Owen --- docs/sql-data-sources-parquet.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/sql-data-sources-parquet.md b/docs/sql-data-sources-parquet.md index 925e47504e5..c2af58248ea 100644 --- a/docs/sql-data-sources-parquet.md +++ b/docs/sql-data-sources-parquet.md @@ -423,7 +423,7 @@ Data source options of Parquet can be set via: compression snappy -Compression codec to use when saving to file. This can be one of the known case-insensitive shorten names (none, uncompressed, snappy, gzip, lzo, brotli, lz4, and zstd). This will override spark.sql.parquet.compression.codec. +Compression codec to use when saving to file. This can be one of the known case-insensitive shorten names (none, uncompressed, snappy, gzip, lzo, brotli, lz4, lz4_raw, and zstd). This will override spark.sql.parquet.compression.codec. 
write @@ -484,7 +484,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession Sets the compression codec used when writing Parquet files. If either compression or parquet.compression is specified in the table-specific options/properties, the precedence would be compression, parquet.compression, spark.sql.parquet.compression.codec. Acceptable values include: -none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd. +none, uncompressed, snappy, gzip, lzo, brotli, lz4, lz4_raw, zstd. Note that brotli requires BrotliCodec to be installed. 1.1.1 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
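As the documentation change above spells out, the Parquet codec can be set at three levels, with the per-write `compression` option taking precedence over `parquet.compression`, which in turn takes precedence over the session-wide default. A hedged configuration sketch (the codec choice is illustrative, not a recommendation):

```
# spark-defaults.conf — session-wide default (lowest precedence)
spark.sql.parquet.compression.codec  zstd
```

Per write, the same effect comes from passing `compression=zstd` as a data source option, which overrides the setting above.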
[spark] branch master updated: [MINOR] Fix typos
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 920fb673b26 [MINOR] Fix typos 920fb673b26 is described below commit 920fb673b264c0bdcad0426020dedf57d8b11cc7 Author: shuoer86 <129674997+shuoe...@users.noreply.github.com> AuthorDate: Sat Oct 21 16:37:27 2023 -0500 [MINOR] Fix typos Closes #43434 from shuoer86/master. Authored-by: shuoer86 <129674997+shuoe...@users.noreply.github.com> Signed-off-by: Sean Owen --- binder/postBuild| 4 ++-- .../scala/org/apache/spark/sql/connect/service/SessionHolder.scala | 2 +- .../spark/sql/connect/plugin/SparkConnectPluginRegistrySuite.scala | 2 +- core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala | 2 +- core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala | 6 +++--- .../main/scala/org/apache/spark/ui/jobs/TaskThreadDumpPage.scala| 2 +- .../scala/org/apache/spark/status/AutoCleanupLiveUIDirSuite.scala | 2 +- docs/sql-ref-syntax-ddl-declare-variable.md | 2 +- 8 files changed, 11 insertions(+), 11 deletions(-) diff --git a/binder/postBuild b/binder/postBuild index 70ae23b3937..b6bdf72324c 100644 --- a/binder/postBuild +++ b/binder/postBuild @@ -38,7 +38,7 @@ else pip install plotly "pandas<2.0.0" "pyspark[sql,ml,mllib,pandas_on_spark]$SPECIFIER$VERSION" fi -# Set 'PYARROW_IGNORE_TIMEZONE' to surpress warnings from PyArrow. +# Set 'PYARROW_IGNORE_TIMEZONE' to suppress warnings from PyArrow. echo "export PYARROW_IGNORE_TIMEZONE=1" >> ~/.profile # Add sbin to PATH to run `start-connect-server.sh`. @@ -50,7 +50,7 @@ echo "export SPARK_HOME=${SPARK_HOME}" >> ~/.profile SPARK_VERSION=$(python -c "import pyspark; print(pyspark.__version__)") echo "export SPARK_VERSION=${SPARK_VERSION}" >> ~/.profile -# Surpress warnings from Spark jobs, and UI progress bar. +# Suppress warnings from Spark jobs, and UI progress bar. 
mkdir -p ~/.ipython/profile_default/startup echo """from pyspark.sql import SparkSession SparkSession.builder.config('spark.ui.showConsoleProgress', 'false').getOrCreate().sparkContext.setLogLevel('FATAL') diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala index 27f471233f1..dcced21f371 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala @@ -77,7 +77,7 @@ case class SessionHolder(userId: String, sessionId: String, session: SparkSessio private[service] def addExecuteHolder(executeHolder: ExecuteHolder): Unit = { val oldExecute = executions.putIfAbsent(executeHolder.operationId, executeHolder) if (oldExecute != null) { - // the existance of this should alrady be checked by SparkConnectExecutionManager + // the existence of this should alrady be checked by SparkConnectExecutionManager throw new IllegalStateException( s"ExecuteHolder with opId=${executeHolder.operationId} already exists!") } diff --git a/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/plugin/SparkConnectPluginRegistrySuite.scala b/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/plugin/SparkConnectPluginRegistrySuite.scala index ea9ae3ed9d9..e1de6b04d21 100644 --- a/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/plugin/SparkConnectPluginRegistrySuite.scala +++ b/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/plugin/SparkConnectPluginRegistrySuite.scala @@ -226,7 +226,7 @@ class SparkConnectPluginRegistrySuite extends SharedSparkSession with SparkConne } } - test("Emtpy registries are really empty and work") { + test("Empty registries are really empty and work") { 
assert(SparkConnectPluginRegistry.loadRelationPlugins().isEmpty) assert(SparkConnectPluginRegistry.loadExpressionPlugins().isEmpty) assert(SparkConnectPluginRegistry.loadCommandPlugins().isEmpty) diff --git a/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala b/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala index f80190c96e8..73e72b7f1df 100644 --- a/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala +++ b/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala @@ -259,7 +259,7 @@ private[storage] class BlockInfoManager(trackingCacheVisibility:
[spark] branch master updated: [MINOR][DOCS] Fix one typo
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f1ae56b152b [MINOR][DOCS] Fix one typo f1ae56b152b is described below commit f1ae56b152bdf19246d698b65e553790ad54306b Author: Ruifeng Zheng AuthorDate: Tue Oct 17 13:49:41 2023 -0500 [MINOR][DOCS] Fix one typo ### What changes were proposed in this pull request? Fix one typo ### Why are the changes needed? for doc ### Does this PR introduce _any_ user-facing change? yes ### How was this patch tested? I didn't find other similar typos in this page, so only one fix ### Was this patch authored or co-authored using generative AI tooling? no Closes #43401 from zhengruifeng/minor_typo_connect_overview. Authored-by: Ruifeng Zheng Signed-off-by: Sean Owen --- docs/spark-connect-overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/spark-connect-overview.md b/docs/spark-connect-overview.md index 82d84f39ca1..c7bad0994a8 100644 --- a/docs/spark-connect-overview.md +++ b/docs/spark-connect-overview.md @@ -261,7 +261,7 @@ spark-connect-repl --host myhost.com --port 443 --token ABCDEFG The supported list of CLI arguments may be found [here](https://github.com/apache/spark/blob/master/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClientParser.scala#L48). - Configure programmatically with a connection ctring + Configure programmatically with a connection string The connection may also be programmatically created using _SparkSession#builder_ as in this example: {% highlight scala %} - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-45564][SQL] Simplify 'DataFrameStatFunctions.bloomFilter' with 'BloomFilterAggregate' expression
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 922844fff65 [SPARK-45564][SQL] Simplify 'DataFrameStatFunctions.bloomFilter' with 'BloomFilterAggregate' expression 922844fff65 is described below commit 922844fff65ac38fd93bd0c914dcc7e5cf879996 Author: Ruifeng Zheng AuthorDate: Tue Oct 17 10:11:36 2023 -0500 [SPARK-45564][SQL] Simplify 'DataFrameStatFunctions.bloomFilter' with 'BloomFilterAggregate' expression ### What changes were proposed in this pull request? Simplify the 'DataFrameStatFunctions.bloomFilter' function with the 'BloomFilterAggregate' expression. ### Why are the changes needed? The existing implementation was based on RDDs and can be simplified with DataFrame operations. ### Does this PR introduce _any_ user-facing change? When the input parameters or data types are invalid, an `AnalysisException` is thrown instead of an `IllegalArgumentException`. ### How was this patch tested? CI ### Was this patch authored or co-authored using generative AI tooling? No Closes #43391 from zhengruifeng/sql_reimpl_stat_bloomFilter.
Authored-by: Ruifeng Zheng Signed-off-by: Sean Owen --- .../apache/spark/sql/DataFrameStatFunctions.scala | 68 +- 1 file changed, 14 insertions(+), 54 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala index 9d4f83c53a3..de3b100cd6a 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala @@ -23,6 +23,8 @@ import scala.jdk.CollectionConverters._ import org.apache.spark.annotation.Stable import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.Literal +import org.apache.spark.sql.catalyst.expressions.aggregate.BloomFilterAggregate import org.apache.spark.sql.execution.stat._ import org.apache.spark.sql.functions.col import org.apache.spark.sql.types._ @@ -535,7 +537,7 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) { * @since 2.0.0 */ def bloomFilter(colName: String, expectedNumItems: Long, fpp: Double): BloomFilter = { -buildBloomFilter(Column(colName), expectedNumItems, -1L, fpp) +bloomFilter(Column(colName), expectedNumItems, fpp) } /** @@ -547,7 +549,8 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) { * @since 2.0.0 */ def bloomFilter(col: Column, expectedNumItems: Long, fpp: Double): BloomFilter = { -buildBloomFilter(col, expectedNumItems, -1L, fpp) +val numBits = BloomFilter.optimalNumOfBits(expectedNumItems, fpp) +bloomFilter(col, expectedNumItems, numBits) } /** @@ -559,7 +562,7 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) { * @since 2.0.0 */ def bloomFilter(colName: String, expectedNumItems: Long, numBits: Long): BloomFilter = { -buildBloomFilter(Column(colName), expectedNumItems, numBits, Double.NaN) +bloomFilter(Column(colName), expectedNumItems, numBits) } /** @@ -571,57 +574,14 @@ final class DataFrameStatFunctions 
private[sql](df: DataFrame) { * @since 2.0.0 */ def bloomFilter(col: Column, expectedNumItems: Long, numBits: Long): BloomFilter = { -buildBloomFilter(col, expectedNumItems, numBits, Double.NaN) - } - - private def buildBloomFilter(col: Column, expectedNumItems: Long, - numBits: Long, - fpp: Double): BloomFilter = { -val singleCol = df.select(col) -val colType = singleCol.schema.head.dataType - -require(colType == StringType || colType.isInstanceOf[IntegralType], - s"Bloom filter only supports string type and integral types, but got $colType.") - -val updater: (BloomFilter, InternalRow) => Unit = colType match { - // For string type, we can get bytes of our `UTF8String` directly, and call the `putBinary` - // instead of `putString` to avoid unnecessary conversion. - case StringType => (filter, row) => filter.putBinary(row.getUTF8String(0).getBytes) - case ByteType => (filter, row) => filter.putLong(row.getByte(0)) - case ShortType => (filter, row) => filter.putLong(row.getShort(0)) - case IntegerType => (filter, row) => filter.putLong(row.getInt(0)) - case LongType => (filter, row) => filter.putLong(row.getLong(0)) - case _ => -throw new IllegalArgumentException( - s"Bloom filter only supports string type and integral types, " + -s"and does not sup
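For background on the two sizing overloads in the diff above: given `expectedNumItems` (n) and a false-positive probability `fpp` (p), the bit count comes from the standard Bloom filter formula m = -n·ln(p)/ln(2)², which is what `BloomFilter.optimalNumOfBits` computes. A minimal pure-Python sketch of the formulas (rounding details may differ slightly from Spark's implementation):

```python
import math

def optimal_num_of_bits(expected_items: int, fpp: float) -> int:
    # m = -n * ln(p) / (ln 2)^2: the bit-array size that meets the
    # requested false-positive probability for n expected inserts.
    return int(-expected_items * math.log(fpp) / (math.log(2) ** 2))

def optimal_num_of_hashes(expected_items: int, num_bits: int) -> int:
    # k = (m / n) * ln 2, rounded, with at least one hash function.
    return max(1, round(num_bits / expected_items * math.log(2)))

bits = optimal_num_of_bits(1_000_000, 0.03)    # roughly 7.3 million bits
hashes = optimal_num_of_hashes(1_000_000, bits)
```

This also shows why the `fpp` overload in the patch can simply delegate to the `numBits` overload after computing `numBits` once.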
[spark] branch master updated: [SPARK-45512][CORE][SQL][SS][DSTREAM] Fix compilation warnings related to `other-nullary-override`
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3b46cc81614 [SPARK-45512][CORE][SQL][SS][DSTREAM] Fix compilation warnings related to `other-nullary-override` 3b46cc81614 is described below commit 3b46cc816143d5bb553e86e8b716c28982cb5748 Author: YangJie AuthorDate: Tue Oct 17 07:34:06 2023 -0500 [SPARK-45512][CORE][SQL][SS][DSTREAM] Fix compilation warnings related to `other-nullary-override` ### What changes were proposed in this pull request? This PR fixes two compilation warnings related to `other-nullary-override` ``` [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/CloseableIterator.scala:36:16: method with a single empty parameter list overrides method hasNext in trait Iterator defined without a parameter list [quickfixable] [error] Applicable -Wconf / nowarn filters for this fatal warning: msg=, cat=other-nullary-override, site=org.apache.spark.sql.connect.client.WrappedCloseableIterator [error] override def hasNext(): Boolean = innerIterator.hasNext [error]^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala:136:16: method without a parameter list overrides method hasNext in class WrappedCloseableIterator defined with a single empty parameter list [quickfixable] [error] Applicable -Wconf / nowarn filters for this fatal warning: msg=, cat=other-nullary-override, site=org.apache.spark.sql.connect.client.ExecutePlanResponseReattachableIterator [error] override def hasNext: Boolean = synchronized { [error]^ [error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala:73:20: method without a parameter list overrides method hasNext in class WrappedCloseableIterator defined with a single empty parameter list [quickfixable] [error] Applicable -Wconf / nowarn filters for this fatal warning: msg=, cat=other-nullary-override, site=org.apache.spark.sql.connect.client.GrpcExceptionConverter.convertIterator [error] override def hasNext: Boolean = { [error]^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcRetryHandler.scala:77:18: method without a parameter list overrides method next in class WrappedCloseableIterator defined with a single empty parameter list [quickfixable] [error] Applicable -Wconf / nowarn filters for this fatal warning: msg=, cat=other-nullary-override, site=org.apache.spark.sql.connect.client.GrpcRetryHandler.RetryIterator [error] override def next: U = { [error] ^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcRetryHandler.scala:81:18: method without a parameter list overrides method hasNext in class WrappedCloseableIterator defined with a single empty parameter list [quickfixable] [error] Applicable -Wconf / nowarn filters for this fatal warning: msg=, cat=other-nullary-override, site=org.apache.spark.sql.connect.client.GrpcRetryHandler.RetryIterator [error] override def hasNext: Boolean = { [error] ``` and removes the corresponding suppression rules from the compilation options ``` "-Wconf:cat=other-nullary-override:wv", ``` On the other hand, the code corresponding to the following three suppression rules no longer exists, so the corresponding suppression rules were also cleaned up in this pr. 
``` "-Wconf:cat=lint-multiarg-infix:wv", "-Wconf:msg=method with a single empty parameter list overrides method without any parameter list:s", "-Wconf:msg=method without a parameter list overrides a method with a single empty one:s", ``` ### Why are the changes needed? Code clean up. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #43332 from LuciferYang/other-nullary-override. Lead-authored-by: YangJie Co-authored-by: yangjie01 Signed-off-by: Sean Owen --- .../org/apache/spark/sql/avro/AvroRowReaderSuite.scala | 10 +- .../spark/sql/connect/client/CloseableIterator.scala | 2 +- .../ExecutePlanResponseReattachableIterator.scala | 4 ++-- .../spark/sql/conne
[spark] branch master updated: [SPARK-45467][CORE] Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass`
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new acd5dc499d1 [SPARK-45467][CORE] Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass` acd5dc499d1 is described below commit acd5dc499d139ce8b2571a69beab0f971947adb4 Author: YangJie AuthorDate: Wed Oct 11 08:49:09 2023 -0500 [SPARK-45467][CORE] Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass` ### What changes were proposed in this pull request? This PR replaces `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass` to clean up deprecated API usage; see https://github.com/openjdk/jdk/blob/dfacda488bfbe2e11e8d607a6d08527710286982/src/java.base/share/classes/java/lang/reflect/Proxy.java#L376-L391 ``` * @deprecated Proxy classes generated in a named module are encapsulated * and not accessible to code outside its module. * {@link Constructor#newInstance(Object...) Constructor.newInstance} * will throw {@code IllegalAccessException} when it is called on * an inaccessible proxy class. * Use {@link #newProxyInstance(ClassLoader, Class[], InvocationHandler)} * to create a proxy instance instead. * * @see Package and Module Membership of Proxy Class * @revised 9 */ @Deprecated @CallerSensitive public static Class getProxyClass(ClassLoader loader, Class... interfaces) throws IllegalArgumentException ``` Since the `invoke` method never needs to be called in this scenario but the `InvocationHandler` cannot be null, a new `DummyInvocationHandler` has been added: ``` private[spark] object DummyInvocationHandler extends InvocationHandler { override def invoke(proxy: Any, method: Method, args: Array[AnyRef]): AnyRef = { throw new UnsupportedOperationException("Not implemented") } } ``` ### Why are the changes needed?
Clean up deprecated API usage. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #43291 from LuciferYang/SPARK-45467. Lead-authored-by: YangJie Co-authored-by: yangjie01 Signed-off-by: Sean Owen --- .../main/scala/org/apache/spark/serializer/JavaSerializer.scala | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala b/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala index 95d2bdc39e1..856e639fcd9 100644 --- a/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala +++ b/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala @@ -18,6 +18,7 @@ package org.apache.spark.serializer import java.io._ +import java.lang.reflect.{InvocationHandler, Method, Proxy} import java.nio.ByteBuffer import scala.reflect.ClassTag @@ -79,7 +80,7 @@ private[spark] class JavaDeserializationStream(in: InputStream, loader: ClassLoa // scalastyle:off classforname val resolved = ifaces.map(iface => Class.forName(iface, false, loader)) // scalastyle:on classforname - java.lang.reflect.Proxy.getProxyClass(loader, resolved: _*) + Proxy.newProxyInstance(loader, resolved, DummyInvocationHandler).getClass } } @@ -88,6 +89,12 @@ private[spark] class JavaDeserializationStream(in: InputStream, loader: ClassLoa def close(): Unit = { objIn.close() } } +private[spark] object DummyInvocationHandler extends InvocationHandler { + override def invoke(proxy: Any, method: Method, args: Array[AnyRef]): AnyRef = { +throw new UnsupportedOperationException("Not implemented") + } +} + private object JavaDeserializationStream { val primitiveMappings = Map[String, Class[_]]( - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (11af786b35c -> 97218051308)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 11af786b35c [SPARK-45451][SQL] Make the default storage level of dataset cache configurable add 97218051308 [SPARK-45496][CORE][DSTREAM] Fix the compilation warning related to `other-pure-statement` No new revisions were added by this update. Summary of changes: .../org/apache/spark/scheduler/OutputCommitCoordinatorSuite.scala | 2 +- pom.xml | 3 --- project/SparkBuild.scala | 4 .../org/apache/spark/streaming/util/FileBasedWriteAheadLog.scala | 2 +- 4 files changed, 2 insertions(+), 9 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-45377][CORE] Handle InputStream in NettyLogger
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new cdbb301143d [SPARK-45377][CORE] Handle InputStream in NettyLogger cdbb301143d is described below commit cdbb301143de2e9a0ea525d20867948f49863842 Author: Hasnain Lakhani AuthorDate: Mon Oct 2 08:27:50 2023 -0500 [SPARK-45377][CORE] Handle InputStream in NettyLogger ### What changes were proposed in this pull request? Handle `InputStream`s in the `NettyLogger` so we can print out how many available bytes there are. ### Why are the changes needed? As part of the SSL support we are going to transfer `InputStream`s via Netty, and this functionality makes it easy to see the size of the streams in the log at a glance. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI. Tested as part of the changes in https://github.com/apache/spark/pull/42685 which this is split out of, I observed the logs there. ### Was this patch authored or co-authored using generative AI tooling? No Closes #43165 from hasnain-db/spark-tls-netty-logger. 
Authored-by: Hasnain Lakhani Signed-off-by: Sean Owen --- .../main/java/org/apache/spark/network/util/NettyLogger.java | 11 +++ 1 file changed, 11 insertions(+) diff --git a/common/network-common/src/main/java/org/apache/spark/network/util/NettyLogger.java b/common/network-common/src/main/java/org/apache/spark/network/util/NettyLogger.java index 9398726a926..f4c0df6239d 100644 --- a/common/network-common/src/main/java/org/apache/spark/network/util/NettyLogger.java +++ b/common/network-common/src/main/java/org/apache/spark/network/util/NettyLogger.java @@ -17,6 +17,9 @@ package org.apache.spark.network.util; +import java.io.IOException; +import java.io.InputStream; + import io.netty.buffer.ByteBuf; import io.netty.buffer.ByteBufHolder; import io.netty.channel.ChannelHandlerContext; @@ -42,6 +45,14 @@ public class NettyLogger { } else if (arg instanceof ByteBufHolder) { return format(ctx, eventName) + " " + ((ByteBufHolder) arg).content().readableBytes() + "B"; + } else if (arg instanceof InputStream) { +int available = -1; +try { + available = ((InputStream) arg).available(); +} catch (IOException ex) { + // Swallow, but return -1 to indicate an error happened +} +return format(ctx, eventName, arg) + " " + available + "B"; } else { return super.format(ctx, eventName, arg); } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
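The logging behavior added above can be illustrated outside of Netty. This sketch mirrors the patch's approach (hypothetical `describe` helper, not Spark's API): query `InputStream.available()`, swallow any `IOException`, and report `-1` so a failed size probe never breaks the log line.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamSizeDemo {

    // Report the stream's available byte count, or -1 if it cannot be queried,
    // rather than letting an IOException escape from a logging path.
    static String describe(InputStream in) {
        int available = -1;
        try {
            available = in.available();
        } catch (IOException ex) {
            // Swallow; -1 signals that the size could not be determined.
        }
        return available + "B";
    }

    public static void main(String[] args) {
        System.out.println(describe(new ByteArrayInputStream(new byte[42]))); // 42B
    }
}
```

Note that `available()` is only an estimate of bytes readable without blocking, which is adequate for an at-a-glance log message but not for exact stream sizing.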
[spark] branch master updated: [SPARK-45389][SQL][HIVE] Correct MetaException matching rule on getting partition metadata
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8b3ad2fc329 [SPARK-45389][SQL][HIVE] Correct MetaException matching rule on getting partition metadata 8b3ad2fc329 is described below commit 8b3ad2fc329e1813366430df7189d27b17133283 Author: Cheng Pan AuthorDate: Mon Oct 2 08:25:51 2023 -0500 [SPARK-45389][SQL][HIVE] Correct MetaException matching rule on getting partition metadata ### What changes were proposed in this pull request? This PR aims to fix the HMS call fallback logic introduced in SPARK-35437. ```patch try { ... hive.getPartitionNames ... hive.getPartitionsByNames } catch { - case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] => + case ex: HiveException if ex.getCause.isInstanceOf[MetaException] => ... } ``` ### Why are the changes needed? Directly method call won't throw `InvocationTargetException`, and check the code of `hive.getPartitionNames` and `hive.getPartitionsByNames`, both of them will wrap a `HiveException` if `MetaException` throws. ### Does this PR introduce _any_ user-facing change? Yes, it should be a bug fix. ### How was this patch tested? Pass GA and code review. (I'm not sure how to construct/simulate a MetaException during the HMS thrift call with the current HMS testing infrastructure) ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43191 from pan3793/SPARK-45389. 
Authored-by: Cheng Pan Signed-off-by: Sean Owen --- sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala index 64aa7d2d6fa..9943c0178fc 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala @@ -438,7 +438,7 @@ private[client] class Shim_v2_0 extends Shim with Logging { recordHiveCall() hive.getPartitionsByNames(table, partNames.asJava) } catch { -case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] => +case ex: HiveException if ex.getCause.isInstanceOf[MetaException] => logWarning("Caught Hive MetaException attempting to get partition metadata by " + "filter from client side. Falling back to fetching all partition metadata", ex) recordHiveCall() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
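The bug class fixed here — matching the wrong wrapper exception around a cause — generalizes beyond Hive. The sketch below uses hypothetical stand-in exception types (the real code catches Hive's `HiveException` and inspects its cause for a thrift `MetaException`); the point is that a direct method call throws the wrapper directly, never `InvocationTargetException`, so the `catch` must name the wrapper that is actually thrown.

```java
public class CauseMatchDemo {

    // Hypothetical stand-ins for Hive's exception types.
    static class MetaException extends Exception {}
    static class HiveException extends RuntimeException {
        HiveException(Throwable cause) { super(cause); }
    }

    // Simulated HMS call: Hive wraps the thrift-level MetaException in a
    // HiveException before it reaches the caller.
    static void hmsCall(boolean failWithMeta) {
        if (failWithMeta) {
            throw new HiveException(new MetaException());
        }
    }

    // Returns true when the fallback path (fetch all partition metadata)
    // would be taken; rethrows anything that is not a wrapped MetaException.
    static boolean callWithFallback(boolean failWithMeta) {
        try {
            hmsCall(failWithMeta);
            return false;
        } catch (HiveException ex) {
            if (ex.getCause() instanceof MetaException) {
                return true; // fall back instead of failing the query
            }
            throw ex;
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithFallback(true));  // true: fallback triggered
        System.out.println(callWithFallback(false)); // false: normal path
    }
}
```

Before the fix, the equivalent `catch` named a wrapper type that could never occur on a direct call, so the fallback branch was dead code and the original exception propagated.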
[spark] branch branch-3.5 updated: [MINOR][DOCS] Fix Python code sample for StreamingQueryListener: Reporting Metrics programmatically using Asynchronous APIs
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 845e4f6c5bc [MINOR][DOCS] Fix Python code sample for StreamingQueryListener: Reporting Metrics programmatically using Asynchronous APIs 845e4f6c5bc is described below commit 845e4f6c5bcf3a368ee78757f3a74b390cdce5c0 Author: Peter Kaszt AuthorDate: Mon Oct 2 07:48:56 2023 -0500 [MINOR][DOCS] Fix Python code sample for StreamingQueryListener: Reporting Metrics programmatically using Asynchronous APIs Fix Python language code sample in the docs for _StreamingQueryListener_: Reporting Metrics programmatically using Asynchronous APIs section. ### What changes were proposed in this pull request? The code sample in the [Reporting Metrics programmatically using Asynchronous APIs](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#reporting-metrics-programmatically-using-asynchronous-apis) section was this: ``` spark = ... class Listener(StreamingQueryListener): def onQueryStarted(self, event): print("Query started: " + queryStarted.id) def onQueryProgress(self, event): println("Query terminated: " + queryTerminated.id) def onQueryTerminated(self, event): println("Query made progress: " + queryProgress.progress) spark.streams.addListener(Listener()) ``` Which is not a proper Python code, and has QueryProgress and QueryTerminated prints mixed. Proposed change/fix: ``` spark = ... class Listener(StreamingQueryListener): def onQueryStarted(self, event): print("Query started: " + queryStarted.id) def onQueryProgress(self, event): print("Query made progress: " + queryProgress.progress) def onQueryTerminated(self, event): print("Query terminated: " + queryTerminated.id) spark.streams.addListener(Listener()) ``` ### Why are the changes needed? To fix docimentation errors. 
### Does this PR introduce _any_ user-facing change? Yes. -> Sample python code snippet is fixed in docs (see above). ### How was this patch tested? Checked with github's .md preview, and built the docs according to the readme. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43190 from kasztp/master. Authored-by: Peter Kaszt Signed-off-by: Sean Owen (cherry picked from commit d708fd7b68bf0c9964e861cb2c81818d17d7136e) Signed-off-by: Sean Owen --- docs/structured-streaming-programming-guide.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index 76a22621a0e..3e87c45a349 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -3831,10 +3831,10 @@ class Listener(StreamingQueryListener): print("Query started: " + queryStarted.id) def onQueryProgress(self, event): -println("Query terminated: " + queryTerminated.id) +print("Query made progress: " + queryProgress.progress) def onQueryTerminated(self, event): -println("Query made progress: " + queryProgress.progress) + print("Query terminated: " + queryTerminated.id) spark.streams.addListener(Listener()) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [MINOR][DOCS] Fix Python code sample for StreamingQueryListener: Reporting Metrics programmatically using Asynchronous APIs
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d708fd7b68b [MINOR][DOCS] Fix Python code sample for StreamingQueryListener: Reporting Metrics programmatically using Asynchronous APIs d708fd7b68b is described below commit d708fd7b68bf0c9964e861cb2c81818d17d7136e Author: Peter Kaszt AuthorDate: Mon Oct 2 07:48:56 2023 -0500 [MINOR][DOCS] Fix Python code sample for StreamingQueryListener: Reporting Metrics programmatically using Asynchronous APIs Fix Python language code sample in the docs for _StreamingQueryListener_: Reporting Metrics programmatically using Asynchronous APIs section. ### What changes were proposed in this pull request? The code sample in the [Reporting Metrics programmatically using Asynchronous APIs](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#reporting-metrics-programmatically-using-asynchronous-apis) section was this: ``` spark = ... class Listener(StreamingQueryListener): def onQueryStarted(self, event): print("Query started: " + queryStarted.id) def onQueryProgress(self, event): println("Query terminated: " + queryTerminated.id) def onQueryTerminated(self, event): println("Query made progress: " + queryProgress.progress) spark.streams.addListener(Listener()) ``` Which is not a proper Python code, and has QueryProgress and QueryTerminated prints mixed. Proposed change/fix: ``` spark = ... class Listener(StreamingQueryListener): def onQueryStarted(self, event): print("Query started: " + queryStarted.id) def onQueryProgress(self, event): print("Query made progress: " + queryProgress.progress) def onQueryTerminated(self, event): print("Query terminated: " + queryTerminated.id) spark.streams.addListener(Listener()) ``` ### Why are the changes needed? To fix docimentation errors. 
### Does this PR introduce _any_ user-facing change? Yes. -> Sample python code snippet is fixed in docs (see above). ### How was this patch tested? Checked with github's .md preview, and built the docs according to the readme. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43190 from kasztp/master. Authored-by: Peter Kaszt Signed-off-by: Sean Owen --- docs/structured-streaming-programming-guide.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index 70e763be0d7..774422a9cd9 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -3837,10 +3837,10 @@ class Listener(StreamingQueryListener): print("Query started: " + queryStarted.id) def onQueryProgress(self, event): -println("Query terminated: " + queryTerminated.id) +print("Query made progress: " + queryProgress.progress) def onQueryTerminated(self, event): -println("Query made progress: " + queryProgress.progress) + print("Query terminated: " + queryTerminated.id) spark.streams.addListener(Listener()) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-45338][SQL][FOLLOWUP] Remove useless `toSeq`
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5c4aef4d4ca [SPARK-45338][SQL][FOLLOWUP] Remove useless `toSeq` 5c4aef4d4ca is described below commit 5c4aef4d4caf753ce9c45d07472df67479371738 Author: Jia Fan AuthorDate: Thu Sep 28 19:10:03 2023 -0500 [SPARK-45338][SQL][FOLLOWUP] Remove useless `toSeq` ### What changes were proposed in this pull request? This is a follow up PR for #43126 , remove useless invoke `toSeq` ### Why are the changes needed? Remove useless convert. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? exist test ### Was this patch authored or co-authored using generative AI tooling? No Closes #43172 from Hisoka-X/SPARK-45338-followup-remove-toseq. Authored-by: Jia Fan Signed-off-by: Sean Owen --- .../apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala | 2 +- .../apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala index 990a7162ea4..5dd8caf3f22 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala @@ -87,7 +87,7 @@ private[hive] class SparkGetColumnsOperation( }.toMap if (isAuthV2Enabled) { - val privObjs = getPrivObjs(db2Tabs).toSeq.asJava + val privObjs = getPrivObjs(db2Tabs).asJava authorizeMetaGets(HiveOperationType.GET_COLUMNS, privObjs, cmdStr) } diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala index 7fa492befa0..53a94a128c0 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala @@ -68,7 +68,7 @@ private[hive] class SparkGetFunctionsOperation( if (isAuthV2Enabled) { // authorize this call on the schema objects val privObjs = -HivePrivilegeObjectUtils.getHivePrivDbObjects(matchingDbs.toSeq.asJava) +HivePrivilegeObjectUtils.getHivePrivDbObjects(matchingDbs.asJava) authorizeMetaGets(HiveOperationType.GET_FUNCTIONS, privObjs, cmdStr) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-44895][CORE][UI] Add 'daemon', 'priority' for ThreadStackTrace
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6341310711e [SPARK-44895][CORE][UI] Add 'daemon', 'priority' for ThreadStackTrace 6341310711e is described below commit 6341310711ee0e3edbdd42aaeaf806cad4edefb5 Author: Kent Yao AuthorDate: Thu Sep 28 18:04:03 2023 -0500 [SPARK-44895][CORE][UI] Add 'daemon', 'priority' for ThreadStackTrace ### What changes were proposed in this pull request? Since version 9, Java has supported the 'daemon' and 'priority' fields in ThreadInfo. In this PR, we extract them from ThreadInfo to ThreadStackTrace ### Why are the changes needed? more information for thread pages in UI and rest APIs ### Does this PR introduce _any_ user-facing change? yes, ThreadStackTrace changes ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #43095 from yaooqinn/SPARK-44895. Authored-by: Kent Yao Signed-off-by: Sean Owen --- .../main/scala/org/apache/spark/status/api/v1/api.scala| 10 ++ core/src/main/scala/org/apache/spark/util/Utils.scala | 4 +++- .../test/scala/org/apache/spark/ui/UISeleniumSuite.scala | 14 ++ 3 files changed, 23 insertions(+), 5 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/status/api/v1/api.scala b/core/src/main/scala/org/apache/spark/status/api/v1/api.scala index 3e4e2f17a77..7a0c69e2948 100644 --- a/core/src/main/scala/org/apache/spark/status/api/v1/api.scala +++ b/core/src/main/scala/org/apache/spark/status/api/v1/api.scala @@ -540,19 +540,21 @@ case class ThreadStackTrace( lockName: Option[String], lockOwnerName: Option[String], suspended: Boolean, -inNative: Boolean) { +inNative: Boolean, +isDaemon: Boolean, +priority: Int) { /** * Returns a string representation of this thread stack trace * w.r.t java.lang.management.ThreadInfo(JDK 8)'s toString. 
* - * TODO(SPARK-44895): Considering 'daemon', 'priority' from higher JDKs - * * TODO(SPARK-44896): Also considering adding information os_prio, cpu, elapsed, tid, nid, etc., * from the jstack tool */ override def toString: String = { -val sb = new StringBuilder(s""""$threadName" Id=$threadId $threadState""") +val daemon = if (isDaemon) " daemon" else "" +val sb = new StringBuilder( + s""""$threadName"$daemon prio=$priority Id=$threadId $threadState""") lockName.foreach(lock => sb.append(s" on $lock")) lockOwnerName.foreach { owner => sb.append(s"""owned by "$owner"""") diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala index 48dfbecb7cd..dcffa99dc64 100644 --- a/core/src/main/scala/org/apache/spark/util/Utils.scala +++ b/core/src/main/scala/org/apache/spark/util/Utils.scala @@ -2196,7 +2196,9 @@ private[spark] object Utils Option(threadInfo.getLockName), Option(threadInfo.getLockOwnerName), threadInfo.isSuspended, - threadInfo.isInNative) + threadInfo.isInNative, + threadInfo.isDaemon, + threadInfo.getPriority) } /** diff --git a/core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala b/core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala index dd9927d7ba1..7e74cc9287f 100644 --- a/core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala +++ b/core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala @@ -885,6 +885,20 @@ class UISeleniumSuite extends SparkFunSuite with WebBrowser with Matchers { } } + test("SPARK-44895: Add 'daemon', 'priority' for ThreadStackTrace") { +withSpark(newSparkContext()) { sc => + val uiThreads = getJson(sc.ui.get, "executors/driver/threads") +.children +.filter(v => (v \ "threadName").extract[String].matches("SparkUI-\\d+")) + val priority = Thread.currentThread().getPriority + + uiThreads.foreach { v => +assert((v \ "isDaemon").extract[Boolean]) +assert((v \ "priority").extract[Int] === priority) + } +} + } + def goToUi(sc: SparkContext, 
path: String): Unit = { goToUi(sc.ui.get, path) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
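The JDK side of the change above can be demonstrated directly: since Java 9, `java.lang.management.ThreadInfo` exposes `isDaemon()` and `getPriority()`, which is what `ThreadStackTrace` now surfaces. This is a minimal sketch (requires Java 9+) that prints the same `"name" [daemon] prio=N Id=N STATE` prefix shape the new `toString` produces.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadInfoDemo {
    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Look up ThreadInfo for the current thread by id.
        ThreadInfo info = bean.getThreadInfo(Thread.currentThread().getId());
        // The two fields added to ThreadStackTrace in this commit:
        String daemon = info.isDaemon() ? " daemon" : "";
        System.out.println(
            "\"" + info.getThreadName() + "\"" + daemon
                + " prio=" + info.getPriority()
                + " Id=" + info.getThreadId() + " " + info.getThreadState());
    }
}
```

On Java 8 these accessors do not exist, which is why the original `toString` carried the `TODO(SPARK-44895)` note removed by this patch.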
[spark] branch master updated: [SPARK-45364][INFRA][BUILD] Clean up the unnecessary Scala 2.12 logical in SparkBuild
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 187e9a85175 [SPARK-45364][INFRA][BUILD] Clean up the unnecessary Scala 2.12 logical in SparkBuild 187e9a85175 is described below commit 187e9a851758c0e9cec11edab2bc07d6f4404001 Author: panbingkun AuthorDate: Thu Sep 28 08:36:08 2023 -0500 [SPARK-45364][INFRA][BUILD] Clean up the unnecessary Scala 2.12 logical in SparkBuild ### What changes were proposed in this pull request? The pr aims to clean up the unnecessary Scala 2.12 logical in SparkBuild. ### Why are the changes needed? Spark 4.0 no longer supports Scala 2.12. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43158 from panbingkun/SPARK-45364. Authored-by: panbingkun Signed-off-by: Sean Owen --- project/SparkBuild.scala | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index 85ffda304bc..13c92142d46 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -352,10 +352,7 @@ object SparkBuild extends PomBuild { "org.apache.spark.util.collection" ).mkString(":"), "-doc-title", "Spark " + version.value.replaceAll("-SNAPSHOT", "") + " ScalaDoc" -) ++ { - // Do not attempt to scaladoc javadoc comments under 2.12 since it can't handle inner classes - if (scalaBinaryVersion.value == "2.12") Seq("-no-java-comments") else Seq.empty -}, +), // disable Mima check for all modules, // to be enabled in specific ones that have previous artifacts - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (6a9d35f766d -> c5967310740)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 6a9d35f766d [SPARK-45354][SQL] Resolve functions bottom-up add c5967310740 [SPARK-2][MESOS] Remove Mesos support No new revisions were added by this update. Summary of changes: .github/labeler.yml| 3 - .github/workflows/benchmark.yml| 2 +- .github/workflows/build_and_test.yml | 4 +- .github/workflows/maven_test.yml | 12 +- LICENSE-binary | 3 +- NOTICE-binary | 3 - R/pkg/tests/fulltests/test_sparkR.R| 6 +- README.md | 2 +- assembly/pom.xml | 10 - .../shuffle/protocol/BlockTransferMessage.java | 4 - .../shuffle/protocol/mesos/RegisterDriver.java | 77 -- .../protocol/mesos/ShuffleServiceHeartbeat.java| 53 -- conf/spark-env.sh.template | 1 - .../main/scala/org/apache/spark/SparkConf.scala| 2 +- .../main/scala/org/apache/spark/SparkContext.scala | 14 +- .../apache/spark/api/java/JavaSparkContext.scala | 10 +- .../org/apache/spark/deploy/PythonRunner.scala | 1 - .../org/apache/spark/deploy/SparkSubmit.scala | 73 +- .../apache/spark/deploy/SparkSubmitArguments.scala | 8 +- .../spark/deploy/history/HistoryServer.scala | 2 +- .../spark/deploy/rest/RestSubmissionClient.scala | 4 +- .../org/apache/spark/deploy/security/README.md | 2 +- .../scala/org/apache/spark/executor/Executor.scala | 5 +- .../org/apache/spark/internal/config/package.scala | 9 +- .../org/apache/spark/metrics/MetricsSystem.scala | 3 - .../apache/spark/resource/ResourceProfile.scala| 4 +- .../apache/spark/scheduler/SchedulerBackend.scala | 2 +- .../apache/spark/scheduler/TaskSchedulerImpl.scala | 2 +- .../apache/spark/scheduler/TaskSetManager.scala| 1 - .../cluster/CoarseGrainedSchedulerBackend.scala| 5 +- .../main/scala/org/apache/spark/util/Utils.scala | 13 +- .../org/apache/spark/SecurityManagerSuite.scala| 2 +- .../org/apache/spark/deploy/SparkSubmitSuite.scala | 22 - .../deploy/rest/StandaloneRestSubmitSuite.scala| 6 - 
dev/create-release/release-build.sh| 2 +- dev/create-release/releaseutils.py | 1 - dev/deps/spark-deps-hadoop-3-hive-2.3 | 1 - dev/lint-java | 2 +- dev/mima | 2 +- dev/sbt-checkstyle | 2 +- dev/scalastyle | 2 +- dev/sparktestsupport/modules.py| 8 - dev/test-dependencies.sh | 2 +- docs/_config.yml | 3 +- docs/_layouts/global.html | 1 - docs/building-spark.md | 6 +- docs/cluster-overview.md | 8 +- docs/configuration.md | 34 +- docs/core-migration-guide.md | 2 + docs/hardware-provisioning.md | 3 +- docs/index.md | 3 - docs/job-scheduling.md | 23 +- docs/monitoring.md | 8 - docs/rdd-programming-guide.md | 2 +- docs/running-on-mesos.md | 901 docs/security.md | 26 +- docs/spark-standalone.md | 2 +- docs/streaming-programming-guide.md| 21 +- docs/submitting-applications.md| 16 - .../spark/launcher/AbstractCommandBuilder.java | 1 - .../spark/launcher/SparkClassCommandBuilder.java | 13 +- .../launcher/SparkSubmitCommandBuilderSuite.java | 4 - pom.xml| 7 - project/SparkBuild.scala | 4 +- python/README.md | 2 +- python/docs/source/user_guide/python_packaging.rst | 2 +- python/pyspark/context.py | 2 +- .../scala/org/apache/spark/repl/ReplSuite.scala| 24 - .../deploy/k8s/features/LocalDirsFeatureStep.scala | 2 +- resource-managers/mesos/pom.xml| 128 --- .../mesos/MesosExternalBlockStoreClient.java | 124 --- ...g.apache.spark.scheduler.ExternalClusterManager | 18 - .../deploy/mesos/MesosClusterDispatcher.scala | 136 .../mesos/MesosClusterDispatcherArguments.scala| 149 .../deploy/mesos/MesosDriverDescription.scala | 70 -- .../deploy/mesos/MesosExternalShuffleService.scala
[spark] branch master updated: [SPARK-44539][BUILD] Upgrade RoaringBitmap to 1.0.0
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8399dd321af [SPARK-44539][BUILD] Upgrade RoaringBitmap to 1.0.0 8399dd321af is described below commit 8399dd321afce0cb0051501de55da296595fdf53 Author: panbingkun AuthorDate: Wed Sep 27 11:53:53 2023 -0500 [SPARK-44539][BUILD] Upgrade RoaringBitmap to 1.0.0 ### What changes were proposed in this pull request? - The pr aims to upgrade RoaringBitmap from 0.9.45 to 1.0.0. - From version 1.0.0, the `ArraysShim` class has been moved from `shims-x.x.x.jar` jar to `RoaringBitmap-x.x.x.jar` jar, so we no longer need to rely on it. ### Why are the changes needed? - The newest brings some improvments, eg: Add zero-garbage deserialiser for ByteBuffer to RoaringBitmap by shikharid in https://github.com/RoaringBitmap/RoaringBitmap/pull/650 More specialized method for value decrementation by xtonik in https://github.com/RoaringBitmap/RoaringBitmap/pull/640 Duplicated small array sort routine by xtonik in https://github.com/RoaringBitmap/RoaringBitmap/pull/638 Avoid intermediate byte array creation by xtonik in https://github.com/RoaringBitmap/RoaringBitmap/pull/635 Useless back and forth BD bytes conversion by xtonik in https://github.com/RoaringBitmap/RoaringBitmap/pull/636 - The full release notes: https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/1.0.0 https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.49 https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.48 https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.47 https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/0.9.46 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #42143 from panbingkun/SPARK-44539. 
Authored-by: panbingkun Signed-off-by: Sean Owen --- core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt | 6 +++--- core/benchmarks/MapStatusesConvertBenchmark-results.txt | 8 dev/deps/spark-deps-hadoop-3-hive-2.3 | 3 +-- pom.xml | 2 +- 4 files changed, 9 insertions(+), 10 deletions(-) diff --git a/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt index 48dbc8e0241..416aaf5b7aa 100644 --- a/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt +++ b/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt @@ -6,8 +6,8 @@ OpenJDK 64-Bit Server VM 21+35 on Linux 5.15.0-1046-azure Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz MapStatuses Convert: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -Num Maps: 5 Fetch partitions:500813900 129 0.0 812807240.0 1.0X -Num Maps: 5 Fetch partitions:1000 2226 2238 17 0.0 2226321250.0 0.4X -Num Maps: 5 Fetch partitions:1500 3149 3300 133 0.0 3148506179.0 0.3X +Num Maps: 5 Fetch partitions:500899949 74 0.0 898941184.0 1.0X +Num Maps: 5 Fetch partitions:1000 1947 2043 115 0.0 1947362412.0 0.5X +Num Maps: 5 Fetch partitions:1500 3079 3122 75 0.0 3078809212.0 0.3X diff --git a/core/benchmarks/MapStatusesConvertBenchmark-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-results.txt index 5ed55c839eb..bd87f4876e4 100644 --- a/core/benchmarks/MapStatusesConvertBenchmark-results.txt +++ b/core/benchmarks/MapStatusesConvertBenchmark-results.txt @@ -3,11 +3,11 @@ MapStatuses Convert Benchmark OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1046-azure -Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz +Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz MapStatuses Convert: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -Num Maps: 5 Fetch partitions:500 1127 1138 13 0.0 1127479807.0 1.0X -Num Maps: 5 Fetch partitions:1000 2146 2183 49 0.0 2146214882.0 0.5X -Num Maps
[spark] branch master updated: [SPARK-45343][DOCS] Clarify behavior of multiLine in CSV options
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ab92cae78e3 [SPARK-45343][DOCS] Clarify behavior of multiLine in CSV options ab92cae78e3 is described below commit ab92cae78e3cdf58ba96b0b98e7958287c2d5cd1 Author: Bill Schneider AuthorDate: Wed Sep 27 08:25:02 2023 -0500 [SPARK-45343][DOCS] Clarify behavior of multiLine in CSV options ### What changes were proposed in this pull request? this is a documentation-only change to clarify CSV `multiLine` option: https://issues.apache.org/jira/browse/SPARK-45343 ### Why are the changes needed? documentation clarity ### Does this PR introduce _any_ user-facing change? Documentation only ### How was this patch tested? N/A, documentation only ### Was this patch authored or co-authored using generative AI tooling? Documentation only Closes #43132 from wrschneider/SPARK-45343-csv-multiline-doc-clarification. Lead-authored-by: Bill Schneider Co-authored-by: Bill Schneider Signed-off-by: Sean Owen --- docs/sql-data-sources-csv.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/sql-data-sources-csv.md b/docs/sql-data-sources-csv.md index 31167f55143..721563d1681 100644 --- a/docs/sql-data-sources-csv.md +++ b/docs/sql-data-sources-csv.md @@ -213,7 +213,7 @@ Data source options of CSV can be set via: multiLine false -Parse one record, which may span multiple lines, per file. CSV built-in functions ignore this option. +Allows a row to span multiple lines, by parsing line breaks within quoted values as part of the value itself. CSV built-in functions ignore this option. read - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (9f817379c68 -> b7763a7eae2)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 9f817379c68 [SPARK-45341][CORE] Correct the title level in the comments of KVStore.java to make `sbt doc` run successfully with Java 17 add b7763a7eae2 [SPARK-45338][CORE][SQL] Replace `scala.collection.JavaConverters` to `scala.jdk.CollectionConverters` No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/ErrorClassesJSONReader.scala | 2 +- .../src/main/scala/org/apache/spark/SparkException.scala | 2 +- .../scala/org/apache/spark/SparkThrowableHelper.scala| 2 +- .../main/scala/org/apache/spark/internal/Logging.scala | 2 +- .../org/apache/spark/sql/avro/AvroDeserializer.scala | 2 +- .../org/apache/spark/sql/avro/AvroOutputWriter.scala | 2 +- .../scala/org/apache/spark/sql/avro/AvroSerializer.scala | 2 +- .../main/scala/org/apache/spark/sql/avro/AvroUtils.scala | 2 +- .../org/apache/spark/sql/avro/SchemaConverters.scala | 2 +- .../main/scala/org/apache/spark/sql/avro/functions.scala | 2 +- .../scala/org/apache/spark/sql/v2/avro/AvroScan.scala| 2 +- .../scala/org/apache/spark/sql/v2/avro/AvroTable.scala | 2 +- .../org/apache/spark/sql/avro/AvroFunctionsSuite.scala | 2 +- .../test/scala/org/apache/spark/sql/avro/AvroSuite.scala | 2 +- .../jvm/src/main/scala/org/apache/spark/sql/Column.scala | 2 +- .../org/apache/spark/sql/DataFrameNaFunctions.scala | 2 +- .../scala/org/apache/spark/sql/DataFrameReader.scala | 2 +- .../org/apache/spark/sql/DataFrameStatFunctions.scala| 2 +- .../scala/org/apache/spark/sql/DataFrameWriter.scala | 2 +- .../scala/org/apache/spark/sql/DataFrameWriterV2.scala | 2 +- .../src/main/scala/org/apache/spark/sql/Dataset.scala| 2 +- .../org/apache/spark/sql/KeyValueGroupedDataset.scala| 2 +- .../org/apache/spark/sql/RelationalGroupedDataset.scala | 2 +- .../main/scala/org/apache/spark/sql/SparkSession.scala | 2 +- 
.../main/scala/org/apache/spark/sql/avro/functions.scala | 2 +- .../scala/org/apache/spark/sql/catalog/Catalog.scala | 2 +- .../spark/sql/expressions/UserDefinedFunction.scala | 2 +- .../org/apache/spark/sql/expressions/WindowSpec.scala| 2 +- .../src/main/scala/org/apache/spark/sql/functions.scala | 2 +- .../scala/org/apache/spark/sql/protobuf/functions.scala | 2 +- .../apache/spark/sql/streaming/DataStreamReader.scala| 2 +- .../apache/spark/sql/streaming/DataStreamWriter.scala| 2 +- .../org/apache/spark/sql/streaming/StreamingQuery.scala | 2 +- .../spark/sql/streaming/StreamingQueryManager.scala | 2 +- .../scala/org/apache/spark/sql/streaming/progress.scala | 2 +- .../scala/org/apache/spark/sql/ClientE2ETestSuite.scala | 2 +- .../scala/org/apache/spark/sql/ColumnTestSuite.scala | 2 +- .../org/apache/spark/sql/DataFrameNaFunctionSuite.scala | 2 +- .../scala/org/apache/spark/sql/FunctionTestSuite.scala | 2 +- .../org/apache/spark/sql/PlanGenerationTestSuite.scala | 2 +- .../spark/sql/UserDefinedFunctionE2ETestSuite.scala | 2 +- .../apache/spark/sql/connect/client/ArtifactSuite.scala | 2 +- .../sql/connect/client/SparkConnectClientSuite.scala | 2 +- .../spark/sql/streaming/ClientStreamingQuerySuite.scala | 2 +- .../spark/sql/connect/client/ArtifactManager.scala | 2 +- .../apache/spark/sql/connect/client/ClassFinder.scala| 2 +- .../connect/client/CustomSparkConnectBlockingStub.scala | 2 +- .../client/ExecutePlanResponseReattachableIterator.scala | 2 +- .../sql/connect/client/GrpcExceptionConverter.scala | 2 +- .../spark/sql/connect/client/SparkConnectClient.scala| 2 +- .../sql/connect/client/arrow/ArrowEncoderUtils.scala | 2 +- .../spark/sql/connect/client/arrow/ArrowSerializer.scala | 2 +- .../sql/connect/common/LiteralValueProtoConverter.scala | 2 +- .../org/apache/spark/sql/connect/common/ProtoUtils.scala | 2 +- .../org/apache/spark/sql/connect/common/UdfUtils.scala | 2 +- .../apache/spark/sql/connect/SparkConnectPlugin.scala| 2 +- 
.../connect/artifact/SparkConnectArtifactManager.scala | 2 +- .../scala/org/apache/spark/sql/connect/dsl/package.scala | 2 +- .../connect/execution/SparkConnectPlanExecution.scala| 2 +- .../spark/sql/connect/planner/SparkConnectPlanner.scala | 2 +- .../connect/planner/StreamingForeachBatchHelper.scala| 2 +- .../apache/spark/sql/connect/service/ExecuteHolder.scala | 2 +- .../apache/spark/sql/connect/service/SessionHolder.scala | 2 +- .../sql/connect/service/SparkConnectAnalyzeHandler.scala | 2 +- .../service/SparkConnectArtifactStatusesHandler.scala| 2 +- .../sql/connect/service/SparkConnectConfigHandler.scala | 2 +- .../connect/service/SparkConnectExecutionManager.scala | 2 +- .../connect/service/SparkConnectInterruptHandler.scala | 2
[spark] branch master updated: [SPARK-45341][CORE] Correct the title level in the comments of KVStore.java to make `sbt doc` run successfully with Java 17
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9f817379c68 [SPARK-45341][CORE] Correct the title level in the comments of KVStore.java to make `sbt doc` run successfully with Java 17 9f817379c68 is described below commit 9f817379c68e551680e60900f1d61b70e1b62960 Author: yangjie01 AuthorDate: Wed Sep 27 08:21:05 2023 -0500 [SPARK-45341][CORE] Correct the title level in the comments of KVStore.java to make `sbt doc` run successfully with Java 17 ### What changes were proposed in this pull request? This pr aims to correct the title level in the comments of `KVStore.java` to make `sbt doc` run successfully with Java 17. ### Why are the changes needed? Make the `sbt doc` command execute successfully with Java 17 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Manually check. run `build/sbt clean doc -Phadoop-3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pvolcano` **Before** ``` [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/Picked up JAVA_TOOL_OPTIONS:-Duser.language=en [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBTypeInfo.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/ArrayWrappers.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVIndex.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/InMemoryStore.java... 
[error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBTypeInfo.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/UnsupportedStoreVersionException.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStoreIterator.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStore.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStoreView.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVTypeInfo.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java... [error] Loading source file /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStoreSerializer.java... [error] Constructing Javadoc information... [error] Building index for all the packages and classes... [error] Standard Doclet version 17.0.8+7-LTS [error] Building tree for all the packages and classes... 
[error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStore.java:32:1: error: heading used out of sequence: , compared to implicit preceding heading: [error] * Serialization [error]^Generating /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/InMemoryStore.html... [error] Generating /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVIndex.html... [error] Generating /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVStore.html... [error] Generating /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVStoreIterator.html... [error] Generating /Users/yangjie01/SourceCode/git/spark-mine-sbt/common
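The doclet failure quoted above is a heading-order check: starting with Java 17, javadoc rejects a heading whose level jumps more than one step past the preceding heading. A rough Python sketch of that rule (an illustration only — the function name and logic are mine, not the JDK doclet's actual code):

```python
import re

def headings_in_sequence(html: str) -> bool:
    """Return False when a heading skips a level relative to the one
    before it -- roughly the condition javadoc in JDK 17+ reports as
    "heading used out of sequence"."""
    levels = [int(m.group(1)) for m in re.finditer(r"<h([1-6])", html, re.IGNORECASE)]
    prev = 0
    for level in levels:
        if level > prev + 1:  # e.g. an <h3> directly after an <h1>
            return False
        prev = level
    return True

print(headings_in_sequence("<h1>KVStore</h1><h2>Serialization</h2>"))  # True
print(headings_in_sequence("<h1>KVStore</h1><h3>Serialization</h3>"))  # False
```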
[spark] branch master updated: [SPARK-45334][SQL] Remove misleading comment in parquetSchemaConverter
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 7e8aafd2c0f [SPARK-45334][SQL] Remove misleading comment in parquetSchemaConverter
7e8aafd2c0f is described below

commit 7e8aafd2c0f1f6fcd03a69afe2b85fd3fda95d20
Author: lanmengran1
AuthorDate: Tue Sep 26 21:01:02 2023 -0500

[SPARK-45334][SQL] Remove misleading comment in parquetSchemaConverter

### What changes were proposed in this pull request?
Removes one line of comment; the details are described in JIRA https://issues.apache.org/jira/browse/SPARK-45334

### Why are the changes needed?
The comment is outdated and misleading:
- the parquet-hive module has been removed from the parquet-mr project https://issues.apache.org/jira/browse/PARQUET-1676
- Hive always uses "array_element" as the name

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not needed

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43119 from amoylan2/remove_misleading_comment_in_parquetSchemaConverter.
Authored-by: lanmengran1
Signed-off-by: Sean Owen
---
 .../spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala | 1 -
 1 file changed, 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala
index 9c9e7ce729c..eedd165278a 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala
@@ -646,7 +646,6 @@ class SparkToParquetSchemaConverter(
       .buildGroup(repetition).as(LogicalTypeAnnotation.listType())
       .addField(Types
         .buildGroup(REPEATED)
-          // "array" is the name chosen by parquet-hive (1.7.0 and prior version)
           .addField(convertField(StructField("array", elementType, nullable)))
           .named("bag"))
       .named(field.name)
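For context on the hunk above: in the converter's legacy list layout, `bag` and `array` are the literal group and field names written into the Parquet schema. A sketch of the shape that code produces (illustrative only; an `int32` element type and the field name `my_list` are assumed):

```
optional group my_list (LIST) {
  repeated group bag {
    optional int32 array;
  }
}
```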
[spark] branch master updated: [SPARK-44366][BUILD] Upgrade antlr4 to 4.13.1
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 13cd291c354 [SPARK-44366][BUILD] Upgrade antlr4 to 4.13.1
13cd291c354 is described below

commit 13cd291c3549467dfd5d10a665e2d6a577f35bcb
Author: yangjie01
AuthorDate: Tue Sep 26 11:14:21 2023 -0500

[SPARK-44366][BUILD] Upgrade antlr4 to 4.13.1

### What changes were proposed in this pull request?
This PR aims to upgrade `antlr4` from 4.9.3 to 4.13.1.

### Why are the changes needed?
Since 4.10, antlr4 has used Java 11 for the source code and the compiled .class files of the ANTLR tool. There are some bug fixes and improvements after 4.9.3:
- https://github.com/antlr/antlr4/pull/3399
- https://github.com/antlr/antlr4/issues/1105
- https://github.com/antlr/antlr4/issues/2788
- https://github.com/antlr/antlr4/pull/3957
- https://github.com/antlr/antlr4/pull/4394

The full release notes are as follows:
- https://github.com/antlr/antlr4/releases/tag/4.13.1
- https://github.com/antlr/antlr4/releases/tag/4.13.0
- https://github.com/antlr/antlr4/releases/tag/4.12.0
- https://github.com/antlr/antlr4/releases/tag/4.11.1
- https://github.com/antlr/antlr4/releases/tag/4.11.0
- https://github.com/antlr/antlr4/releases/tag/4.10.1
- https://github.com/antlr/antlr4/releases/tag/4.10

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43075 from LuciferYang/antlr4-4131.
Authored-by: yangjie01
Signed-off-by: Sean Owen
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 206361e1efa..5c17d727b0a 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -12,7 +12,7 @@
 aliyun-java-sdk-ram/3.1.0//aliyun-java-sdk-ram-3.1.0.jar
 aliyun-sdk-oss/3.13.0//aliyun-sdk-oss-3.13.0.jar
 annotations/17.0.0//annotations-17.0.0.jar
 antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar
-antlr4-runtime/4.9.3//antlr4-runtime-4.9.3.jar
+antlr4-runtime/4.13.1//antlr4-runtime-4.13.1.jar
 aopalliance-repackaged/2.6.1//aopalliance-repackaged-2.6.1.jar
 arpack/3.0.3//arpack-3.0.3.jar
 arpack_combined_all/0.1//arpack_combined_all-0.1.jar

diff --git a/pom.xml b/pom.xml
index 5fd3e173857..1d0ab387900 100644
--- a/pom.xml
+++ b/pom.xml
@@ -212,7 +212,7 @@
 3.0.0
 0.12.0
-4.9.3
+4.13.1
 1.1
 4.12.1
 4.12.0
[spark] branch master updated: [SPARK-45248][CORE] Set the timeout for spark ui server
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 273a375cd314 [SPARK-45248][CORE] Set the timeout for spark ui server
273a375cd314 is described below

commit 273a375cd314fbf52b5f2538526374f6b24fb2cf
Author: chenyu <119398199+chenyu-opensou...@users.noreply.github.com>
AuthorDate: Mon Sep 25 22:38:27 2023 -0500

[SPARK-45248][CORE] Set the timeout for spark ui server

**What changes were proposed in this pull request?**
The PR sets an idle timeout for the Spark UI server.

**Why are the changes needed?**
It helps avoid slow HTTP denial-of-service attacks, because the Jetty server's idle timeout is 30 seconds by default.

**Does this PR introduce any user-facing change?**
No

**How was this patch tested?**
Manual review

**Was this patch authored or co-authored using generative AI tooling?**
No

Closes #43078 from chenyu-opensource/branch-SPARK-45248-new.
Authored-by: chenyu <119398199+chenyu-opensou...@users.noreply.github.com>
Signed-off-by: Sean Owen
---
 core/src/main/scala/org/apache/spark/ui/JettyUtils.scala | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
index 9582bdbf5264..22adcbc32ed8 100644
--- a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
+++ b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
@@ -296,6 +296,8 @@ private[spark] object JettyUtils extends Logging {
       connector.setPort(port)
       connector.setHost(hostName)
       connector.setReuseAddress(!Utils.isWindows)
+      // spark-45248: set the idle timeout to prevent slow DoS
+      connector.setIdleTimeout(8000)
       // Currently we only use "SelectChannelConnector"
       // Limit the max acceptor number to 8 so that we don't waste a lot of threads
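The two added lines are an idle-timeout defense: a connection that stays silent past the deadline is dropped instead of pinning a server thread, which is what defeats slowloris-style clients. The same idea in a stdlib-only Python sketch (an analogy to `connector.setIdleTimeout(8000)`, not Spark/Jetty code; the 0.2-second window is arbitrary):

```python
import socket

# Server and "client" ends of a connected pair; the client plays a
# slowloris attacker that opens the connection and then sends nothing.
server, client = socket.socketpair()
server.settimeout(0.2)  # analogous to connector.setIdleTimeout(8000)

timed_out = False
try:
    server.recv(1024)  # blocks until data arrives or the idle window expires
except socket.timeout:
    timed_out = True  # idle deadline hit: the server can now drop the peer

server.close()
client.close()
print("idle connection dropped:", timed_out)
```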
[spark] branch branch-3.3 updated: [SPARK-45286][DOCS] Add back Matomo analytics
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 9a28200f6e4 [SPARK-45286][DOCS] Add back Matomo analytics 9a28200f6e4 is described below commit 9a28200f6e461c4929dd6e05b6dd55fe984c0924 Author: Sean Owen AuthorDate: Sun Sep 24 14:17:55 2023 -0500 [SPARK-45286][DOCS] Add back Matomo analytics ### What changes were proposed in this pull request? Add analytics to doc pages using the ASF's Matomo service ### Why are the changes needed? We had previously removed Google Analytics from the website and release docs, per ASF policy: https://github.com/apache/spark/pull/36310 We just restored analytics using the ASF-hosted Matomo service on the website: https://github.com/apache/spark-website/commit/a1548627b48a62c2e51870d1488ca3e09397bd30 This change would put the same new tracking code back into the release docs. It would let us see what docs and resources are most used, I suppose. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? No Closes #43063 from srowen/SPARK-45286. 
Authored-by: Sean Owen
Signed-off-by: Sean Owen
(cherry picked from commit a881438114ea3e8e918d981ef89ed1ab956d6fca)
Signed-off-by: Sean Owen
---
 docs/_layouts/global.html | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index d4463922766..2d139f5e0fb 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -33,6 +33,25 @@
 https://cdn.jsdelivr.net/npm/docsearch.js@2/dist/cdn/docsearch.min.css
+{% production %}
+var _paq = window._paq = window._paq || [];
+/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
+_paq.push(["disableCookies"]);
+_paq.push(['trackPageView']);
+_paq.push(['enableLinkTracking']);
+(function() {
+  var u="https://analytics.apache.org/";
+  _paq.push(['setTrackerUrl', u+'matomo.php']);
+  _paq.push(['setSiteId', '40']);
+  var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
+  g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
+})();
+{% endproduction %}
[spark] branch branch-3.4 updated: [SPARK-45286][DOCS] Add back Matomo analytics
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 20924aa581a [SPARK-45286][DOCS] Add back Matomo analytics 20924aa581a is described below commit 20924aa581a2c5c49ec700689f1888dd7db79e6b Author: Sean Owen AuthorDate: Sun Sep 24 14:17:55 2023 -0500 [SPARK-45286][DOCS] Add back Matomo analytics ### What changes were proposed in this pull request? Add analytics to doc pages using the ASF's Matomo service ### Why are the changes needed? We had previously removed Google Analytics from the website and release docs, per ASF policy: https://github.com/apache/spark/pull/36310 We just restored analytics using the ASF-hosted Matomo service on the website: https://github.com/apache/spark-website/commit/a1548627b48a62c2e51870d1488ca3e09397bd30 This change would put the same new tracking code back into the release docs. It would let us see what docs and resources are most used, I suppose. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? No Closes #43063 from srowen/SPARK-45286. 
Authored-by: Sean Owen
Signed-off-by: Sean Owen
(cherry picked from commit a881438114ea3e8e918d981ef89ed1ab956d6fca)
Signed-off-by: Sean Owen
---
 docs/_layouts/global.html | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index d4463922766..2d139f5e0fb 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -33,6 +33,25 @@
 https://cdn.jsdelivr.net/npm/docsearch.js@2/dist/cdn/docsearch.min.css
+{% production %}
+var _paq = window._paq = window._paq || [];
+/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
+_paq.push(["disableCookies"]);
+_paq.push(['trackPageView']);
+_paq.push(['enableLinkTracking']);
+(function() {
+  var u="https://analytics.apache.org/";
+  _paq.push(['setTrackerUrl', u+'matomo.php']);
+  _paq.push(['setSiteId', '40']);
+  var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
+  g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
+})();
+{% endproduction %}
[spark] branch branch-3.5 updated: [SPARK-45286][DOCS] Add back Matomo analytics
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 609306ff5da [SPARK-45286][DOCS] Add back Matomo analytics 609306ff5da is described below commit 609306ff5daa8ff7c2212088d33c0911ad0f4989 Author: Sean Owen AuthorDate: Sun Sep 24 14:17:55 2023 -0500 [SPARK-45286][DOCS] Add back Matomo analytics ### What changes were proposed in this pull request? Add analytics to doc pages using the ASF's Matomo service ### Why are the changes needed? We had previously removed Google Analytics from the website and release docs, per ASF policy: https://github.com/apache/spark/pull/36310 We just restored analytics using the ASF-hosted Matomo service on the website: https://github.com/apache/spark-website/commit/a1548627b48a62c2e51870d1488ca3e09397bd30 This change would put the same new tracking code back into the release docs. It would let us see what docs and resources are most used, I suppose. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? No Closes #43063 from srowen/SPARK-45286. 
Authored-by: Sean Owen
Signed-off-by: Sean Owen
(cherry picked from commit a881438114ea3e8e918d981ef89ed1ab956d6fca)
Signed-off-by: Sean Owen
---
 docs/_layouts/global.html | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index 9b7c4692461..8c4435fdf31 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -32,6 +32,25 @@
 https://cdn.jsdelivr.net/npm/docsearch.js@2/dist/cdn/docsearch.min.css
+{% production %}
+var _paq = window._paq = window._paq || [];
+/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
+_paq.push(["disableCookies"]);
+_paq.push(['trackPageView']);
+_paq.push(['enableLinkTracking']);
+(function() {
+  var u="https://analytics.apache.org/";
+  _paq.push(['setTrackerUrl', u+'matomo.php']);
+  _paq.push(['setSiteId', '40']);
+  var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
+  g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
+})();
+{% endproduction %}
[spark] branch master updated: [SPARK-45286][DOCS] Add back Matomo analytics
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a881438114e [SPARK-45286][DOCS] Add back Matomo analytics a881438114e is described below commit a881438114ea3e8e918d981ef89ed1ab956d6fca Author: Sean Owen AuthorDate: Sun Sep 24 14:17:55 2023 -0500 [SPARK-45286][DOCS] Add back Matomo analytics ### What changes were proposed in this pull request? Add analytics to doc pages using the ASF's Matomo service ### Why are the changes needed? We had previously removed Google Analytics from the website and release docs, per ASF policy: https://github.com/apache/spark/pull/36310 We just restored analytics using the ASF-hosted Matomo service on the website: https://github.com/apache/spark-website/commit/a1548627b48a62c2e51870d1488ca3e09397bd30 This change would put the same new tracking code back into the release docs. It would let us see what docs and resources are most used, I suppose. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? No Closes #43063 from srowen/SPARK-45286. 
Authored-by: Sean Owen
Signed-off-by: Sean Owen
---
 docs/_layouts/global.html | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index e857efad6f0..c2f05cfd6bb 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -32,6 +32,25 @@
 https://cdn.jsdelivr.net/npm/docsearch.js@2/dist/cdn/docsearch.min.css
+{% production %}
+var _paq = window._paq = window._paq || [];
+/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
+_paq.push(["disableCookies"]);
+_paq.push(['trackPageView']);
+_paq.push(['enableLinkTracking']);
+(function() {
+  var u="https://analytics.apache.org/";
+  _paq.push(['setTrackerUrl', u+'matomo.php']);
+  _paq.push(['setSiteId', '40']);
+  var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
+  g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
+})();
+{% endproduction %}
[spark] branch master updated: [SPARK-45148][BUILD] Upgrade scalatest related dependencies to the 3.2.17 series
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new c1c58698d3d [SPARK-45148][BUILD] Upgrade scalatest related dependencies to the 3.2.17 series
c1c58698d3d is described below

commit c1c58698d3d6b1447045fad592f8dfb0395989d1
Author: yangjie01
AuthorDate: Mon Sep 18 10:01:47 2023 -0500

[SPARK-45148][BUILD] Upgrade scalatest related dependencies to the 3.2.17 series

### What changes were proposed in this pull request?
This PR aims to upgrade `scalatest`-related test dependencies to 3.2.17:
- scalatest: upgrade scalatest to 3.2.17
- scalatestplus
  - scalacheck: upgrade `scalacheck-1-17` to 3.2.17.0
  - mockito: upgrade `mockito-4-11` to 3.2.17.0
  - selenium: upgrade `selenium-4-12` to 3.2.17.0, and `selenium-java` to 4.12.1, `htmlunit-driver` to 4.12.0, byte-buddy and byte-buddy-agent to 1.14.5

### Why are the changes needed?
The release notes are as follows:
- scalatest: https://github.com/scalatest/scalatest/releases/tag/release-3.2.17
- scalatestplus
  - scalacheck-1-17: https://github.com/scalatest/scalatestplus-scalacheck/releases/tag/release-3.2.17.0-for-scalacheck-1.17
  - mockito-4-11: https://github.com/scalatest/scalatestplus-mockito/releases/tag/release-3.2.17.0-for-mockito-4.11
  - selenium-4-12: https://github.com/scalatest/scalatestplus-selenium/releases/tag/release-3.2.17.0-for-selenium-4.12

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions - Manual test: - ChromeUISeleniumSuite - RocksDBBackendChromeUIHistoryServerSuite ``` build/sbt -Dguava.version=32.1.2-jre -Dspark.test.webdriver.chrome.driver=/Users/yangjie01/Tools/chromedriver -Dtest.default.exclude.tags="" -Phive -Phive-thriftserver "core/testOnly org.apache.spark.ui.ChromeUISeleniumSuite" build/sbt -Dguava.version=32.1.2-jre -Dspark.test.webdriver.chrome.driver=/Users/yangjie01/Tools/chromedriver -Dtest.default.exclude.tags="" -Phive -Phive-thriftserver "core/testOnly org.apache.spark.deploy.history.RocksDBBackendChromeUIHistoryServerSuite" ``` ``` [info] ChromeUISeleniumSuite: [info] - SPARK-31534: text for tooltip should be escaped (1 second, 809 milliseconds) [info] - SPARK-31882: Link URL for Stage DAGs should not depend on paged table. (604 milliseconds) [info] - SPARK-31886: Color barrier execution mode RDD correctly (252 milliseconds) [info] - Search text for paged tables should not be saved (1 second, 309 milliseconds) [info] Run completed in 6 seconds, 116 milliseconds. [info] Total number of tests run: 4 [info] Suites: completed 1, aborted 0 [info] Tests: succeeded 4, failed 0, canceled 0, ignored 0, pending 0 [info] All tests passed. ``` ``` [info] RocksDBBackendChromeUIHistoryServerSuite: [info] - ajax rendered relative links are prefixed with uiRoot (spark.ui.proxyBase) (1 second, 615 milliseconds) [info] Run completed in 5 seconds, 130 milliseconds. [info] Total number of tests run: 1 [info] Suites: completed 1, aborted 0 [info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0 [info] All tests passed. [success] Total time: 27 s, completed 2023-9-14 11:29:27 ``` ### Was this patch authored or co-authored using generative AI tooling? No Closes #42906 from LuciferYang/SPARK-45148. 
Lead-authored-by: yangjie01 Co-authored-by: YangJie Signed-off-by: Sean Owen --- pom.xml | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/pom.xml b/pom.xml index 779f9e64f1d..971cb07ea40 100644 --- a/pom.xml +++ b/pom.xml @@ -214,8 +214,8 @@ 4.9.3 1.1 -4.9.1 -4.9.1 +4.12.1 +4.12.0 2.70.0 3.1.0 1.1.0 @@ -413,7 +413,7 @@ org.scalatestplus - selenium-4-9_${scala.binary.version} + selenium-4-12_${scala.binary.version} test @@ -1137,25 +1137,25 @@ org.scalatest scalatest_${scala.binary.version} -3.2.16 +3.2.17 test org.scalatestplus scalacheck-1-17_${scala.binary.version} -3.2.16.0 +3.2.17.0 test org.scalatestplus mockito-4-11_${scala.binary.version} -3.2.16.0 +3.2.17.0 test org.scalatestplus -selenium-4-9_${scala.binary.version} -3.2.16.0 +selenium-4-12_${scala.binary.version} +3.2.17.0 test @@ -1173,13 +1173,13 @@ net.byteb
[spark-website] branch asf-site updated: [SPARK-45195] Update examples with docker official image
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 6b10f7fd85 [SPARK-45195] Update examples with docker official image
6b10f7fd85 is described below

commit 6b10f7fd85327f97cc12bede9ce5c60a744d9063
Author: Ruifeng Zheng
AuthorDate: Mon Sep 18 07:31:46 2023 -0500

[SPARK-45195] Update examples with docker official image

1. Add `docker run` commands for PySpark and SparkR;
2. switch to the Docker official image for SQL, Scala and Java; refer to https://hub.docker.com/_/spark

Also manually checked all the commands, e.g.:

```
ruifeng.zhengx:~$ docker run -it --rm spark:python3 /opt/spark/bin/pyspark
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/09/18 06:02:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.5.0
      /_/

Using Python version 3.8.10 (default, May 26 2023 14:05:08)
Spark context Web UI available at http://4861f70118ab:4040
Spark context available as 'sc' (master = local[*], app id = local-1695016951087).
SparkSession available as 'spark'.
>>> spark.range(0, 10).show()
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
+---+
```

Author: Ruifeng Zheng

Closes #477 from zhengruifeng/offical_image.
--- index.md| 12 +++- site/index.html | 12 +++- 2 files changed, 14 insertions(+), 10 deletions(-) diff --git a/index.md b/index.md index a41e4c9e81..ada6242742 100644 --- a/index.md +++ b/index.md @@ -88,11 +88,13 @@ navigation: Run now -Installing with 'pip' +Install with 'pip' or try offical image $ pip install pyspark $ pyspark +$ +$ docker run -it --rm spark:python3 /opt/spark/bin/pyspark Run now -$ docker run -it --rm apache/spark /opt/spark/bin/spark-sql +$ docker run -it --rm spark /opt/spark/bin/spark-sql spark-sql> @@ -175,7 +177,7 @@ FROM json.`logs.json` Run now -$ docker run -it --rm apache/spark /opt/spark/bin/spark-shell +$ docker run -it --rm spark /opt/spark/bin/spark-shell scala> @@ -193,7 +195,7 @@ df.where("age > 21") Run now -$ docker run -it --rm apache/spark /opt/spark/bin/spark-shell +$ docker run -it --rm spark /opt/spark/bin/spark-shell scala> @@ -210,7 +212,7 @@ df.where("age > 21") Run now -$ SPARK-HOME/bin/sparkR +$ docker run -it --rm spark:r /opt/spark/bin/sparkR > diff --git a/site/index.html b/site/index.html index e1b0b7e416..3ccc7104ce 100644 --- a/site/index.html +++ b/site/index.html @@ -213,11 +213,13 @@ Run now -Installing with 'pip' +Install with 'pip' or try offical image $ pip install pyspark $ pyspark +$ +$ docker run -it --rm spark:python3 /opt/spark/bin/pyspark @@ -273,7 +275,7 @@ Run now -$ docker run -it --rm apache/spark /opt/spark/bin/spark-sql +$ docker run -it --rm spark /opt/spark/bin/spark-sql spark-sql> @@ -293,7 +295,7 @@ Run now
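The commit above swaps the documented commands from the `apache/spark` account image to the Docker official `spark` image, with per-language tags. The pattern behind those commands can be sketched in Python; the helper name is illustrative and not part of the site or of Spark:

```python
def docker_run_command(shell, flavor=None):
    """Build the `docker run` line the site now documents for the official image.

    `flavor` selects an image tag such as "python3" or "r"; None means the
    plain `spark` image (Scala/Java/SQL shells).
    """
    image = f"spark:{flavor}" if flavor else "spark"
    return f"docker run -it --rm {image} /opt/spark/bin/{shell}"

# The four commands the commit documents:
for shell, flavor in [("pyspark", "python3"), ("spark-sql", None),
                      ("spark-shell", None), ("sparkR", "r")]:
    print(docker_run_command(shell, flavor))
```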
[spark] branch branch-3.3 updated: [SPARK-45127][DOCS] Exclude README.md from document build
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 6dcab1fe0f6 [SPARK-45127][DOCS] Exclude README.md from document build 6dcab1fe0f6 is described below commit 6dcab1fe0f64458d76060e38fad974d6b84c4ff7 Author: panbingkun AuthorDate: Sat Sep 16 09:04:38 2023 -0500 [SPARK-45127][DOCS] Exclude README.md from document build ### What changes were proposed in this pull request? The pr aims to exclude `README.md` from document build. ### Why are the changes needed? - Currently, our document `README.html` does not have any CSS style applied to it, as shown below: https://spark.apache.org/docs/latest/README.html https://github.com/apache/spark/assets/15246973/1dfe5f69-30d9-4ce4-8d82-1bba5e721ccd";> **If we do not intend to display the above page to users, we should remove it during the document build process.** - As we saw in the project `spark-website`, it has already set the following configuration: https://github.com/apache/spark-website/blob/642d1fb834817014e1799e73882d53650c1c1662/_config.yml#L7 https://github.com/apache/spark/assets/15246973/421b7be5-4ece-407e-9d49-8e7487b74a47";> Let's stay consistent. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. After this pr, the README.html file will no longer be generated ``` (base) panbingkun:~/Developer/spark/spark-community/docs/_site$ls -al README.html ls: README.html: No such file or directory ``` - Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42883 from panbingkun/SPARK-45127. 
Authored-by: panbingkun Signed-off-by: Sean Owen (cherry picked from commit 804f741453fb146b5261084fa3baf26631badb79) Signed-off-by: Sean Owen --- docs/_config.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/_config.yml b/docs/_config.yml index 82da6b4ddff..c0f752a6155 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -46,3 +46,5 @@ DOCSEARCH_SCRIPT: | }); permalink: 404.html + +exclude: ['README.md'] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.4 updated: [SPARK-45127][DOCS] Exclude README.md from document build
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 12020943ec9 [SPARK-45127][DOCS] Exclude README.md from document build 12020943ec9 is described below commit 12020943ec95ce3e5dc3aeb2e3ae201cf25e0233 Author: panbingkun AuthorDate: Sat Sep 16 09:04:38 2023 -0500 [SPARK-45127][DOCS] Exclude README.md from document build ### What changes were proposed in this pull request? The pr aims to exclude `README.md` from document build. ### Why are the changes needed? - Currently, our document `README.html` does not have any CSS style applied to it, as shown below: https://spark.apache.org/docs/latest/README.html https://github.com/apache/spark/assets/15246973/1dfe5f69-30d9-4ce4-8d82-1bba5e721ccd";> **If we do not intend to display the above page to users, we should remove it during the document build process.** - As we saw in the project `spark-website`, it has already set the following configuration: https://github.com/apache/spark-website/blob/642d1fb834817014e1799e73882d53650c1c1662/_config.yml#L7 https://github.com/apache/spark/assets/15246973/421b7be5-4ece-407e-9d49-8e7487b74a47";> Let's stay consistent. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. After this pr, the README.html file will no longer be generated ``` (base) panbingkun:~/Developer/spark/spark-community/docs/_site$ls -al README.html ls: README.html: No such file or directory ``` - Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42883 from panbingkun/SPARK-45127. 
Authored-by: panbingkun Signed-off-by: Sean Owen (cherry picked from commit 804f741453fb146b5261084fa3baf26631badb79) Signed-off-by: Sean Owen --- docs/_config.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/_config.yml b/docs/_config.yml index c0c54b50e80..cb7ce91fa57 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -46,3 +46,5 @@ DOCSEARCH_SCRIPT: | }); permalink: 404.html + +exclude: ['README.md'] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.5 updated: [SPARK-45127][DOCS] Exclude README.md from document build
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new a3f50e74250 [SPARK-45127][DOCS] Exclude README.md from document build a3f50e74250 is described below commit a3f50e742506e07473c281255d1b13ab8ae78cd6 Author: panbingkun AuthorDate: Sat Sep 16 09:04:38 2023 -0500 [SPARK-45127][DOCS] Exclude README.md from document build ### What changes were proposed in this pull request? The pr aims to exclude `README.md` from document build. ### Why are the changes needed? - Currently, our document `README.html` does not have any CSS style applied to it, as shown below: https://spark.apache.org/docs/latest/README.html https://github.com/apache/spark/assets/15246973/1dfe5f69-30d9-4ce4-8d82-1bba5e721ccd";> **If we do not intend to display the above page to users, we should remove it during the document build process.** - As we saw in the project `spark-website`, it has already set the following configuration: https://github.com/apache/spark-website/blob/642d1fb834817014e1799e73882d53650c1c1662/_config.yml#L7 https://github.com/apache/spark/assets/15246973/421b7be5-4ece-407e-9d49-8e7487b74a47";> Let's stay consistent. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. After this pr, the README.html file will no longer be generated ``` (base) panbingkun:~/Developer/spark/spark-community/docs/_site$ls -al README.html ls: README.html: No such file or directory ``` - Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42883 from panbingkun/SPARK-45127. 
Authored-by: panbingkun Signed-off-by: Sean Owen (cherry picked from commit 804f741453fb146b5261084fa3baf26631badb79) Signed-off-by: Sean Owen --- docs/_config.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/_config.yml b/docs/_config.yml index afe015b2972..e346833722b 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -46,3 +46,5 @@ DOCSEARCH_SCRIPT: | }); permalink: 404.html + +exclude: ['README.md'] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-45127][DOCS] Exclude README.md from document build
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 804f741453f [SPARK-45127][DOCS] Exclude README.md from document build 804f741453f is described below commit 804f741453fb146b5261084fa3baf26631badb79 Author: panbingkun AuthorDate: Sat Sep 16 09:04:38 2023 -0500 [SPARK-45127][DOCS] Exclude README.md from document build ### What changes were proposed in this pull request? The pr aims to exclude `README.md` from document build. ### Why are the changes needed? - Currently, our document `README.html` does not have any CSS style applied to it, as shown below: https://spark.apache.org/docs/latest/README.html https://github.com/apache/spark/assets/15246973/1dfe5f69-30d9-4ce4-8d82-1bba5e721ccd";> **If we do not intend to display the above page to users, we should remove it during the document build process.** - As we saw in the project `spark-website`, it has already set the following configuration: https://github.com/apache/spark-website/blob/642d1fb834817014e1799e73882d53650c1c1662/_config.yml#L7 https://github.com/apache/spark/assets/15246973/421b7be5-4ece-407e-9d49-8e7487b74a47";> Let's stay consistent. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. After this pr, the README.html file will no longer be generated ``` (base) panbingkun:~/Developer/spark/spark-community/docs/_site$ls -al README.html ls: README.html: No such file or directory ``` - Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42883 from panbingkun/SPARK-45127. 
Authored-by: panbingkun Signed-off-by: Sean Owen --- docs/_config.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/_config.yml b/docs/_config.yml index 8c256af5bb3..fcc50d22e2e 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -46,3 +46,5 @@ DOCSEARCH_SCRIPT: | }); permalink: 404.html + +exclude: ['README.md'] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
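Jekyll's `exclude` option keeps listed files out of the build entirely, which is why `README.html` is no longer generated. A rough Python analogue of that filtering step (illustrative only, not Jekyll's actual implementation):

```python
def files_to_build(sources, exclude):
    """Drop excluded files before rendering, as Jekyll's `exclude` setting does."""
    excluded = set(exclude)
    return [path for path in sources if path not in excluded]

# With `exclude: ['README.md']`, README.md never reaches the renderer,
# so no unstyled README.html appears in the generated site.
```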
[spark] branch master updated (ab46dc048ba -> 33979829db9)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from ab46dc048ba [SPARK-44872][CONNECT][FOLLOWUP] Deflake ReattachableExecuteSuite and increase retry buffer add 33979829db9 [SPARK-45146][DOCS] Update the default value of 'spark.executor.logs.rolling.strategy' No new revisions were added by this update. Summary of changes: docs/configuration.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.3 updated: [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode'
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 9ee184ad5cf [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode' 9ee184ad5cf is described below commit 9ee184ad5cf1ea808143cffd6fa982ca8ef503fe Author: chenyu-opensource <119398199+chenyu-opensou...@users.noreply.github.com> AuthorDate: Wed Sep 13 08:48:14 2023 -0500 [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode' **What changes were proposed in this pull request?** The PR updates the default value of 'spark.submit.deployMode' in configuration.html on the website **Why are the changes needed?** The default value of 'spark.submit.deployMode' is 'client', but the website is wrong. **Does this PR introduce any user-facing change?** No **How was this patch tested?** It doesn't need to. **Was this patch authored or co-authored using generative AI tooling?** No Closes #42902 from chenyu-opensource/branch-SPARK-45146. Authored-by: chenyu-opensource <119398199+chenyu-opensou...@users.noreply.github.com> Signed-off-by: Sean Owen (cherry picked from commit 076cb7aabac2f0ff11ca77ca530b7b8db5310a5e) Signed-off-by: Sean Owen --- docs/configuration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/configuration.md b/docs/configuration.md index cb1f5212439..9e243635baf 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -394,7 +394,7 @@ of the most common options to set are: spark.submit.deployMode - (none) + client The deploy mode of Spark driver program, either "client" or "cluster", Which means to launch driver program locally ("client") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.4 updated: [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode'
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 7544bdb12d1 [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode' 7544bdb12d1 is described below commit 7544bdb12d1d0449aaa7e7a5f8124a5cf662712f Author: chenyu-opensource <119398199+chenyu-opensou...@users.noreply.github.com> AuthorDate: Wed Sep 13 08:48:14 2023 -0500 [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode' **What changes were proposed in this pull request?** The PR updates the default value of 'spark.submit.deployMode' in configuration.html on the website **Why are the changes needed?** The default value of 'spark.submit.deployMode' is 'client', but the website is wrong. **Does this PR introduce any user-facing change?** No **How was this patch tested?** It doesn't need to. **Was this patch authored or co-authored using generative AI tooling?** No Closes #42902 from chenyu-opensource/branch-SPARK-45146. Authored-by: chenyu-opensource <119398199+chenyu-opensou...@users.noreply.github.com> Signed-off-by: Sean Owen (cherry picked from commit 076cb7aabac2f0ff11ca77ca530b7b8db5310a5e) Signed-off-by: Sean Owen --- docs/configuration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/configuration.md b/docs/configuration.md index f099cea7eb9..d61f726130b 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -394,7 +394,7 @@ of the most common options to set are: spark.submit.deployMode - (none) + client The deploy mode of Spark driver program, either "client" or "cluster", Which means to launch driver program locally ("client") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode'
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 076cb7aabac [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode' 076cb7aabac is described below commit 076cb7aabac2f0ff11ca77ca530b7b8db5310a5e Author: chenyu-opensource <119398199+chenyu-opensou...@users.noreply.github.com> AuthorDate: Wed Sep 13 08:48:14 2023 -0500 [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode' **What changes were proposed in this pull request?** The PR updates the default value of 'spark.submit.deployMode' in configuration.html on the website **Why are the changes needed?** The default value of 'spark.submit.deployMode' is 'client', but the website is wrong. **Does this PR introduce any user-facing change?** No **How was this patch tested?** It doesn't need to. **Was this patch authored or co-authored using generative AI tooling?** No Closes #42902 from chenyu-opensource/branch-SPARK-45146. Authored-by: chenyu-opensource <119398199+chenyu-opensou...@users.noreply.github.com> Signed-off-by: Sean Owen --- docs/configuration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/configuration.md b/docs/configuration.md index 6f7e12555e8..3ca9b704eba 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -394,7 +394,7 @@ of the most common options to set are: spark.submit.deployMode - (none) + client The deploy mode of Spark driver program, either "client" or "cluster", Which means to launch driver program locally ("client") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.5 updated: [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode'
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new e72ae794e69 [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode' e72ae794e69 is described below commit e72ae794e69d8182291655d023aee903a913571b Author: chenyu-opensource <119398199+chenyu-opensou...@users.noreply.github.com> AuthorDate: Wed Sep 13 08:48:14 2023 -0500 [SPARK-45146][DOCS] Update the default value of 'spark.submit.deployMode' **What changes were proposed in this pull request?** The PR updates the default value of 'spark.submit.deployMode' in configuration.html on the website **Why are the changes needed?** The default value of 'spark.submit.deployMode' is 'client', but the website is wrong. **Does this PR introduce any user-facing change?** No **How was this patch tested?** It doesn't need to. **Was this patch authored or co-authored using generative AI tooling?** No Closes #42902 from chenyu-opensource/branch-SPARK-45146. Authored-by: chenyu-opensource <119398199+chenyu-opensou...@users.noreply.github.com> Signed-off-by: Sean Owen (cherry picked from commit 076cb7aabac2f0ff11ca77ca530b7b8db5310a5e) Signed-off-by: Sean Owen --- docs/configuration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/configuration.md b/docs/configuration.md index dfded480c99..1139beb6646 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -394,7 +394,7 @@ of the most common options to set are: spark.submit.deployMode - (none) + client The deploy mode of Spark driver program, either "client" or "cluster", Which means to launch driver program locally ("client") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
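The four commits above correct the documented default of `spark.submit.deployMode` from "(none)" to `client`. A minimal sketch (not Spark's implementation) of resolving the effective mode with that default:

```python
def effective_deploy_mode(conf):
    """Resolve spark.submit.deployMode, defaulting to 'client' per the fixed docs."""
    mode = conf.get("spark.submit.deployMode", "client")
    if mode not in ("client", "cluster"):
        raise ValueError(f"invalid deploy mode: {mode!r}")
    return mode
```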
[spark] branch master updated: [SPARK-45111][BUILD] Upgrade maven to 3.9.4
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 169aa4bee95 [SPARK-45111][BUILD] Upgrade maven to 3.9.4 169aa4bee95 is described below commit 169aa4bee950e2249d853f00b4e5fca67edfaa80 Author: yangjie01 AuthorDate: Mon Sep 11 10:59:57 2023 -0500 [SPARK-45111][BUILD] Upgrade maven to 3.9.4 ### What changes were proposed in this pull request? This PR aims to upgrade Maven from 3.8.8 to 3.9.4. ### Why are the changes needed? The new version [lifts the JDK minimum to JDK 8](https://issues.apache.org/jira/browse/MNG-7452) and [makes the build work on JDK 20](https://issues.apache.org/jira/browse/MNG-7743). It also brings a series of bug fixes, such as [Fix deadlock during forked lifecycle executions](https://issues.apache.org/jira/browse/MNG-7487), along with a number of new optimizations like [Profile activation by packaging](https://issues.apache.org/jira/browse/MNG-6609). On the other hand, the new version re [...] For other updates, refer to the corresponding release notes: - https://maven.apache.org/docs/3.9.0/release-notes.html | https://github.com/apache/maven/releases/tag/maven-3.9.0 - https://maven.apache.org/docs/3.9.1/release-notes.html | https://github.com/apache/maven/releases/tag/maven-3.9.1 - https://maven.apache.org/docs/3.9.2/release-notes.html | https://github.com/apache/maven/releases/tag/maven-3.9.2 - https://maven.apache.org/docs/3.9.3/release-notes.html | https://github.com/apache/maven/releases/tag/maven-3.9.3 - https://maven.apache.org/docs/3.9.4/release-notes.html | https://github.com/apache/maven/releases/tag/maven-3.9.4 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? 
- Pass GitHub Actions - Manual test: running `build/mvn -version` will trigger downloading `apache-maven-3.9.4-bin.tar.gz` ``` exec: curl --silent --show-error -L https://www.apache.org/dyn/closer.lua/maven/maven-3/3.9.4/binaries/apache-maven-3.9.4-bin.tar.gz?action=download ``` ### Was this patch authored or co-authored using generative AI tooling? No Closes #42827 from LuciferYang/maven-394. Authored-by: yangjie01 Signed-off-by: Sean Owen --- dev/appveyor-install-dependencies.ps1 | 2 +- docs/building-spark.md| 2 +- pom.xml | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/dev/appveyor-install-dependencies.ps1 b/dev/appveyor-install-dependencies.ps1 index db154cd51da..682d388bdf9 100644 --- a/dev/appveyor-install-dependencies.ps1 +++ b/dev/appveyor-install-dependencies.ps1 @@ -81,7 +81,7 @@ if (!(Test-Path $tools)) { # == Maven # Push-Location $tools # -# $mavenVer = "3.8.8" +# $mavenVer = "3.9.4" # Start-FileDownload "https://archive.apache.org/dist/maven/maven-3/$mavenVer/binaries/apache-maven-$mavenVer-bin.zip"; "maven.zip" # # # extract diff --git a/docs/building-spark.md b/docs/building-spark.md index 4b8e70655d5..bbbc51d8c22 100644 --- a/docs/building-spark.md +++ b/docs/building-spark.md @@ -27,7 +27,7 @@ license: | ## Apache Maven The Maven-based build is the build of reference for Apache Spark. -Building Spark using Maven requires Maven 3.8.8 and Java 8/11/17. +Building Spark using Maven requires Maven 3.9.4 and Java 8/11/17. Spark requires Scala 2.12/2.13; support for Scala 2.11 was removed in Spark 3.0.0. ### Setting up Maven's Memory Usage diff --git a/pom.xml b/pom.xml index a61d603fe1c..02920c0ae74 100644 --- a/pom.xml +++ b/pom.xml @@ -115,7 +115,7 @@ 1.8 ${java.version} ${java.version} -3.8.8 +3.9.4 3.1.0 spark 9.5 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (c8fa821a873 -> 445c5417ea1)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from c8fa821a873 [SPARK-44866][SQL] Add `SnowflakeDialect` to handle BOOLEAN type correctly add 445c5417ea1 [SPARK-45105][DOCS] Make hyperlinks in documents clickable No new revisions were added by this update. Summary of changes: docs/running-on-mesos.md | 4 ++-- docs/running-on-yarn.md | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark-website] branch asf-site updated: make hyperlinks clickable & fix link
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 642d1fb834 make hyperlinks clickable & fix link 642d1fb834 is described below commit 642d1fb834817014e1799e73882d53650c1c1662 Author: panbingkun AuthorDate: Sat Sep 9 08:24:02 2023 -0500 make hyperlinks clickable & fix link The pr aims to: - make hyperlinks clickable to improve document usability - fix some link to reduce one jump. Author: panbingkun Closes #475 from panbingkun/make_hyperlinks_clickable. --- README.md | 2 +- committers.md | 9 +++-- developer-tools.md| 2 +- release-process.md| 16 ++-- security.md | 2 +- site/committers.html | 9 +++-- site/developer-tools.html | 2 +- site/release-process.html | 16 ++-- site/security.html| 2 +- 9 files changed, 31 insertions(+), 29 deletions(-) diff --git a/README.md b/README.md index ea34048ae7..3e6492c921 100644 --- a/README.md +++ b/README.md @@ -7,7 +7,7 @@ Building the site requires [Jekyll](http://jekyllrb.com/docs) The easiest way to install the right version of these tools is using [Bundler](https://bundler.io/) and running `bundle install` in this directory. -See also https://github.com/apache/spark/blob/master/docs/README.md +See also [https://github.com/apache/spark/blob/master/docs/README.md](https://github.com/apache/spark/blob/master/docs/README.md) A site build will update the directories and files in the `site` directory with the generated files. Using Jekyll via `bundle exec jekyll` locks it to the right version. diff --git a/committers.md b/committers.md index 2431d73f84..a555424026 100644 --- a/committers.md +++ b/committers.md @@ -197,8 +197,8 @@ origin g...@github.com:[your username]/spark.git (push) For the `apache` repo, you will need to set up command-line authentication to GitHub. 
This may include setting up an SSH key and/or personal access token. See: -- https://help.github.com/articles/connecting-to-github-with-ssh/ -- https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/ +- [https://docs.github.com/en/authentication/connecting-to-github-with-ssh](https://docs.github.com/en/authentication/connecting-to-github-with-ssh) +- [https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) To check whether the necessary write access are already granted please visit [GitBox](https://gitbox.apache.org/setup/). @@ -219,10 +219,7 @@ Then, in a separate window, modify the code and push a commit. Run `git rebase - You can verify the result is one change with `git log`. Then resume the script in the other window. Also, please remember to set Assignee on JIRAs where applicable when they are resolved. The script -can do this automatically in most cases. However where the contributor is not yet a part of the -Contributors group for the Spark project in ASF JIRA, it won't work until they are added. Ask -an admin to add the person to Contributors at -https://issues.apache.org/jira/plugins/servlet/project-config/SPARK/roles . +can do this automatically in most cases. Once a PR is merged please leave a comment on the PR stating which branch(es) it has been merged with. diff --git a/developer-tools.md b/developer-tools.md index 73e708116e..59850dbe19 100644 --- a/developer-tools.md +++ b/developer-tools.md @@ -193,7 +193,7 @@ Please check other available options via `python/run-tests[-with-coverage] --hel Although GitHub Action provide both K8s unit test and integration test coverage, you can run it locally. For example, Volcano batch scheduler integration test should be done manually. Please refer the integration test documentation for the detail. 
-https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/README.md +[https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/README.md](https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/README.md) Testing with GitHub actions workflow diff --git a/release-process.md b/release-process.md index 101db9a8b3..87a1ab6778 100644 --- a/release-process.md +++ b/release-process.md @@ -31,9 +31,9 @@ The release manager role in Spark means you are responsible for a few different If you are a new Release Manager, you can read up on the process from the followings: -- release signing https://www.apache.org/dev/release-signing.html -- gpg for signing https://www.apache.org/dev/openpgp.html -- svn https://www.a
[spark] branch master updated: [SPARK-44732][XML][FOLLOWUP] Partial backport of spark-xml "Shortcut common type inference cases to fail fast"
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a37c265371d [SPARK-44732][XML][FOLLOWUP] Partial backport of spark-xml "Shortcut common type inference cases to fail fast" a37c265371d is described below commit a37c265371dc861fa478dd63deaa38a86415fe3b Author: Sean Owen AuthorDate: Thu Sep 7 15:21:36 2023 -0700 [SPARK-44732][XML][FOLLOWUP] Partial backport of spark-xml "Shortcut common type inference cases to fail fast" ### What changes were proposed in this pull request? Partial back-port of https://github.com/databricks/spark-xml/commit/994e357f7666956b5d0e63627716b2c092d9abbd?diff=split from spark-xml ### Why are the changes needed? Though no more development was intended on spark-xml, there was a non-trivial improvement to inference speed that I committed anyway to resolve a customer issue. Part of it can be 'backported' here to sync the code. I attached this as a follow-up to the main code port JIRA. There is still, in general, no intent to commit more to spark-xml in the meantime unless it's significantly important. ### Does this PR introduce _any_ user-facing change? No, this should only speed up schema inference without behavior change. ### How was this patch tested? Tested in spark-xml, and will be tested by tests here too Closes #42844 from srowen/SPARK-44732.2. 
Authored-by: Sean Owen Signed-off-by: Sean Owen --- .../org/apache/spark/sql/catalyst/xml/TypeCast.scala | 16 1 file changed, 16 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/TypeCast.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/TypeCast.scala index a00f372da7f..b065dd41f28 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/TypeCast.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/TypeCast.scala @@ -155,6 +155,12 @@ private[sql] object TypeCast { } else { value } +// A little shortcut to avoid trying many formatters in the common case that +// the input isn't a double. All built-in formats will start with a digit or period. +if (signSafeValue.isEmpty || + !(Character.isDigit(signSafeValue.head) || signSafeValue.head == '.')) { + return false +} // Rule out strings ending in D or F, as they will parse as double but should be disallowed if (value.nonEmpty && (value.last match { case 'd' | 'D' | 'f' | 'F' => true @@ -171,6 +177,11 @@ private[sql] object TypeCast { } else { value } +// A little shortcut to avoid trying many formatters in the common case that +// the input isn't a number. All built-in formats will start with a digit. +if (signSafeValue.isEmpty || !Character.isDigit(signSafeValue.head)) { + return false +} (allCatch opt signSafeValue.toInt).isDefined } @@ -180,6 +191,11 @@ private[sql] object TypeCast { } else { value } +// A little shortcut to avoid trying many formatters in the common case that +// the input isn't a number. All built-in formats will start with a digit. +if (signSafeValue.isEmpty || !Character.isDigit(signSafeValue.head)) { + return false +} (allCatch opt signSafeValue.toLong).isDefined } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
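The fail-fast idea in the patch above — reject obviously non-numeric strings before invoking any expensive formatters — can be sketched in Python as follows (a simplified analogue of the Scala `TypeCast` change, not Spark code; `float()` stands in for the chain of formatters):

```python
def is_probably_double(value: str) -> bool:
    """Fail fast: reject strings that cannot be a double before parsing.

    Mirrors the shortcut in TypeCast.scala -- every built-in numeric format
    starts with a digit or a period (after an optional sign), so anything
    else can be rejected without trying the parsers at all.
    """
    # Strip a single leading sign, as the Scala code does.
    sign_safe = value[1:] if value[:1] in "+-" else value
    # Shortcut: bail out immediately if the first char can't start a number.
    if not sign_safe or not (sign_safe[0].isdigit() or sign_safe[0] == "."):
        return False
    # Rule out trailing D/F, which would parse as double in Scala but
    # should be disallowed (kept here for parity with the patch).
    if value and value[-1] in "dDfF":
        return False
    try:
        float(sign_safe)
        return True
    except ValueError:
        return False
```

The win is that for typical non-numeric XML field values (e.g. `"abc"`), the function returns after one character test instead of attempting and failing a full parse.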
[spark] branch master updated (0e6e15ca633 -> b8b58e0b95b)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 0e6e15ca633 [SPARK-45080][SS] Explicitly call out support for columnar in DSv2 streaming data sources add b8b58e0b95b [SPARK-45077][UI] Upgrade dagre-d3.js from 0.4.3 to 0.6.4 No new revisions were added by this update. Summary of changes: .../org/apache/spark/ui/static/dagre-d3.min.js | 4836 +++- 1 file changed, 4829 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-45067][BUILD] Upgrade slf4j to 2.0.9
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 85d1c7f3a5d [SPARK-45067][BUILD] Upgrade slf4j to 2.0.9 85d1c7f3a5d is described below commit 85d1c7f3a5dd0a9162d93b80812a193d8ccfef18 Author: yangjie01 AuthorDate: Mon Sep 4 09:15:44 2023 -0500 [SPARK-45067][BUILD] Upgrade slf4j to 2.0.9 ### What changes were proposed in this pull request? This PR aims to upgrade slf4j from 2.0.7 to 2.0.9. ### Why are the changes needed? The release notes are as follows: - https://www.slf4j.org/news.html#2.0.9 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #42796 from LuciferYang/SPARK-45067. Authored-by: yangjie01 Signed-off-by: Sean Owen --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 6 +++--- pom.xml | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 59164c1f8f4..652127a9bb8 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -118,7 +118,7 @@ javassist/3.29.2-GA//javassist-3.29.2-GA.jar javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar javolution/5.5.1//javolution-5.5.1.jar jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar -jcl-over-slf4j/2.0.7//jcl-over-slf4j-2.0.7.jar +jcl-over-slf4j/2.0.9//jcl-over-slf4j-2.0.9.jar jdo-api/3.0.1//jdo-api-3.0.1.jar jdom2/2.0.6//jdom2-2.0.6.jar jersey-client/2.40//jersey-client-2.40.jar @@ -141,7 +141,7 @@ json4s-jackson_2.12/3.7.0-M11//json4s-jackson_2.12-3.7.0-M11.jar json4s-scalap_2.12/3.7.0-M11//json4s-scalap_2.12-3.7.0-M11.jar jsr305/3.0.0//jsr305-3.0.0.jar jta/1.1//jta-1.1.jar -jul-to-slf4j/2.0.7//jul-to-slf4j-2.0.7.jar +jul-to-slf4j/2.0.9//jul-to-slf4j-2.0.9.jar 
kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar kubernetes-client-api/6.8.1//kubernetes-client-api-6.8.1.jar kubernetes-client/6.8.1//kubernetes-client-6.8.1.jar @@ -233,7 +233,7 @@ scala-parser-combinators_2.12/2.3.0//scala-parser-combinators_2.12-2.3.0.jar scala-reflect/2.12.18//scala-reflect-2.12.18.jar scala-xml_2.12/2.2.0//scala-xml_2.12-2.2.0.jar shims/0.9.45//shims-0.9.45.jar -slf4j-api/2.0.7//slf4j-api-2.0.7.jar +slf4j-api/2.0.9//slf4j-api-2.0.9.jar snakeyaml-engine/2.6//snakeyaml-engine-2.6.jar snakeyaml/2.0//snakeyaml-2.0.jar snappy-java/1.1.10.3//snappy-java-1.1.10.3.jar diff --git a/pom.xml b/pom.xml index efd1c6ffdb9..a61d603fe1c 100644 --- a/pom.xml +++ b/pom.xml @@ -119,7 +119,7 @@ 3.1.0 spark 9.5 -2.0.7 +2.0.9 2.20.0 3.3.6 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
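The manifest edited in the commit above, `dev/deps/spark-deps-hadoop-3-hive-2.3`, lists one coordinate per line as `name/version/classifier/jar-filename` (an empty classifier produces the double slash). A version bump therefore has to change two places on each line, and a consistency check is easy to sketch — this is a hypothetical helper for illustration, not an actual Spark dev script:

```python
def dep_line_is_consistent(line: str) -> bool:
    """Check that the jar filename embeds the same version (and classifier)
    as the coordinate fields, e.g. 'jcl-over-slf4j/2.0.9//jcl-over-slf4j-2.0.9.jar'."""
    name, version, classifier, jar = line.strip().rsplit("/", 3)
    suffix = f"-{classifier}" if classifier else ""
    return jar == f"{name}-{version}{suffix}.jar"
```

Running such a check over the whole deps file would catch a bump that touched the version field but not the filename (or vice versa) before CI does.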
[spark] branch master updated: [SPARK-44890][BUILD] Update miswritten remarks
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ba1c2f3b383 [SPARK-44890][BUILD] Update miswritten remarks ba1c2f3b383 is described below commit ba1c2f3b38396c01739375d6e83ac84b581d951e Author: chenyu-opensource <119398199+chenyu-opensou...@users.noreply.github.com> AuthorDate: Mon Sep 4 09:12:33 2023 -0500 [SPARK-44890][BUILD] Update miswritten remarks ### What changes were proposed in this pull request? The PR updates miswritten remarks in pom.xml. ### Why are the changes needed? To make the comments more accurate and standardized. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No tests are needed: the change only touches comments and does not affect actual operation. ### Was this patch authored or co-authored using generative AI tooling? No Closes #42598 from chenyu-opensource/master. Authored-by: chenyu-opensource <119398199+chenyu-opensou...@users.noreply.github.com> Signed-off-by: Sean Owen --- pom.xml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pom.xml b/pom.xml index 8edc3fd550c..efd1c6ffdb9 100644 --- a/pom.xml +++ b/pom.xml @@ -153,7 +153,7 @@ 2.5.1 2.0.8 4.2.19 @@ -175,7 +175,7 @@ 2.12.18 2.12 2.2.0 - + 4.8.0 false 2.16.0 @@ -204,7 +204,7 @@ 3.1.9 2.40 2.12.5 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.5 updated: [SPARK-45042][BUILD][3.5] Upgrade jetty to 9.4.52.v20230823
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 950b2f29105 [SPARK-45042][BUILD][3.5] Upgrade jetty to 9.4.52.v20230823 950b2f29105 is described below commit 950b2f29105cd66355eef10503a93d678087c79e Author: panbingkun AuthorDate: Mon Sep 4 09:01:50 2023 -0500 [SPARK-45042][BUILD][3.5] Upgrade jetty to 9.4.52.v20230823 ### What changes were proposed in this pull request? The PR aims to upgrade jetty from 9.4.51.v20230217 to 9.4.52.v20230823. (Backport to Spark 3.5.0) ### Why are the changes needed? - This is a release of https://github.com/eclipse/jetty.project/issues/7958 that was sponsored by a [support contract from Webtide.com](mailto:saleswebtide.com) - The newest version fixes a possible security issue: this release provides a workaround for Security Advisory https://github.com/advisories/GHSA-58qw-p7qm-5rvh - The release notes are as follows: https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.52.v20230823 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42795 from panbingkun/branch-3.5_SPARK-45042. 
Authored-by: panbingkun Signed-off-by: Sean Owen --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++-- pom.xml | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index b6aba589d5f..1d02f8dba56 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -130,8 +130,8 @@ jersey-container-servlet/2.40//jersey-container-servlet-2.40.jar jersey-hk2/2.40//jersey-hk2-2.40.jar jersey-server/2.40//jersey-server-2.40.jar jettison/1.1//jettison-1.1.jar -jetty-util-ajax/9.4.51.v20230217//jetty-util-ajax-9.4.51.v20230217.jar -jetty-util/9.4.51.v20230217//jetty-util-9.4.51.v20230217.jar +jetty-util-ajax/9.4.52.v20230823//jetty-util-ajax-9.4.52.v20230823.jar +jetty-util/9.4.52.v20230823//jetty-util-9.4.52.v20230823.jar jline/2.14.6//jline-2.14.6.jar joda-time/2.12.5//joda-time-2.12.5.jar jodd-core/3.5.2//jodd-core-3.5.2.jar diff --git a/pom.xml b/pom.xml index 154ca4005f6..8fc4b89a78c 100644 --- a/pom.xml +++ b/pom.xml @@ -143,7 +143,7 @@ 1.13.1 1.9.1 shaded-protobuf -9.4.51.v20230217 +9.4.52.v20230823 4.0.3 0.10.0
[spark] branch master updated: [SPARK-44956][BUILD] Upgrade Jekyll to 4.3.2 & Webrick to 1.8.1
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 967aac1171a [SPARK-44956][BUILD] Upgrade Jekyll to 4.3.2 & Webrick to 1.8.1 967aac1171a is described below commit 967aac1171a49c8e98c992512487d77c2b1c4565 Author: panbingkun AuthorDate: Sat Sep 2 08:19:38 2023 -0500 [SPARK-44956][BUILD] Upgrade Jekyll to 4.3.2 & Webrick to 1.8.1 ### What changes were proposed in this pull request? The pr aims to upgrade - Jekyll from 4.2.1 to 4.3.2. - Webrick from 1.7 to 1.8.1. ### Why are the changes needed? 1. The `4.2.1` version was released on Sep 27, 2021, nearly two years ago. 2. Jekyll 4.3.2 was released on `Jan 21, 2023` and includes the fix for a regression bug. - https://github.com/jekyll/jekyll/releases/tag/v4.3.2 - https://github.com/jekyll/jekyll/releases/tag/v4.3.1 - https://github.com/jekyll/jekyll/releases/tag/v4.3.0 Fix regression in Convertible module from v4.2.0 (https://github.com/jekyll/jekyll/pull/8786) - https://github.com/jekyll/jekyll/releases/tag/v4.2.2 3. The newest webrick versions include some bug fixes. https://github.com/ruby/webrick/releases/tag/v1.8.1 https://github.com/ruby/webrick/releases/tag/v1.8.0 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Pass GA. - Manually tested. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42669 from panbingkun/SPARK-44956. 
Authored-by: panbingkun Signed-off-by: Sean Owen --- docs/Gemfile | 4 ++-- docs/Gemfile.lock | 62 +-- 2 files changed, 35 insertions(+), 31 deletions(-) diff --git a/docs/Gemfile b/docs/Gemfile index 6c352012964..6c676037116 100644 --- a/docs/Gemfile +++ b/docs/Gemfile @@ -18,7 +18,7 @@ source "https://rubygems.org"; gem "ffi", "1.15.5" -gem "jekyll", "4.2.1" +gem "jekyll", "4.3.2" gem "rouge", "3.26.0" gem "jekyll-redirect-from", "0.16.0" -gem "webrick", "1.7" +gem "webrick", "1.8.1" diff --git a/docs/Gemfile.lock b/docs/Gemfile.lock index 6654e6c47c6..eda31f85747 100644 --- a/docs/Gemfile.lock +++ b/docs/Gemfile.lock @@ -1,74 +1,78 @@ GEM remote: https://rubygems.org/ specs: -addressable (2.8.0) - public_suffix (>= 2.0.2, < 5.0) +addressable (2.8.5) + public_suffix (>= 2.0.2, < 6.0) colorator (1.1.0) -concurrent-ruby (1.1.9) -em-websocket (0.5.2) +concurrent-ruby (1.2.2) +em-websocket (0.5.3) eventmachine (>= 0.12.9) - http_parser.rb (~> 0.6.0) + http_parser.rb (~> 0) eventmachine (1.2.7) ffi (1.15.5) forwardable-extended (2.6.0) -http_parser.rb (0.6.0) -i18n (1.8.11) +google-protobuf (3.24.2) +http_parser.rb (0.8.0) +i18n (1.14.1) concurrent-ruby (~> 1.0) -jekyll (4.2.1) +jekyll (4.3.2) addressable (~> 2.4) colorator (~> 1.0) em-websocket (~> 0.5) i18n (~> 1.0) - jekyll-sass-converter (~> 2.0) + jekyll-sass-converter (>= 2.0, < 4.0) jekyll-watch (~> 2.0) - kramdown (~> 2.3) + kramdown (~> 2.3, >= 2.3.1) kramdown-parser-gfm (~> 1.0) liquid (~> 4.0) - mercenary (~> 0.4.0) + mercenary (>= 0.3.6, < 0.5) pathutil (~> 0.9) - rouge (~> 3.0) + rouge (>= 3.0, < 5.0) safe_yaml (~> 1.0) - terminal-table (~> 2.0) + terminal-table (>= 1.8, < 4.0) + webrick (~> 1.7) jekyll-redirect-from (0.16.0) jekyll (>= 3.3, < 5.0) -jekyll-sass-converter (2.1.0) - sassc (> 2.0.1, < 3.0) +jekyll-sass-converter (3.0.0) + sass-embedded (~> 1.54) jekyll-watch (2.2.1) listen (~> 3.0) -kramdown (2.3.1) +kramdown (2.4.0) rexml kramdown-parser-gfm (1.1.0) kramdown (~> 2.0) -liquid (4.0.3) -listen 
(3.7.0) +liquid (4.0.4) +listen (3.8.0) rb-fsevent (~> 0.10, >= 0.10.3) rb-inotify (~> 0.9, >= 0.9.10) mercenary (0.4.0) pathutil (0.16.2) forwardable-extended (~> 2.6) -public_suffix (4.0.6) -rb-fsevent (0.11.0) +public_suffix (5.0.3) +rake (13.0.6) +rb-fsevent (0.11.2) rb-inotify (0.10.1) ffi (~> 1.0) -rexml (3.2.5) +rexml (3.2.6) rouge (3.26.0) safe_yaml (1.0.5) -sassc (2.4.0) - ffi (~> 1.9) -terminal-table (2.0.0) - unicode-display_width (~> 1.1, >= 1.1.1) -
[spark] branch master updated: [SPARK-45043][BUILD] Upgrade `scalafmt` to 3.7.13
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 82d54fc8924 [SPARK-45043][BUILD] Upgrade `scalafmt` to 3.7.13 82d54fc8924 is described below commit 82d54fc8924618777992ee9a4d939b1fb336f20d Author: panbingkun AuthorDate: Sat Sep 2 08:18:43 2023 -0500 [SPARK-45043][BUILD] Upgrade `scalafmt` to 3.7.13 ### What changes were proposed in this pull request? The pr aims to upgrade `scalafmt` from 3.7.5 to 3.7.13. ### Why are the changes needed? 1. The newest versions include some bug fixes, e.g.: - FormatWriter: accumulate align shift correctly (https://github.com/scalameta/scalafmt/pull/3615) - Indents: ignore fewerBraces if indentation is 1 (https://github.com/scalameta/scalafmt/pull/3592) - RemoveScala3OptionalBraces: handle infix on rbrace (https://github.com/scalameta/scalafmt/pull/3576) 2. The full release notes: https://github.com/scalameta/scalafmt/releases/tag/v3.7.13 https://github.com/scalameta/scalafmt/releases/tag/v3.7.12 https://github.com/scalameta/scalafmt/releases/tag/v3.7.11 https://github.com/scalameta/scalafmt/releases/tag/v3.7.10 https://github.com/scalameta/scalafmt/releases/tag/v3.7.9 https://github.com/scalameta/scalafmt/releases/tag/v3.7.8 https://github.com/scalameta/scalafmt/releases/tag/v3.7.7 https://github.com/scalameta/scalafmt/releases/tag/v3.7.6 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #42764 from panbingkun/SPARK-45043. 
Authored-by: panbingkun Signed-off-by: Sean Owen --- dev/.scalafmt.conf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dev/.scalafmt.conf b/dev/.scalafmt.conf index c3b26002a76..721dec28990 100644 --- a/dev/.scalafmt.conf +++ b/dev/.scalafmt.conf @@ -32,4 +32,4 @@ fileOverride { runner.dialect = scala213 } } -version = 3.7.5 +version = 3.7.13 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-44782][INFRA] Adjust PR template to Generative Tooling Guidance recommendations
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2e2f5e9c28b [SPARK-44782][INFRA] Adjust PR template to Generative Tooling Guidance recommendations 2e2f5e9c28b is described below commit 2e2f5e9c28b4e88171949006937c094304581738 Author: zero323 AuthorDate: Fri Aug 18 21:13:36 2023 -0500 [SPARK-44782][INFRA] Adjust PR template to Generative Tooling Guidance recommendations ### What changes were proposed in this pull request? This PR adds _Was this patch authored or co-authored using generative AI tooling?_ section to the PR template. ### Why are the changes needed? To reflect recommendations of the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual inspection. Closes #42469 from zero323/SPARK-44782. Authored-by: zero323 Signed-off-by: Sean Owen --- .github/PULL_REQUEST_TEMPLATE | 9 + 1 file changed, 9 insertions(+) diff --git a/.github/PULL_REQUEST_TEMPLATE b/.github/PULL_REQUEST_TEMPLATE index 1548696a3ca..a80bf21312a 100644 --- a/.github/PULL_REQUEST_TEMPLATE +++ b/.github/PULL_REQUEST_TEMPLATE @@ -47,3 +47,12 @@ If it was tested in a way different from regular unit tests, please clarify how If tests were not added, please describe why they were not added and/or why it was difficult to add. If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks. --> + + +### Was this patch authored or co-authored using generative AI tooling? + - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.3 updated: [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 7e7c41bf100 [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again 7e7c41bf100 is described below commit 7e7c41bf1007ca05ffc3d818d34d75570d234a6d Author: Kent Yao AuthorDate: Fri Aug 18 10:02:43 2023 -0500 [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again ### What changes were proposed in this pull request? This PR creates an alternative to the assign_issue function in jira.client.JIRA. The original one has an issue that it will search users again and only choose the assignee from 20 candidates. If it's unmatched, it picks the head blindly. For example, ```python >>> assignee = asf_jira.user("yao") >>> "SPARK-44801" 'SPARK-44801' >>> asf_jira.assign_issue(issue.key, assignee.name) Traceback (most recent call last): File "", line 1, in NameError: name 'issue' is not defined >>> asf_jira.assign_issue("SPARK-44801", assignee.name) Traceback (most recent call last): File "", line 1, in File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 123, in wrapper result = func(*arg_list, **kwargs) ^ File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 1891, in assign_issue self._session.put(url, data=json.dumps(payload)) File "/Users/hzyaoqin/python/lib/python3.11/site-packages/requests/sessions.py", line 649, in put return self.request("PUT", url, data=data, **kwargs) ^ File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", line 246, in request elif raise_on_error(response, **processed_kwargs): File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", line 71, in raise_on_error raise JIRAError( jira.exceptions.JIRAError: JiraError HTTP 400 url: 
https://issues.apache.org/jira/rest/api/latest/issue/SPARK-44801/assignee response text = {"errorMessages":[],"errors":{"assignee":"User 'airhot' cannot be assigned issues."}} ``` The Jira userid 'yao' fails to return my JIRA profile as a candidate(20 in total) to match. So, 'airhot' from the head replaces me as an assignee. ### Why are the changes needed? bugfix for merge_spark_pr ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? test locally ```python >>> def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) -> bool: ... """Assign an issue to a user. ... ... Args: ... issue (Union[int, str]): the issue ID or key to assign ... assignee (str): the user to assign the issue to. None will set it to unassigned. -1 will set it to Automatic. ... ... Returns: ... bool ... """ ... url = getattr(client, "_get_latest_url")(f"issue/{issue}/assignee") ... payload = {"name": assignee} ... getattr(client, "_session").put(url, data=json.dumps(payload)) ... return True ... >>> >>> assign_issue(asf_jira, "SPARK-44801", "yao") True ``` Closes #42496 from yaooqinn/SPARK-44813. Authored-by: Kent Yao Signed-off-by: Sean Owen (cherry picked from commit 00255bc63b1a3bbe80bedc639b88d4a8e3f88f72) Signed-off-by: Sean Owen --- dev/merge_spark_pr.py | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py index e21a39a6881..8555abe9bd0 100755 --- a/dev/merge_spark_pr.py +++ b/dev/merge_spark_pr.py @@ -372,7 +372,7 @@ def choose_jira_assignee(issue, asf_jira): except BaseException: # assume it's a user id, and try to assign (might fail, we just prompt again) assignee = asf_jira.user(raw_assignee) -asf_jira.assign_issue(issue.key, assignee.name) +assign_issue(issue.key, assignee.name) return assignee except KeyboardInterrupt: raise @@ -381,6 +381,19 @@ def choose_jira_as
[spark] branch branch-3.4 updated: [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 3c5e57d886b [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again 3c5e57d886b is described below commit 3c5e57d886b81808370353781bfce2b2ce20a473 Author: Kent Yao AuthorDate: Fri Aug 18 10:02:43 2023 -0500 [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again ### What changes were proposed in this pull request? This PR creates an alternative to the assign_issue function in jira.client.JIRA. The original one has an issue that it will search users again and only choose the assignee from 20 candidates. If it's unmatched, it picks the head blindly. For example, ```python >>> assignee = asf_jira.user("yao") >>> "SPARK-44801" 'SPARK-44801' >>> asf_jira.assign_issue(issue.key, assignee.name) Traceback (most recent call last): File "", line 1, in NameError: name 'issue' is not defined >>> asf_jira.assign_issue("SPARK-44801", assignee.name) Traceback (most recent call last): File "", line 1, in File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 123, in wrapper result = func(*arg_list, **kwargs) ^ File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 1891, in assign_issue self._session.put(url, data=json.dumps(payload)) File "/Users/hzyaoqin/python/lib/python3.11/site-packages/requests/sessions.py", line 649, in put return self.request("PUT", url, data=data, **kwargs) ^ File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", line 246, in request elif raise_on_error(response, **processed_kwargs): File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", line 71, in raise_on_error raise JIRAError( jira.exceptions.JIRAError: JiraError HTTP 400 url: 
https://issues.apache.org/jira/rest/api/latest/issue/SPARK-44801/assignee response text = {"errorMessages":[],"errors":{"assignee":"User 'airhot' cannot be assigned issues."}} ``` The Jira userid 'yao' fails to return my JIRA profile as a candidate(20 in total) to match. So, 'airhot' from the head replaces me as an assignee. ### Why are the changes needed? bugfix for merge_spark_pr ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? test locally ```python >>> def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) -> bool: ... """Assign an issue to a user. ... ... Args: ... issue (Union[int, str]): the issue ID or key to assign ... assignee (str): the user to assign the issue to. None will set it to unassigned. -1 will set it to Automatic. ... ... Returns: ... bool ... """ ... url = getattr(client, "_get_latest_url")(f"issue/{issue}/assignee") ... payload = {"name": assignee} ... getattr(client, "_session").put(url, data=json.dumps(payload)) ... return True ... >>> >>> assign_issue(asf_jira, "SPARK-44801", "yao") True ``` Closes #42496 from yaooqinn/SPARK-44813. Authored-by: Kent Yao Signed-off-by: Sean Owen (cherry picked from commit 00255bc63b1a3bbe80bedc639b88d4a8e3f88f72) Signed-off-by: Sean Owen --- dev/merge_spark_pr.py | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py index 1621432c01c..8a5b6ebe8ef 100755 --- a/dev/merge_spark_pr.py +++ b/dev/merge_spark_pr.py @@ -372,7 +372,7 @@ def choose_jira_assignee(issue, asf_jira): except BaseException: # assume it's a user id, and try to assign (might fail, we just prompt again) assignee = asf_jira.user(raw_assignee) -asf_jira.assign_issue(issue.key, assignee.name) +assign_issue(issue.key, assignee.name) return assignee except KeyboardInterrupt: raise @@ -381,6 +381,19 @@ def choose_jira_as
[spark] branch branch-3.5 updated: [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new f7dd0a95727 [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again f7dd0a95727 is described below commit f7dd0a95727259ff4b7a2f849798f8a93cf78b69 Author: Kent Yao AuthorDate: Fri Aug 18 10:02:43 2023 -0500 [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again ### What changes were proposed in this pull request? This PR creates an alternative to the assign_issue function in jira.client.JIRA. The original one has an issue that it will search users again and only choose the assignee from 20 candidates. If it's unmatched, it picks the head blindly. For example, ```python >>> assignee = asf_jira.user("yao") >>> "SPARK-44801" 'SPARK-44801' >>> asf_jira.assign_issue(issue.key, assignee.name) Traceback (most recent call last): File "", line 1, in NameError: name 'issue' is not defined >>> asf_jira.assign_issue("SPARK-44801", assignee.name) Traceback (most recent call last): File "", line 1, in File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 123, in wrapper result = func(*arg_list, **kwargs) ^ File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 1891, in assign_issue self._session.put(url, data=json.dumps(payload)) File "/Users/hzyaoqin/python/lib/python3.11/site-packages/requests/sessions.py", line 649, in put return self.request("PUT", url, data=data, **kwargs) ^ File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", line 246, in request elif raise_on_error(response, **processed_kwargs): File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", line 71, in raise_on_error raise JIRAError( jira.exceptions.JIRAError: JiraError HTTP 400 url: 
https://issues.apache.org/jira/rest/api/latest/issue/SPARK-44801/assignee response text = {"errorMessages":[],"errors":{"assignee":"User 'airhot' cannot be assigned issues."}} ``` The Jira userid 'yao' fails to return my JIRA profile as a candidate(20 in total) to match. So, 'airhot' from the head replaces me as an assignee. ### Why are the changes needed? bugfix for merge_spark_pr ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? test locally ```python >>> def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) -> bool: ... """Assign an issue to a user. ... ... Args: ... issue (Union[int, str]): the issue ID or key to assign ... assignee (str): the user to assign the issue to. None will set it to unassigned. -1 will set it to Automatic. ... ... Returns: ... bool ... """ ... url = getattr(client, "_get_latest_url")(f"issue/{issue}/assignee") ... payload = {"name": assignee} ... getattr(client, "_session").put(url, data=json.dumps(payload)) ... return True ... >>> >>> assign_issue(asf_jira, "SPARK-44801", "yao") True ``` Closes #42496 from yaooqinn/SPARK-44813. Authored-by: Kent Yao Signed-off-by: Sean Owen (cherry picked from commit 00255bc63b1a3bbe80bedc639b88d4a8e3f88f72) Signed-off-by: Sean Owen --- dev/merge_spark_pr.py | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py index bc51b8af2eb..37488557fea 100755 --- a/dev/merge_spark_pr.py +++ b/dev/merge_spark_pr.py @@ -373,7 +373,7 @@ def choose_jira_assignee(issue, asf_jira): except BaseException: # assume it's a user id, and try to assign (might fail, we just prompt again) assignee = asf_jira.user(raw_assignee) -asf_jira.assign_issue(issue.key, assignee.name) +assign_issue(issue.key, assignee.name) return assignee except KeyboardInterrupt: raise @@ -382,6 +382,19 @@ def choose_jira_as
[spark] branch master updated: [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 00255bc63b1 [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again 00255bc63b1 is described below commit 00255bc63b1a3bbe80bedc639b88d4a8e3f88f72 Author: Kent Yao AuthorDate: Fri Aug 18 10:02:43 2023 -0500 [SPARK-44813][INFRA] The Jira Python misses our assignee when it searches users again ### What changes were proposed in this pull request? This PR creates an alternative to the assign_issue function in jira.client.JIRA. The original one has an issue that it will search users again and only choose the assignee from 20 candidates. If it's unmatched, it picks the head blindly. For example, ```python >>> assignee = asf_jira.user("yao") >>> "SPARK-44801" 'SPARK-44801' >>> asf_jira.assign_issue(issue.key, assignee.name) Traceback (most recent call last): File "", line 1, in NameError: name 'issue' is not defined >>> asf_jira.assign_issue("SPARK-44801", assignee.name) Traceback (most recent call last): File "", line 1, in File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 123, in wrapper result = func(*arg_list, **kwargs) ^ File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/client.py", line 1891, in assign_issue self._session.put(url, data=json.dumps(payload)) File "/Users/hzyaoqin/python/lib/python3.11/site-packages/requests/sessions.py", line 649, in put return self.request("PUT", url, data=data, **kwargs) ^ File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", line 246, in request elif raise_on_error(response, **processed_kwargs): File "/Users/hzyaoqin/python/lib/python3.11/site-packages/jira/resilientsession.py", line 71, in raise_on_error raise JIRAError( jira.exceptions.JIRAError: JiraError HTTP 400 url: 
https://issues.apache.org/jira/rest/api/latest/issue/SPARK-44801/assignee response text = {"errorMessages":[],"errors":{"assignee":"User 'airhot' cannot be assigned issues."}} ``` The Jira userid 'yao' fails to return my JIRA profile as a candidate(20 in total) to match. So, 'airhot' from the head replaces me as an assignee. ### Why are the changes needed? bugfix for merge_spark_pr ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? test locally ```python >>> def assign_issue(client: jira.client.JIRA, issue: int, assignee: str) -> bool: ... """Assign an issue to a user. ... ... Args: ... issue (Union[int, str]): the issue ID or key to assign ... assignee (str): the user to assign the issue to. None will set it to unassigned. -1 will set it to Automatic. ... ... Returns: ... bool ... """ ... url = getattr(client, "_get_latest_url")(f"issue/{issue}/assignee") ... payload = {"name": assignee} ... getattr(client, "_session").put(url, data=json.dumps(payload)) ... return True ... >>> >>> assign_issue(asf_jira, "SPARK-44801", "yao") True ``` Closes #42496 from yaooqinn/SPARK-44813. Authored-by: Kent Yao Signed-off-by: Sean Owen --- dev/merge_spark_pr.py | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py index 27d0afe80ed..213798e5a1a 100755 --- a/dev/merge_spark_pr.py +++ b/dev/merge_spark_pr.py @@ -394,7 +394,7 @@ def choose_jira_assignee(issue, asf_jira): except BaseException: # assume it's a user id, and try to assign (might fail, we just prompt again) assignee = asf_jira.user(raw_assignee) -asf_jira.assign_issue(issue.key, assignee.name) +assign_issue(issue.key, assignee.name) return assignee except KeyboardInterrupt: raise @@ -403,6 +403,19 @@ def choose_jira_assignee(issue, asf_jira): print("Error assigning JIRA, try again (or leave blank and fix man
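The fix above replaces `jira.client.JIRA.assign_issue` with a direct REST `PUT`, because the library re-searches users and chooses the assignee from only 20 candidates. The behavior can be exercised without a live JIRA by substituting a stub client for the real `jira.client.JIRA` instance (the stub below is an illustration only):

```python
import json

class _RecordingSession:
    """Stands in for the client's HTTP session; records PUT calls."""
    def __init__(self, sink):
        self.sink = sink

    def put(self, url, data=None):
        self.sink.append((url, data))

class StubJiraClient:
    """Minimal stand-in for jira.client.JIRA, exposing only the two
    private members the workaround relies on."""
    def __init__(self):
        self.puts = []
        self._session = _RecordingSession(self.puts)

    def _get_latest_url(self, path):
        return f"https://issues.apache.org/jira/rest/api/latest/{path}"

def assign_issue(client, issue, assignee) -> bool:
    """Assign an issue by PUTting directly to the assignee endpoint,
    bypassing the client's lossy 20-candidate user search."""
    url = getattr(client, "_get_latest_url")(f"issue/{issue}/assignee")
    payload = {"name": assignee}
    getattr(client, "_session").put(url, data=json.dumps(payload))
    return True
```

Against a real client, `assign_issue(asf_jira, "SPARK-44801", "yao")` issues the same request shown in the commit's local test; the `getattr` calls keep the access to private members explicit.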
[spark-website] branch asf-site updated: Add note on generative tooling to developer tools
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new fc89ca1ed2 Add note on generative tooling to developer tools fc89ca1ed2 is described below commit fc89ca1ed20551c66dc31ebbe28664d12689bd13 Author: zero323 AuthorDate: Mon Aug 14 21:04:28 2023 -0500 Add note on generative tooling to developer tools This PR adds notes on generative tooling and a link to the relevant ASF policy. As requested in comments to https://github.com/apache/spark/pull/42469 Author: zero323 Closes #472 from zero323/SPARK-44782-generative-tooling-notes. --- developer-tools.md| 9 + site/developer-tools.html | 9 + 2 files changed, 18 insertions(+) diff --git a/developer-tools.md b/developer-tools.md index e0a1844ae7..73e708116e 100644 --- a/developer-tools.md +++ b/developer-tools.md @@ -549,3 +549,12 @@ When running Spark tests through SBT, add `javaOptions in Test += "-agentpath:/p to `SparkBuild.scala` to launch the tests with the YourKit profiler agent enabled. The platform-specific paths to the profiler agents are listed in the https://www.yourkit.com/docs/java/help/agent.jsp";>YourKit documentation. + +Generative tooling usage + +In general, the ASF allows contributions co-authored using generative AI tools. However, there are several considerations when you submit a patch containing generated content. + +Foremost, you are required to disclose the usage of such a tool. Furthermore, you are responsible for ensuring that the terms and conditions of the tool in question are +compatible with usage in an Open Source project and inclusion of the generated content doesn't pose a risk of copyright violation. + +Please refer to https://www.apache.org/legal/generative-tooling.html";>The ASF Generative Tooling Guidance for details and developments. 
diff --git a/site/developer-tools.html b/site/developer-tools.html index a43786ff91..de94619481 100644 --- a/site/developer-tools.html +++ b/site/developer-tools.html @@ -657,6 +657,15 @@ to SparkBuild.scala to The platform-specific paths to the profiler agents are listed in the https://www.yourkit.com/docs/java/help/agent.jsp";>YourKit documentation. +Generative tooling usage + +In general, the ASF allows contributions co-authored using generative AI tools. However, there are several considerations when you submit a patch containing generated content. + +Foremost, you are required to disclose the usage of such a tool. Furthermore, you are responsible for ensuring that the terms and conditions of the tool in question are +compatible with usage in an Open Source project and inclusion of the generated content doesn’t pose a risk of copyright violation. + +Please refer to https://www.apache.org/legal/generative-tooling.html";>The ASF Generative Tooling Guidance for details and developments. + - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark-website] branch asf-site updated: Added IOMETE to powered by Spark docs
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 61c79d6c34 Added IOMETE to powered by Spark docs 61c79d6c34 is described below commit 61c79d6c34c151586a2bb02be1d0c4d86627ce31 Author: Fuad Musayev AuthorDate: Thu Aug 10 21:27:35 2023 -0500 Added IOMETE to powered by Spark docs Added IOMETE Data Lakehouse platform to Powered by Spark docs. Author: Fuad Musayev Closes #471 from fmusayev/powered-by-iomete. --- powered-by.md| 1 + site/powered-by.html | 1 + 2 files changed, 2 insertions(+) diff --git a/powered-by.md b/powered-by.md index 048108882b..8b2cfa4df1 100644 --- a/powered-by.md +++ b/powered-by.md @@ -131,6 +131,7 @@ and external data sources, driving holistic and actionable insights. - http://www.infoobjects.com";>InfoObjects - Award winning Big Data consulting company with focus on Spark and Hadoop - http://en.inspur.com";>Inspur +- https://iomete.com";>IOMETE - IOMETE offers a modern Cloud-Prem Data Lakehouse platform, extending cloud-like experience to on-premise and private clouds. Utilizing Apache Spark as the query engine, we enable running Spark Jobs and ML applications on AWS, Azure, GCP, or On-Prem. Discover more at https://iomete.com";>IOMETE. - http://www.sehir.edu.tr/en/";>Istanbul Sehir University - http://www.kenshoo.com/";>Kenshoo - Digital marketing solutions and predictive media optimization diff --git a/site/powered-by.html b/site/powered-by.html index de8eb55ce2..aa07b10347 100644 --- a/site/powered-by.html +++ b/site/powered-by.html @@ -319,6 +319,7 @@ environments or on bare-metal infrastructures. http://en.inspur.com";>Inspur + https://iomete.com";>IOMETE - IOMETE offers a modern Cloud-Prem Data Lakehouse platform, extending cloud-like experience to on-premise and private clouds. 
Utilizing Apache Spark as the query engine, we enable running Spark Jobs and ML applications on AWS, Azure, GCP, or On-Prem. Discover more at https://iomete.com";>IOMETE. http://www.sehir.edu.tr/en/";>Istanbul Sehir University http://www.kenshoo.com/";>Kenshoo - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-44650][CORE] `spark.executor.defaultJavaOptions` Check illegal java options
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 41a2a7daeee [SPARK-44650][CORE] `spark.executor.defaultJavaOptions` Check illegal java options 41a2a7daeee is described below commit 41a2a7daeee0a25d39f30364a694becf54ab37e7 Author: sychen AuthorDate: Sun Aug 6 08:24:40 2023 -0500 [SPARK-44650][CORE] `spark.executor.defaultJavaOptions` Check illegal java options ### What changes were proposed in this pull request? ### Why are the changes needed? Command ```bash ./bin/spark-shell --conf spark.executor.extraJavaOptions='-Dspark.foo=bar' ``` Error ``` spark.executor.extraJavaOptions is not allowed to set Spark options (was '-Dspark.foo=bar'). Set them directly on a SparkConf or in a properties file when using ./bin/spark-submit. ``` Command ```bash ./bin/spark-shell --conf spark.executor.defaultJavaOptions='-Dspark.foo=bar' ``` Start up normally. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? local test & add UT ``` ./bin/spark-shell --conf spark.executor.defaultJavaOptions='-Dspark.foo=bar' ``` ``` spark.executor.defaultJavaOptions is not allowed to set Spark options (was '-Dspark.foo=bar'). Set them directly on a SparkConf or in a properties file when using ./bin/spark-submit. ``` Closes #42313 from cxzl25/SPARK-44650. 
Authored-by: sychen Signed-off-by: Sean Owen --- .../main/scala/org/apache/spark/SparkConf.scala| 25 +++--- .../scala/org/apache/spark/SparkConfSuite.scala| 14 2 files changed, 27 insertions(+), 12 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/SparkConf.scala b/core/src/main/scala/org/apache/spark/SparkConf.scala index 813a14acd19..8c054d24b10 100644 --- a/core/src/main/scala/org/apache/spark/SparkConf.scala +++ b/core/src/main/scala/org/apache/spark/SparkConf.scala @@ -503,8 +503,6 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria logWarning(msg) } -val executorOptsKey = EXECUTOR_JAVA_OPTIONS.key - // Used by Yarn in 1.1 and before sys.props.get("spark.driver.libraryPath").foreach { value => val warning = @@ -518,16 +516,19 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria } // Validate spark.executor.extraJavaOptions -getOption(executorOptsKey).foreach { javaOpts => - if (javaOpts.contains("-Dspark")) { -val msg = s"$executorOptsKey is not allowed to set Spark options (was '$javaOpts'). " + - "Set them directly on a SparkConf or in a properties file when using ./bin/spark-submit." -throw new Exception(msg) - } - if (javaOpts.contains("-Xmx")) { -val msg = s"$executorOptsKey is not allowed to specify max heap memory settings " + - s"(was '$javaOpts'). Use spark.executor.memory instead." -throw new Exception(msg) +Seq(EXECUTOR_JAVA_OPTIONS.key, "spark.executor.defaultJavaOptions").foreach { executorOptsKey => + getOption(executorOptsKey).foreach { javaOpts => +if (javaOpts.contains("-Dspark")) { + val msg = s"$executorOptsKey is not allowed to set Spark options (was '$javaOpts'). " + +"Set them directly on a SparkConf or in a properties file " + +"when using ./bin/spark-submit." + throw new Exception(msg) +} +if (javaOpts.contains("-Xmx")) { + val msg = s"$executorOptsKey is not allowed to specify max heap memory settings " + +s"(was '$javaOpts'). 
Use spark.executor.memory instead." + throw new Exception(msg) +} } } diff --git a/core/src/test/scala/org/apache/spark/SparkConfSuite.scala b/core/src/test/scala/org/apache/spark/SparkConfSuite.scala index 74fd7816221..75e22e1418b 100644 --- a/core/src/test/scala/org/apache/spark/SparkConfSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkConfSuite.scala @@ -498,6 +498,20 @@ class SparkConfSuite extends SparkFunSuite with LocalSparkContext with ResetSyst } } } + + test("SPARK-44650: spark.executor.defaultJavaOptions Check illegal java options") { +val conf = new SparkConf() +conf.validateSettings() +conf.set(EXECUTOR_JAVA_OPTIONS.key, "-Dspark.foo=bar") +intercept[Exception] { + conf.validateSettings() +} +conf.remove(EXECUTOR_JAVA_OPTIONS.key)
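The check that the diff generalizes runs over both executor option keys instead of only `spark.executor.extraJavaOptions`. A hedged restatement of the same validation logic in Python, with a plain dict standing in for the real SparkConf:

```python
def validate_executor_java_options(conf):
    # SPARK-44650: apply the same two checks to both option keys, not just
    # spark.executor.extraJavaOptions.
    for key in ("spark.executor.extraJavaOptions",
                "spark.executor.defaultJavaOptions"):
        java_opts = conf.get(key)
        if not java_opts:
            continue
        if "-Dspark" in java_opts:
            raise ValueError(
                f"{key} is not allowed to set Spark options (was '{java_opts}'). "
                "Set them directly on a SparkConf or in a properties file "
                "when using ./bin/spark-submit.")
        if "-Xmx" in java_opts:
            raise ValueError(
                f"{key} is not allowed to specify max heap memory settings "
                f"(was '{java_opts}'). Use spark.executor.memory instead.")

validate_executor_java_options({})  # no executor java options set: passes
```

With this shape, `-Dspark.foo=bar` is rejected under either key, which is exactly the behavior the new test case asserts.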
[spark] branch branch-3.5 updated: [MINOR][DOC] Fix a typo in ResolveReferencesInUpdate scaladoc
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 1ad71ffc33d [MINOR][DOC] Fix a typo in ResolveReferencesInUpdate scaladoc 1ad71ffc33d is described below commit 1ad71ffc33ddf0861f62e389a5e8ad438f9afb26 Author: Sergii Druzkin <65374769+sdruz...@users.noreply.github.com> AuthorDate: Thu Aug 3 18:52:44 2023 -0500 [MINOR][DOC] Fix a typo in ResolveReferencesInUpdate scaladoc ### What changes were proposed in this pull request? Fixed a typo in the ResolveReferencesInUpdate documentation. ### Why are the changes needed? ### Does this PR introduce any user-facing change? No ### How was this patch tested? CI Closes #42322 from sdruzkin/master. Authored-by: Sergii Druzkin <65374769+sdruz...@users.noreply.github.com> Signed-off-by: Sean Owen (cherry picked from commit 52a9002fa2383bd9b26c77e62e0c6bcd46f8944b) Signed-off-by: Sean Owen --- .../apache/spark/sql/catalyst/analysis/ResolveReferencesInUpdate.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInUpdate.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInUpdate.scala index cebc1e25f92..ead323ce985 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInUpdate.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInUpdate.scala @@ -25,7 +25,7 @@ import org.apache.spark.sql.errors.QueryCompilationErrors /** * A virtual rule to resolve [[UnresolvedAttribute]] in [[UpdateTable]]. It's only used by the real * rule `ResolveReferences`. The column resolution order for [[UpdateTable]] is: - * 1. Resolves the column to `AttributeReference`` with the output of the child plan. This + * 1. 
Resolves the column to `AttributeReference` with the output of the child plan. This *includes metadata columns as well. * 2. Resolves the column to a literal function which is allowed to be invoked without braces, e.g. *`SELECT col, current_date FROM t`. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [MINOR][DOC] Fix a typo in ResolveReferencesInUpdate scaladoc
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 52a9002fa23 [MINOR][DOC] Fix a typo in ResolveReferencesInUpdate scaladoc 52a9002fa23 is described below commit 52a9002fa2383bd9b26c77e62e0c6bcd46f8944b Author: Sergii Druzkin <65374769+sdruz...@users.noreply.github.com> AuthorDate: Thu Aug 3 18:52:44 2023 -0500 [MINOR][DOC] Fix a typo in ResolveReferencesInUpdate scaladoc ### What changes were proposed in this pull request? Fixed a typo in the ResolveReferencesInUpdate documentation. ### Why are the changes needed? ### Does this PR introduce any user-facing change? No ### How was this patch tested? CI Closes #42322 from sdruzkin/master. Authored-by: Sergii Druzkin <65374769+sdruz...@users.noreply.github.com> Signed-off-by: Sean Owen --- .../apache/spark/sql/catalyst/analysis/ResolveReferencesInUpdate.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInUpdate.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInUpdate.scala index cebc1e25f92..ead323ce985 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInUpdate.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInUpdate.scala @@ -25,7 +25,7 @@ import org.apache.spark.sql.errors.QueryCompilationErrors /** * A virtual rule to resolve [[UnresolvedAttribute]] in [[UpdateTable]]. It's only used by the real * rule `ResolveReferences`. The column resolution order for [[UpdateTable]] is: - * 1. Resolves the column to `AttributeReference`` with the output of the child plan. This + * 1. Resolves the column to `AttributeReference` with the output of the child plan. 
This *includes metadata columns as well. * 2. Resolves the column to a literal function which is allowed to be invoked without braces, e.g. *`SELECT col, current_date FROM t`. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.5 updated: [SPARK-44604][BUILD] Upgrade Netty to 4.1.96.Final
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 7b68ccd1cb4 [SPARK-44604][BUILD] Upgrade Netty to 4.1.96.Final 7b68ccd1cb4 is described below commit 7b68ccd1cb48c38052b0458c5192d5ffcfc97409 Author: panbingkun AuthorDate: Tue Aug 1 08:55:27 2023 -0500 [SPARK-44604][BUILD] Upgrade Netty to 4.1.96.Final ### What changes were proposed in this pull request? This PR aims to upgrade Netty from 4.1.93.Final to 4.1.96.Final. ### Why are the changes needed? 1. Netty 4.1.93.Final vs. 4.1.96.Final: https://github.com/netty/netty/compare/netty-4.1.93.Final...netty-4.1.96.Final 2. The newest Netty version fixes a possible security issue ([CVE-2023-34462](https://github.com/netty/netty/security/advisories/GHSA-6mjq-h674-j845)) when using SniHandler. 3. Netty full release notes: https://netty.io/news/2023/07/27/4-1-96-Final.html https://netty.io/news/2023/07/20/4-1-95-Final.html https://netty.io/news/2023/06/19/4-1-94-Final.html ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #42232 from panbingkun/SPARK-44604. 
Authored-by: panbingkun Signed-off-by: Sean Owen (cherry picked from commit 8053d5f16541edb8e17cbc50684abae69187ff5a) Signed-off-by: Sean Owen --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 36 +-- pom.xml | 6 +- 2 files changed, 19 insertions(+), 23 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index beae2232202..566f7c9a3ea 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -183,24 +183,24 @@ metrics-jmx/4.2.19//metrics-jmx-4.2.19.jar metrics-json/4.2.19//metrics-json-4.2.19.jar metrics-jvm/4.2.19//metrics-jvm-4.2.19.jar minlog/1.3.0//minlog-1.3.0.jar -netty-all/4.1.93.Final//netty-all-4.1.93.Final.jar -netty-buffer/4.1.93.Final//netty-buffer-4.1.93.Final.jar -netty-codec-http/4.1.93.Final//netty-codec-http-4.1.93.Final.jar -netty-codec-http2/4.1.93.Final//netty-codec-http2-4.1.93.Final.jar -netty-codec-socks/4.1.93.Final//netty-codec-socks-4.1.93.Final.jar -netty-codec/4.1.93.Final//netty-codec-4.1.93.Final.jar -netty-common/4.1.93.Final//netty-common-4.1.93.Final.jar -netty-handler-proxy/4.1.93.Final//netty-handler-proxy-4.1.93.Final.jar -netty-handler/4.1.93.Final//netty-handler-4.1.93.Final.jar -netty-resolver/4.1.93.Final//netty-resolver-4.1.93.Final.jar -netty-transport-classes-epoll/4.1.93.Final//netty-transport-classes-epoll-4.1.93.Final.jar -netty-transport-classes-kqueue/4.1.93.Final//netty-transport-classes-kqueue-4.1.93.Final.jar -netty-transport-native-epoll/4.1.93.Final/linux-aarch_64/netty-transport-native-epoll-4.1.93.Final-linux-aarch_64.jar -netty-transport-native-epoll/4.1.93.Final/linux-x86_64/netty-transport-native-epoll-4.1.93.Final-linux-x86_64.jar -netty-transport-native-kqueue/4.1.93.Final/osx-aarch_64/netty-transport-native-kqueue-4.1.93.Final-osx-aarch_64.jar -netty-transport-native-kqueue/4.1.93.Final/osx-x86_64/netty-transport-native-kqueue-4.1.93.Final-osx-x86_64.jar 
-netty-transport-native-unix-common/4.1.93.Final//netty-transport-native-unix-common-4.1.93.Final.jar -netty-transport/4.1.93.Final//netty-transport-4.1.93.Final.jar +netty-all/4.1.96.Final//netty-all-4.1.96.Final.jar +netty-buffer/4.1.96.Final//netty-buffer-4.1.96.Final.jar +netty-codec-http/4.1.96.Final//netty-codec-http-4.1.96.Final.jar +netty-codec-http2/4.1.96.Final//netty-codec-http2-4.1.96.Final.jar +netty-codec-socks/4.1.96.Final//netty-codec-socks-4.1.96.Final.jar +netty-codec/4.1.96.Final//netty-codec-4.1.96.Final.jar +netty-common/4.1.96.Final//netty-common-4.1.96.Final.jar +netty-handler-proxy/4.1.96.Final//netty-handler-proxy-4.1.96.Final.jar +netty-handler/4.1.96.Final//netty-handler-4.1.96.Final.jar +netty-resolver/4.1.96.Final//netty-resolver-4.1.96.Final.jar +netty-transport-classes-epoll/4.1.96.Final//netty-transport-classes-epoll-4.1.96.Final.jar +netty-transport-classes-kqueue/4.1.96.Final//netty-transport-classes-kqueue-4.1.96.Final.jar +netty-transport-native-epoll/4.1.96.Final/linux-aarch_64/netty-transport-native-epoll-4.1.96.Final-linux-aarch_64.jar +netty-transport-native-epoll/4.1.96.Final/linux-x86_64/netty-transport-native-epoll-4.1.96.Final-linux-x86_64.jar +netty-transport-native-kqueue/4.1.96.Final/osx-aarch_64/netty-transport-native-kqueue-4.1.96.Final-osx-aarch_64.jar +netty-transport-native-kqueue/4.1.96.Final/osx-x86_64/netty-transport-native-kqueue-4.1.96.Final-osx-x86_64.jar +netty-transport-native-unix-common/4.1.96.Final//netty-transport-native-unix-common-4.1.96.Final.jar +netty-transport/4.1.96.Final//netty-transport-4.1.96
[spark] branch master updated: [SPARK-44604][BUILD] Upgrade Netty to 4.1.96.Final
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8053d5f1654 [SPARK-44604][BUILD] Upgrade Netty to 4.1.96.Final 8053d5f1654 is described below commit 8053d5f16541edb8e17cbc50684abae69187ff5a Author: panbingkun AuthorDate: Tue Aug 1 08:55:27 2023 -0500 [SPARK-44604][BUILD] Upgrade Netty to 4.1.96.Final ### What changes were proposed in this pull request? This PR aims to upgrade Netty from 4.1.93.Final to 4.1.96.Final. ### Why are the changes needed? 1. Netty 4.1.93.Final vs. 4.1.96.Final: https://github.com/netty/netty/compare/netty-4.1.93.Final...netty-4.1.96.Final 2. The newest Netty version fixes a possible security issue ([CVE-2023-34462](https://github.com/netty/netty/security/advisories/GHSA-6mjq-h674-j845)) when using SniHandler. 3. Netty full release notes: https://netty.io/news/2023/07/27/4-1-96-Final.html https://netty.io/news/2023/07/20/4-1-95-Final.html https://netty.io/news/2023/06/19/4-1-94-Final.html ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #42232 from panbingkun/SPARK-44604. 
Authored-by: panbingkun Signed-off-by: Sean Owen --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 36 +-- pom.xml | 6 +- 2 files changed, 19 insertions(+), 23 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 3b54ef43f6a..52a1d00f204 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -183,24 +183,24 @@ metrics-jmx/4.2.19//metrics-jmx-4.2.19.jar metrics-json/4.2.19//metrics-json-4.2.19.jar metrics-jvm/4.2.19//metrics-jvm-4.2.19.jar minlog/1.3.0//minlog-1.3.0.jar -netty-all/4.1.93.Final//netty-all-4.1.93.Final.jar -netty-buffer/4.1.93.Final//netty-buffer-4.1.93.Final.jar -netty-codec-http/4.1.93.Final//netty-codec-http-4.1.93.Final.jar -netty-codec-http2/4.1.93.Final//netty-codec-http2-4.1.93.Final.jar -netty-codec-socks/4.1.93.Final//netty-codec-socks-4.1.93.Final.jar -netty-codec/4.1.93.Final//netty-codec-4.1.93.Final.jar -netty-common/4.1.93.Final//netty-common-4.1.93.Final.jar -netty-handler-proxy/4.1.93.Final//netty-handler-proxy-4.1.93.Final.jar -netty-handler/4.1.93.Final//netty-handler-4.1.93.Final.jar -netty-resolver/4.1.93.Final//netty-resolver-4.1.93.Final.jar -netty-transport-classes-epoll/4.1.93.Final//netty-transport-classes-epoll-4.1.93.Final.jar -netty-transport-classes-kqueue/4.1.93.Final//netty-transport-classes-kqueue-4.1.93.Final.jar -netty-transport-native-epoll/4.1.93.Final/linux-aarch_64/netty-transport-native-epoll-4.1.93.Final-linux-aarch_64.jar -netty-transport-native-epoll/4.1.93.Final/linux-x86_64/netty-transport-native-epoll-4.1.93.Final-linux-x86_64.jar -netty-transport-native-kqueue/4.1.93.Final/osx-aarch_64/netty-transport-native-kqueue-4.1.93.Final-osx-aarch_64.jar -netty-transport-native-kqueue/4.1.93.Final/osx-x86_64/netty-transport-native-kqueue-4.1.93.Final-osx-x86_64.jar -netty-transport-native-unix-common/4.1.93.Final//netty-transport-native-unix-common-4.1.93.Final.jar 
-netty-transport/4.1.93.Final//netty-transport-4.1.93.Final.jar +netty-all/4.1.96.Final//netty-all-4.1.96.Final.jar +netty-buffer/4.1.96.Final//netty-buffer-4.1.96.Final.jar +netty-codec-http/4.1.96.Final//netty-codec-http-4.1.96.Final.jar +netty-codec-http2/4.1.96.Final//netty-codec-http2-4.1.96.Final.jar +netty-codec-socks/4.1.96.Final//netty-codec-socks-4.1.96.Final.jar +netty-codec/4.1.96.Final//netty-codec-4.1.96.Final.jar +netty-common/4.1.96.Final//netty-common-4.1.96.Final.jar +netty-handler-proxy/4.1.96.Final//netty-handler-proxy-4.1.96.Final.jar +netty-handler/4.1.96.Final//netty-handler-4.1.96.Final.jar +netty-resolver/4.1.96.Final//netty-resolver-4.1.96.Final.jar +netty-transport-classes-epoll/4.1.96.Final//netty-transport-classes-epoll-4.1.96.Final.jar +netty-transport-classes-kqueue/4.1.96.Final//netty-transport-classes-kqueue-4.1.96.Final.jar +netty-transport-native-epoll/4.1.96.Final/linux-aarch_64/netty-transport-native-epoll-4.1.96.Final-linux-aarch_64.jar +netty-transport-native-epoll/4.1.96.Final/linux-x86_64/netty-transport-native-epoll-4.1.96.Final-linux-x86_64.jar +netty-transport-native-kqueue/4.1.96.Final/osx-aarch_64/netty-transport-native-kqueue-4.1.96.Final-osx-aarch_64.jar +netty-transport-native-kqueue/4.1.96.Final/osx-x86_64/netty-transport-native-kqueue-4.1.96.Final-osx-x86_64.jar +netty-transport-native-unix-common/4.1.96.Final//netty-transport-native-unix-common-4.1.96.Final.jar +netty-transport/4.1.96.Final//netty-transport-4.1.96.Final.jar objenesis/3.3//objenesis-3.3.jar okhttp/3.12.12//okhttp-3.12.12.jar okio/1.15.0//okio-1.15.0.jar
[spark] branch branch-3.5 updated: [SPARK-44542][CORE] Eagerly load SparkExitCode class in exception handler
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 47224b39f6c [SPARK-44542][CORE] Eagerly load SparkExitCode class in exception handler 47224b39f6c is described below commit 47224b39f6c937cadf5946870a4dc8d0dabdfa40 Author: Xianjin AuthorDate: Sun Jul 30 22:12:39 2023 -0500 [SPARK-44542][CORE] Eagerly load SparkExitCode class in exception handler ### What changes were proposed in this pull request? 1. Eagerly load the SparkExitCode class in the SparkUncaughtExceptionHandler ### Why are the changes needed? In some extreme cases, it's possible for SparkUncaughtExceptionHandler's exit/halt process function calls to throw an exception if the SparkExitCode class is not loaded earlier; see the corresponding JIRA: [SPARK-44542](https://issues.apache.org/jira/browse/SPARK-44542) for more details. By eagerly loading the SparkExitCode class, we can make sure that at least halt/exit will work properly. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No logic change, hence no new UTs. Closes #42195 from advancedxy/SPARK-44542. 
Authored-by: Xianjin Signed-off-by: Sean Owen (cherry picked from commit 32498b390db99c9451b14c643456437a023c0d93) Signed-off-by: Sean Owen --- .../scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala | 6 ++ 1 file changed, 6 insertions(+) diff --git a/core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala b/core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala index e7712875536..b24129eb369 100644 --- a/core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala +++ b/core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala @@ -28,6 +28,12 @@ import org.apache.spark.internal.Logging private[spark] class SparkUncaughtExceptionHandler(val exitOnUncaughtException: Boolean = true) extends Thread.UncaughtExceptionHandler with Logging { + locally { +// eagerly load SparkExitCode class, so the System.exit and runtime.halt have a chance to be +// executed when the disk containing Spark jars is corrupted. See SPARK-44542 for more details. +val _ = SparkExitCode.OOM + } + override def uncaughtException(thread: Thread, exception: Throwable): Unit = { try { // Make it explicit that uncaught exceptions are thrown when container is shutting down. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-44542][CORE] Eagerly load SparkExitCode class in exception handler
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 32498b390db [SPARK-44542][CORE] Eagerly load SparkExitCode class in exception handler 32498b390db is described below commit 32498b390db99c9451b14c643456437a023c0d93 Author: Xianjin AuthorDate: Sun Jul 30 22:12:39 2023 -0500 [SPARK-44542][CORE] Eagerly load SparkExitCode class in exception handler ### What changes were proposed in this pull request? 1. Eagerly load the SparkExitCode class in the SparkUncaughtExceptionHandler ### Why are the changes needed? In some extreme cases, it's possible for SparkUncaughtExceptionHandler's exit/halt process function calls to throw an exception if the SparkExitCode class is not loaded earlier; see the corresponding JIRA: [SPARK-44542](https://issues.apache.org/jira/browse/SPARK-44542) for more details. By eagerly loading the SparkExitCode class, we can make sure that at least halt/exit will work properly. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No logic change, hence no new UTs. Closes #42195 from advancedxy/SPARK-44542. 
Authored-by: Xianjin Signed-off-by: Sean Owen --- .../scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala | 6 ++ 1 file changed, 6 insertions(+) diff --git a/core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala b/core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala index e7712875536..b24129eb369 100644 --- a/core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala +++ b/core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala @@ -28,6 +28,12 @@ import org.apache.spark.internal.Logging private[spark] class SparkUncaughtExceptionHandler(val exitOnUncaughtException: Boolean = true) extends Thread.UncaughtExceptionHandler with Logging { + locally { +// eagerly load SparkExitCode class, so the System.exit and runtime.halt have a chance to be +// executed when the disk containing Spark jars is corrupted. See SPARK-44542 for more details. +val _ = SparkExitCode.OOM + } + override def uncaughtException(thread: Thread, exception: Throwable): Unit = { try { // Make it explicit that uncaught exceptions are thrown when container is shutting down. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
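The pattern in this fix is to resolve everything the crash handler will need at construction time rather than at crash time. A hedged Python analogue of the same idea (the function names and the exit code 50 are illustrative assumptions, not taken from this commit):

```python
import sys

def install_uncaught_handler(exit_fn, exit_code=50):
    # Capture the exit function and code NOW, at install time -- the same
    # reason the patch eagerly references SparkExitCode.OOM in the handler's
    # constructor. If loading new code later becomes impossible (e.g. the
    # disk holding the jars is corrupted), the exit path is already resolved.
    def hook(exc_type, exc, tb):
        exit_fn(exit_code)
    sys.excepthook = hook
    return hook

recorded = []
hook = install_uncaught_handler(recorded.append)
hook(RuntimeError, RuntimeError("boom"), None)  # recorded becomes [50]
```

The key point is that nothing inside `hook` triggers a fresh lookup or class load; everything it touches is bound before the first exception can occur.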
[spark] branch branch-3.4 updated: [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new f19a953b647 [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk f19a953b647 is described below commit f19a953b6471673f89d689bea20e0d53026f7b5b Author: Guilhem Vuillier <101632595+guilhem-de...@users.noreply.github.com> AuthorDate: Fri Jul 28 17:29:47 2023 -0500 [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk ### What changes were proposed in this pull request? This PR fixes the condition to raise the following warning in MLLib's RankingMetrics ndcgAk function: "# of ground truth set and # of relevance value set should be equal, check input data" The logic for raising warnings is faulty at the moment: it raises a warning if the `rel` input is empty and `lab.size` and `rel.size` are not equal. The logic should be to raise a warning if the `rel` input is **not empty** and `lab.size` and `rel.size` are not equal. This warning was added in the following PR: https://github.com/apache/spark/pull/36843 ### Why are the changes needed? With the current logic, RankingMetrics will: - raise an incorrect warning when a user is using it in the "binary" mode (i.e. no relevance values in the input) - not raise a warning (which could be necessary) when the user is using it in the "non-binary" mode (i.e. with relevance values in the input) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? No change made to the test suite for RankingMetrics: https://github.com/uch/spark/blob/a172172329cc78b50f716924f2a344517deb71fc/mllib/src/test/scala/org/apache/spark/mllib/evaluation/RankingMetricsSuite.scala Closes #42207 from guilhem-depop/patch-1. 
Authored-by: Guilhem Vuillier <101632595+guilhem-de...@users.noreply.github.com> Signed-off-by: Sean Owen (cherry picked from commit 72af2c0fbc6673a5e49f1fd6693fe2c90141a84f) Signed-off-by: Sean Owen --- .../scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala index 37e57736574..a3316d8a8fa 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala @@ -140,6 +140,9 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") (predictionAndLabels: RDD[_ <: * and the NDCG is obtained by dividing the DCG value on the ground truth set. In the current * implementation, the relevance value is binary if the relevance value is empty. + * If the relevance value is not empty but its size doesn't match the ground truth set size, + * a log warning is generated. + * * If a query has an empty ground truth set, zero will be used as ndcg together with * a log warning. * @@ -157,7 +160,7 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") (predictionAndLabels: RDD[_ <: val useBinary = rel.isEmpty val labSet = lab.toSet val relMap = Utils.toMap(lab, rel) - if (useBinary && lab.size != rel.size) { + if (!useBinary && lab.size != rel.size) { logWarning( "# of ground truth set and # of relevance value set should be equal, " + "check input data") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.5 updated: [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new d0fa5a75d17 [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk d0fa5a75d17 is described below commit d0fa5a75d17335e60aefbb554adb9b3fce1f97ff Author: Guilhem Vuillier <101632595+guilhem-de...@users.noreply.github.com> AuthorDate: Fri Jul 28 17:29:47 2023 -0500 [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk ### What changes were proposed in this pull request? This PR fixes the condition to raise the following warning in MLLib's RankingMetrics ndcgAk function: "# of ground truth set and # of relevance value set should be equal, check input data" The logic for raising warnings is faulty at the moment: it raises a warning if the `rel` input is empty and `lab.size` and `rel.size` are not equal. The logic should be to raise a warning if the `rel` input is **not empty** and `lab.size` and `rel.size` are not equal. This warning was added in the following PR: https://github.com/apache/spark/pull/36843 ### Why are the changes needed? With the current logic, RankingMetrics will: - raise an incorrect warning when a user is using it in the "binary" mode (i.e. no relevance values in the input) - not raise a warning (that could be necessary) when the user is using it in the "non-binary" mode (i.e. with relevance values in the input) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? No change was made to the test suite for RankingMetrics: https://github.com/uch/spark/blob/a172172329cc78b50f716924f2a344517deb71fc/mllib/src/test/scala/org/apache/spark/mllib/evaluation/RankingMetricsSuite.scala Closes #42207 from guilhem-depop/patch-1. 
Authored-by: Guilhem Vuillier <101632595+guilhem-de...@users.noreply.github.com> Signed-off-by: Sean Owen (cherry picked from commit 72af2c0fbc6673a5e49f1fd6693fe2c90141a84f) Signed-off-by: Sean Owen --- .../scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala index 37e57736574..a3316d8a8fa 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala @@ -140,6 +140,9 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") (predictionAndLabels: RDD[_ <: * and the NDCG is obtained by dividing the DCG value on the ground truth set. In the current * implementation, the relevance value is binary if the relevance value is empty. + * If the relevance value is not empty but its size doesn't match the ground truth set size, + * a log warning is generated. + * * If a query has an empty ground truth set, zero will be used as ndcg together with * a log warning. * @@ -157,7 +160,7 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") (predictionAndLabels: RDD[_ <: val useBinary = rel.isEmpty val labSet = lab.toSet val relMap = Utils.toMap(lab, rel) - if (useBinary && lab.size != rel.size) { + if (!useBinary && lab.size != rel.size) { logWarning( "# of ground truth set and # of relevance value set should be equal, " + "check input data") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 72af2c0fbc6 [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk 72af2c0fbc6 is described below commit 72af2c0fbc6673a5e49f1fd6693fe2c90141a84f Author: Guilhem Vuillier <101632595+guilhem-de...@users.noreply.github.com> AuthorDate: Fri Jul 28 17:29:47 2023 -0500 [SPARK-44585][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk ### What changes were proposed in this pull request? This PR fixes the condition to raise the following warning in MLLib's RankingMetrics ndcgAk function: "# of ground truth set and # of relevance value set should be equal, check input data" The logic for raising warnings is faulty at the moment: it raises a warning if the `rel` input is empty and `lab.size` and `rel.size` are not equal. The logic should be to raise a warning if the `rel` input is **not empty** and `lab.size` and `rel.size` are not equal. This warning was added in the following PR: https://github.com/apache/spark/pull/36843 ### Why are the changes needed? With the current logic, RankingMetrics will: - raise an incorrect warning when a user is using it in the "binary" mode (i.e. no relevance values in the input) - not raise a warning (that could be necessary) when the user is using it in the "non-binary" mode (i.e. with relevance values in the input) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? No change was made to the test suite for RankingMetrics: https://github.com/uch/spark/blob/a172172329cc78b50f716924f2a344517deb71fc/mllib/src/test/scala/org/apache/spark/mllib/evaluation/RankingMetricsSuite.scala Closes #42207 from guilhem-depop/patch-1. 
Authored-by: Guilhem Vuillier <101632595+guilhem-de...@users.noreply.github.com> Signed-off-by: Sean Owen --- .../scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala index 37e57736574..a3316d8a8fa 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala @@ -140,6 +140,9 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") (predictionAndLabels: RDD[_ <: * and the NDCG is obtained by dividing the DCG value on the ground truth set. In the current * implementation, the relevance value is binary if the relevance value is empty. + * If the relevance value is not empty but its size doesn't match the ground truth set size, + * a log warning is generated. + * * If a query has an empty ground truth set, zero will be used as ndcg together with * a log warning. * @@ -157,7 +160,7 @@ class RankingMetrics[T: ClassTag] @Since("1.2.0") (predictionAndLabels: RDD[_ <: val useBinary = rel.isEmpty val labSet = lab.toSet val relMap = Utils.toMap(lab, rel) - if (useBinary && lab.size != rel.size) { + if (!useBinary && lab.size != rel.size) { logWarning( "# of ground truth set and # of relevance value set should be equal, " + "check input data") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
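The one-character fix above (`useBinary` → `!useBinary`) is easy to verify in isolation. A minimal sketch of the corrected predicate, outside Spark and with hypothetical `lab`/`rel` inputs:

```scala
// Sketch of the corrected warning predicate from ndcgAt (standalone, no Spark).
// A warning is warranted only in "non-binary" mode: relevance values were
// supplied (rel is non-empty) but their count does not match the ground truth.
def shouldWarn(lab: Seq[String], rel: Seq[Double]): Boolean = {
  val useBinary = rel.isEmpty          // binary mode: no relevance values given
  // Fixed condition: the old `useBinary && lab.size != rel.size` fired for
  // every non-empty ground truth set in binary mode (rel.size is always 0 there).
  !useBinary && lab.size != rel.size
}

// Binary mode: no relevance values supplied -> never warn.
val binaryMode = shouldWarn(Seq("a", "b"), Seq.empty)        // false
// Non-binary mode with mismatched sizes -> warn.
val mismatch   = shouldWarn(Seq("a", "b"), Seq(1.0))         // true
// Non-binary mode, sizes match -> no warning.
val matched    = shouldWarn(Seq("a", "b"), Seq(1.0, 0.5))    // false
```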
[spark] branch master updated: [MINOR][DOCS] fix: some minor typos
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 921fb289f00 [MINOR][DOCS] fix: some minor typos 921fb289f00 is described below commit 921fb289f003317d89120faa6937e4abd359195c Author: Eric Blanco AuthorDate: Thu Jul 27 08:53:54 2023 -0500 [MINOR][DOCS] fix: some minor typos ### What changes were proposed in this pull request? Change `the the` to `the` ### Why are the changes needed? To fix the typo ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Closes #42188 from ejblanco/docs/spark-typos. Authored-by: Eric Blanco Signed-off-by: Sean Owen --- .../spark/sql/connect/service/SparkConnectStreamingQueryCache.scala | 2 +- .../org/apache/spark/ui/static/vis-timeline-graph2d.min.js.map | 2 +- dev/connect-jvm-client-mima-check | 2 +- .../main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala| 2 +- .../scala/org/apache/spark/sql/catalyst/expressions/WindowTime.scala| 2 +- sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 6 files changed, 6 insertions(+), 6 deletions(-) diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamingQueryCache.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamingQueryCache.scala index 133686df018..87004242da9 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamingQueryCache.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamingQueryCache.scala @@ -84,7 +84,7 @@ private[connect] class SparkConnectStreamingQueryCache( /** * Returns [[StreamingQuery]] if it is cached and session matches the cached query. 
It ensures - * the the session associated with it matches the session passed into the call. If the query is + * the session associated with it matches the session passed into the call. If the query is * inactive (i.e. it has a cache expiry time set), this access extends its expiry time. So if a * client keeps accessing a query, it stays in the cache. */ diff --git a/core/src/main/resources/org/apache/spark/ui/static/vis-timeline-graph2d.min.js.map b/core/src/main/resources/org/apache/spark/ui/static/vis-timeline-graph2d.min.js.map index 95fdc523cf4..250b375e545 100644 --- a/core/src/main/resources/org/apache/spark/ui/static/vis-timeline-graph2d.min.js.map +++ b/core/src/main/resources/org/apache/spark/ui/static/vis-timeline-graph2d.min.js.map @@ -1 +1 @@ -{"version":3,"file":"vis-timeline-graph2d.min.js","sources":["../../node_modules/moment/locale/de.js","../../node_modules/moment/moment.js","../../node_modules/moment/locale/es.js","../../node_modules/moment/locale/fr.js","../../node_modules/moment/locale/it.js","../../node_modules/moment/locale/ja.js","../../node_modules/moment/locale/nl.js","../../node_modules/moment/locale/pl.js","../../node_modules/moment/locale/ru.js","../../node_modules/moment/locale/uk.js","../../node_modules/core [...] \ No newline at end of file +{"version":3,"file":"vis-timeline-graph2d.min.js","sources":["../../node_modules/moment/locale/de.js","../../node_modules/moment/moment.js","../../node_modules/moment/locale/es.js","../../node_modules/moment/locale/fr.js","../../node_modules/moment/locale/it.js","../../node_modules/moment/locale/ja.js","../../node_modules/moment/locale/nl.js","../../node_modules/moment/locale/pl.js","../../node_modules/moment/locale/ru.js","../../node_modules/moment/locale/uk.js","../../node_modules/core [...] 
\ No newline at end of file diff --git a/dev/connect-jvm-client-mima-check b/dev/connect-jvm-client-mima-check index ac4b95935b9..6a29cbf08ce 100755 --- a/dev/connect-jvm-client-mima-check +++ b/dev/connect-jvm-client-mima-check @@ -52,7 +52,7 @@ echo "finish connect-client-jvm module mima check ..." RESULT_SIZE=$(wc -l .connect-mima-check-result | awk '{print $1}') -# The the file has no content if check passed. +# The file has no content if check passed. if [[ $RESULT_SIZE -eq "0" ]]; then ERRORS="" else diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 3ece74a4d18..92e550ea941 100644 ---
[spark] branch branch-3.5 updated: [SPARK-44457][CONNECT][TESTS] Add `truncatedTo(ChronoUnit.MICROS)` to make `ArrowEncoderSuite` in Java 17 daily test GA task pass
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 17fc3632f23 [SPARK-44457][CONNECT][TESTS] Add `truncatedTo(ChronoUnit.MICROS)` to make `ArrowEncoderSuite` in Java 17 daily test GA task pass 17fc3632f23 is described below commit 17fc3632f2344101f8318457e3f9d5f133913997 Author: yangjie01 AuthorDate: Wed Jul 26 19:17:40 2023 -0500 [SPARK-44457][CONNECT][TESTS] Add `truncatedTo(ChronoUnit.MICROS)` to make `ArrowEncoderSuite` in Java 17 daily test GA task pass ### What changes were proposed in this pull request? Similar to SPARK-42770 | https://github.com/apache/spark/pull/40395, this PR calls `truncatedTo(ChronoUnit.MICROS)` on `Instant.now()` and `LocalDateTime.now()` to ensure microsecond accuracy is used in any environment. ### Why are the changes needed? Make the Java 17 daily test GA task run successfully. The Java 17 daily test GA task currently fails: https://github.com/apache/spark/actions/runs/5570003581/jobs/10173767006 ``` [info] - nullable fields *** FAILED *** (169 milliseconds) [info] NullableData(null, JANUARY, E1, null, 1.00, 2.00, null, 4, PT0S, null, 2023-07-16, 2023-07-16, null, 2023-07-16T23:01:54.059339Z, 2023-07-16T23:01:54.059359) did not equal NullableData(null, JANUARY, E1, null, 1.00, 2.00, null, 4, PT0S, null, 2023-07-16, 2023-07-16, null, 2023-07-16T23:01:54.059339538Z, 2023-07-16T23:01:54.059359638) (ArrowEncoderSuite.scala:194) [info] Analysis: [info] NullableData(instant: 2023-07-16T23:01:54.059339Z -> 2023-07-16T23:01:54.059339538Z, localDateTime: 2023-07-16T23:01:54.059359 -> 2023-07-16T23:01:54.059359638) [info] org.scalatest.exceptions.TestFailedException: ... 
[info] - lenient field serialization - timestamp/instant *** FAILED *** (26 milliseconds) [info] 2023-07-16T23:01:55.112838Z did not equal 2023-07-16T23:01:55.112838568Z (ArrowEncoderSuite.scala:194) [info] org.scalatest.exceptions.TestFailedException: ... ``` ### Does this PR introduce _any_ user-facing change? No, this change is test-only. ### How was this patch tested? - Passes GitHub Actions - GitHub Actions test with Java 17 passed: https://github.com/LuciferYang/spark/actions/runs/5647253889/job/15297009685 https://github.com/apache/spark/assets/1475305/27a4350a-9475-45e3-b39f-b0b1e8f14e92 Closes #42039 from LuciferYang/ArrowEncoderSuite-Java17. Authored-by: yangjie01 Signed-off-by: Sean Owen (cherry picked from commit da359259b138864a52ea98a4e19c55e593a5a8fa) Signed-off-by: Sean Owen --- .../spark/sql/connect/client/arrow/ArrowEncoderSuite.scala| 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/arrow/ArrowEncoderSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/arrow/ArrowEncoderSuite.scala index 3f8ac1cb8d1..5c035a613fe 100644 --- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/arrow/ArrowEncoderSuite.scala +++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/arrow/ArrowEncoderSuite.scala @@ -18,6 +18,7 @@ package org.apache.spark.sql.connect.client.arrow import java.math.BigInteger import java.time.{Duration, Period, ZoneOffset} +import java.time.temporal.ChronoUnit import java.util import java.util.{Collections, Objects} @@ -361,8 +362,10 @@ class ArrowEncoderSuite extends ConnectFunSuite with BeforeAndAfterAll { test("nullable fields") { val encoder = ScalaReflection.encoderFor[NullableData] -val instant = java.time.Instant.now() -val now = java.time.LocalDateTime.now() +// SPARK-44457: Similar to SPARK-42770, calling `truncatedTo(ChronoUnit.MICROS)` +// on 
`Instant.now()` and `LocalDateTime.now()` to ensure microsecond accuracy is used. +val instant = java.time.Instant.now().truncatedTo(ChronoUnit.MICROS) +val now = java.time.LocalDateTime.now().truncatedTo(ChronoUnit.MICROS) val today = java.time.LocalDate.now() roundTripAndCheckIdentical(encoder) { () => val maybeNull = MaybeNull(3) @@ -602,7 +605,9 @@ class ArrowEncoderSuite extends ConnectFunSuite with BeforeAndAfterAll { } test("lenient field serialization - timestamp/instant") { -val base = java.time.Instant.now() +// SPARK-44457: Similar to SPARK-42770, calling `truncatedTo(ChronoUnit.MICROS)` +// on `Instant.now()` to ensure microsecond accuracy is used. +val base = java.time.Instant.now().truncatedTo(ChronoUnit.MICROS) val instants = () => Itera
[spark] branch master updated: [SPARK-44457][CONNECT][TESTS] Add `truncatedTo(ChronoUnit.MICROS)` to make `ArrowEncoderSuite` in Java 17 daily test GA task pass
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new da359259b13 [SPARK-44457][CONNECT][TESTS] Add `truncatedTo(ChronoUnit.MICROS)` to make `ArrowEncoderSuite` in Java 17 daily test GA task pass da359259b13 is described below commit da359259b138864a52ea98a4e19c55e593a5a8fa Author: yangjie01 AuthorDate: Wed Jul 26 19:17:40 2023 -0500 [SPARK-44457][CONNECT][TESTS] Add `truncatedTo(ChronoUnit.MICROS)` to make `ArrowEncoderSuite` in Java 17 daily test GA task pass ### What changes were proposed in this pull request? Similar to SPARK-42770 | https://github.com/apache/spark/pull/40395, this PR calls `truncatedTo(ChronoUnit.MICROS)` on `Instant.now()` and `LocalDateTime.now()` to ensure microsecond accuracy is used in any environment. ### Why are the changes needed? Make the Java 17 daily test GA task run successfully. The Java 17 daily test GA task currently fails: https://github.com/apache/spark/actions/runs/5570003581/jobs/10173767006 ``` [info] - nullable fields *** FAILED *** (169 milliseconds) [info] NullableData(null, JANUARY, E1, null, 1.00, 2.00, null, 4, PT0S, null, 2023-07-16, 2023-07-16, null, 2023-07-16T23:01:54.059339Z, 2023-07-16T23:01:54.059359) did not equal NullableData(null, JANUARY, E1, null, 1.00, 2.00, null, 4, PT0S, null, 2023-07-16, 2023-07-16, null, 2023-07-16T23:01:54.059339538Z, 2023-07-16T23:01:54.059359638) (ArrowEncoderSuite.scala:194) [info] Analysis: [info] NullableData(instant: 2023-07-16T23:01:54.059339Z -> 2023-07-16T23:01:54.059339538Z, localDateTime: 2023-07-16T23:01:54.059359 -> 2023-07-16T23:01:54.059359638) [info] org.scalatest.exceptions.TestFailedException: ... 
[info] - lenient field serialization - timestamp/instant *** FAILED *** (26 milliseconds) [info] 2023-07-16T23:01:55.112838Z did not equal 2023-07-16T23:01:55.112838568Z (ArrowEncoderSuite.scala:194) [info] org.scalatest.exceptions.TestFailedException: ... ``` ### Does this PR introduce _any_ user-facing change? No, this change is test-only. ### How was this patch tested? - Passes GitHub Actions - GitHub Actions test with Java 17 passed: https://github.com/LuciferYang/spark/actions/runs/5647253889/job/15297009685 https://github.com/apache/spark/assets/1475305/27a4350a-9475-45e3-b39f-b0b1e8f14e92 Closes #42039 from LuciferYang/ArrowEncoderSuite-Java17. Authored-by: yangjie01 Signed-off-by: Sean Owen --- .../spark/sql/connect/client/arrow/ArrowEncoderSuite.scala| 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/arrow/ArrowEncoderSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/arrow/ArrowEncoderSuite.scala index 3f8ac1cb8d1..5c035a613fe 100644 --- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/arrow/ArrowEncoderSuite.scala +++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/arrow/ArrowEncoderSuite.scala @@ -18,6 +18,7 @@ package org.apache.spark.sql.connect.client.arrow import java.math.BigInteger import java.time.{Duration, Period, ZoneOffset} +import java.time.temporal.ChronoUnit import java.util import java.util.{Collections, Objects} @@ -361,8 +362,10 @@ class ArrowEncoderSuite extends ConnectFunSuite with BeforeAndAfterAll { test("nullable fields") { val encoder = ScalaReflection.encoderFor[NullableData] -val instant = java.time.Instant.now() -val now = java.time.LocalDateTime.now() +// SPARK-44457: Similar to SPARK-42770, calling `truncatedTo(ChronoUnit.MICROS)` +// on `Instant.now()` and `LocalDateTime.now()` to ensure microsecond accuracy is used. 
+val instant = java.time.Instant.now().truncatedTo(ChronoUnit.MICROS) +val now = java.time.LocalDateTime.now().truncatedTo(ChronoUnit.MICROS) val today = java.time.LocalDate.now() roundTripAndCheckIdentical(encoder) { () => val maybeNull = MaybeNull(3) @@ -602,7 +605,9 @@ class ArrowEncoderSuite extends ConnectFunSuite with BeforeAndAfterAll { } test("lenient field serialization - timestamp/instant") { -val base = java.time.Instant.now() +// SPARK-44457: Similar to SPARK-42770, calling `truncatedTo(ChronoUnit.MICROS)` +// on `Instant.now()` to ensure microsecond accuracy is used. +val base = java.time.Instant.now().truncatedTo(ChronoUnit.MICROS) val instants = () => Iterator.tabulate(10)(i => base.plusSeconds(i * i * 60)) val timestamps = () => instants().map(java.sql.T
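The failures above come from Java 9+/17 clocks returning sub-microsecond precision while Arrow timestamps carry only microseconds, so the round trip drops the extra nanos and equality fails. The effect of the truncation can be sketched on a plain JVM, without Spark Connect:

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

// Modern JVM clocks can return nanosecond-resolution Instants; Arrow timestamps
// carry only microseconds, so anything below that is lost on a round trip.
val raw = Instant.now()
val micros = raw.truncatedTo(ChronoUnit.MICROS)

// After truncation the nano-of-second field is an exact multiple of 1000:
// there is no sub-microsecond component left to be dropped by the encoder.
val exact = micros.getNano % 1000 == 0

// Simulating the microsecond-precision round trip: truncating again is a no-op,
// so the value compares equal after encoding/decoding.
val stable = micros.truncatedTo(ChronoUnit.MICROS) == micros
```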
[spark] branch master updated: [SPARK-44522][BUILD] Upgrade `scala-xml` to 2.2.0
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 43b753a3530 [SPARK-44522][BUILD] Upgrade `scala-xml` to 2.2.0 43b753a3530 is described below commit 43b753a3530bcfdad415765e1348136d70d8125d Author: yangjie01 AuthorDate: Wed Jul 26 19:11:00 2023 -0500 [SPARK-44522][BUILD] Upgrade `scala-xml` to 2.2.0 ### What changes were proposed in this pull request? This PR aims to upgrade `scala-xml` from 2.1.0 to 2.2.0. ### Why are the changes needed? The new version brings some bug fixes, such as: - https://github.com/scala/scala-xml/pull/651 - https://github.com/scala/scala-xml/pull/677 The full release notes are as follows: - https://github.com/scala/scala-xml/releases/tag/v2.2.0 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Passes GitHub Actions - Checked Scala 2.13, all Scala tests passed: https://github.com/LuciferYang/spark/runs/15278359785 Closes #42119 from LuciferYang/scala-xml-220. 
Authored-by: yangjie01 Signed-off-by: Sean Owen --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 168b0b34787..3b54ef43f6a 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -229,7 +229,7 @@ scala-compiler/2.12.18//scala-compiler-2.12.18.jar scala-library/2.12.18//scala-library-2.12.18.jar scala-parser-combinators_2.12/2.3.0//scala-parser-combinators_2.12-2.3.0.jar scala-reflect/2.12.18//scala-reflect-2.12.18.jar -scala-xml_2.12/2.1.0//scala-xml_2.12-2.1.0.jar +scala-xml_2.12/2.2.0//scala-xml_2.12-2.2.0.jar shims/0.9.45//shims-0.9.45.jar slf4j-api/2.0.7//slf4j-api-2.0.7.jar snakeyaml-engine/2.6//snakeyaml-engine-2.6.jar diff --git a/pom.xml b/pom.xml index 5711dba04b9..2e9d1d2d8f3 100644 --- a/pom.xml +++ b/pom.xml @@ -1089,7 +1089,7 @@ org.scala-lang.modules scala-xml_${scala.binary.version} -2.1.0 +2.2.0 org.scala-lang - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-44441][BUILD] Upgrade `bcprov-jdk15on` and `bcpkix-jdk15on` to 1.70
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new aec34451297 [SPARK-44441][BUILD] Upgrade `bcprov-jdk15on` and `bcpkix-jdk15on` to 1.70 aec34451297 is described below commit aec3445129789c5b1d768333bacf3f3e680d73a0 Author: yangjie01 AuthorDate: Sat Jul 15 12:17:07 2023 -0500 [SPARK-44441][BUILD] Upgrade `bcprov-jdk15on` and `bcpkix-jdk15on` to 1.70 ### What changes were proposed in this pull request? This PR aims to upgrade `bcprov-jdk15on` and `bcpkix-jdk15on` from 1.60 to 1.70. ### Why are the changes needed? The new version fixed [CVE-2020-15522](https://github.com/bcgit/bc-java/wiki/CVE-2020-15522). The release notes are as follows: - https://www.bouncycastle.org/releasenotes.html#r1rv70 ### Does this PR introduce _any_ user-facing change? No, this just upgrades a test dependency. ### How was this patch tested? Passes GitHub Actions Closes #42015 from LuciferYang/SPARK-44441. Authored-by: yangjie01 Signed-off-by: Sean Owen --- pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pom.xml b/pom.xml index eac34643fc9..3c2107b1b00 100644 --- a/pom.xml +++ b/pom.xml @@ -214,7 +214,7 @@ 3.1.0 1.1.0 1.5.0 -1.60 +1.70 1.9.0
[spark] branch master updated: [MINOR][SS][DOCS] Fix typos in the Scaladoc and make the semantic of getCurrentWatermarkMs explicit
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e5e2b914de6 [MINOR][SS][DOCS] Fix typos in the Scaladoc and make the semantic of getCurrentWatermarkMs explicit e5e2b914de6 is described below commit e5e2b914de6a498ae191bdb0d02308c5b6f13f15 Author: bartosz25 AuthorDate: Sat Jul 15 08:31:31 2023 -0500 [MINOR][SS][DOCS] Fix typos in the Scaladoc and make the semantic of getCurrentWatermarkMs explicit ### What changes were proposed in this pull request? Improve the code comments: 1. Rate micro-batch data source Scaladoc parameters aren't consistent with the options actually supported by this data source. 2. `getCurrentWatermarkMs` has special semantics for the 1st micro-batch, when the watermark is not set yet. IMO, it should return `Option[Long]`, hence `None` instead of `0` for the first micro-batch, but since it's a breaking change, I preferred to add a note on that instead. ### Why are the changes needed? 1. Avoid confusion while using the classes and methods. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? The tests weren't added because the change is only at the Scaladoc level. I affirm that the contribution is my original work and that I license the work to the project under the project's open source license. Closes #41988 from bartosz25/comments_fixes. 
Authored-by: bartosz25 Signed-off-by: Sean Owen --- .../sql/execution/streaming/sources/RatePerMicroBatchProvider.scala | 4 ++-- .../src/main/scala/org/apache/spark/sql/streaming/GroupState.scala | 5 + 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/RatePerMicroBatchProvider.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/RatePerMicroBatchProvider.scala index ccf8b0a7b92..41878a6a549 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/RatePerMicroBatchProvider.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/RatePerMicroBatchProvider.scala @@ -34,11 +34,11 @@ import org.apache.spark.sql.util.CaseInsensitiveStringMap * with 0L. * * This source supports the following options: - * - `rowsPerMicroBatch` (e.g. 100): How many rows should be generated per micro-batch. + * - `rowsPerBatch` (e.g. 100): How many rows should be generated per micro-batch. * - `numPartitions` (e.g. 10, default: Spark's default parallelism): The partition number for the *generated rows. * - `startTimestamp` (e.g. 1000, default: 0): starting value of generated time - * - `advanceMillisPerMicroBatch` (e.g. 1000, default: 1000): the amount of time being advanced in + * - `advanceMillisPerBatch` (e.g. 1000, default: 1000): the amount of time being advanced in *generated time on each micro-batch. 
* * Unlike `rate` data source, this data source provides a consistent set of input rows per diff --git a/sql/core/src/main/scala/org/apache/spark/sql/streaming/GroupState.scala b/sql/core/src/main/scala/org/apache/spark/sql/streaming/GroupState.scala index 2c8f1db74f8..f08a2fd3cc5 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/streaming/GroupState.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/streaming/GroupState.scala @@ -315,6 +315,11 @@ trait GroupState[S] extends LogicalGroupState[S] { * * @note In a streaming query, this can be called only when watermark is set before calling * `[map/flatMap]GroupsWithState`. In a batch query, this method always returns -1. + * @note The watermark gets propagated in the end of each query. As a result, this method will + * return 0 (1970-01-01T00:00:00) for the first micro-batch. If you use this value + * as a part of the timestamp set in the `setTimeoutTimestamp`, it may lead to the + * state expiring immediately in the next micro-batch, once the watermark gets the + * real value from your data. */ @throws[UnsupportedOperationException]( "if watermark has not been set before in [map|flatMap]GroupsWithState") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
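The pitfall the new `getCurrentWatermarkMs` note warns about can be shown with plain arithmetic (the millisecond values below are illustrative, not from the source):

```scala
// First micro-batch: getCurrentWatermarkMs() returns 0 (1970-01-01T00:00:00),
// because the watermark has not been propagated yet.
val firstBatchWatermarkMs = 0L
val timeoutDurationMs = 60 * 60 * 1000L // e.g. expire state one hour past the watermark

// setTimeoutTimestamp(watermark + duration) on the first batch therefore fixes
// the timeout at 1970-01-01T01:00:00 ...
val timeoutTimestampMs = firstBatchWatermarkMs + timeoutDurationMs

// ... so once the next micro-batch derives a real watermark from the data,
// that timeout is already far in the past and the state expires immediately.
val realWatermarkMs = 1689379200000L // hypothetical event-time watermark (mid-2023)
val expiresImmediately = timeoutTimestampMs < realWatermarkMs // true
```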
[spark] branch master updated: [SPARK-43389][SQL] Added a null check for lineSep option
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9f07e4a747b [SPARK-43389][SQL] Added a null check for lineSep option 9f07e4a747b is described below commit 9f07e4a747b0e2a62b954db3c9be425c924da47a Author: Gurpreet Singh AuthorDate: Thu Jul 13 18:17:45 2023 -0500 [SPARK-43389][SQL] Added a null check for lineSep option ### What changes were proposed in this pull request? ### Why are the changes needed? - `spark.read.csv` throws `NullPointerException` when lineSep is set to None - More details about the issue here: https://issues.apache.org/jira/browse/SPARK-43389 ### Does this PR introduce _any_ user-facing change? ~~Users now should be able to explicitly set `lineSep` as `None` without getting an exception~~ After some discussion, it was decided to add a `require` check for `null` instead of letting it through. ### How was this patch tested? Tested the changes with a python script that explicitly sets `lineSep` to `None` ```python from pyspark.sql import SparkSession # Create a SparkSession spark = SparkSession.builder.appName("HelloWorld").getOrCreate() # Read CSV into a DataFrame df = spark.read.csv("/tmp/hello.csv", header=True, inferSchema=True, lineSep=None) # Also tested the following case when options are passed before invoking .csv #df = spark.read.option("lineSep", None).csv("/Users/gdhuper/Documents/tmp/hello.csv", header=True, inferSchema=True) # Show the DataFrame df.show() # Stop the SparkSession spark.stop() ``` Closes #41904 from gdhuper/gdhuper/SPARK-43389. 
Authored-by: Gurpreet Singh Signed-off-by: Sean Owen --- .../src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala| 1 + .../org/apache/spark/sql/execution/datasources/text/TextOptions.scala| 1 + 2 files changed, 2 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala index 2b6b60fdf76..f4ad1f2f2e5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala @@ -254,6 +254,7 @@ class CSVOptions( * A string between two consecutive JSON records. */ val lineSeparator: Option[String] = parameters.get(LINE_SEP).map { sep => +require(sep != null, "'lineSep' cannot be a null value.") require(sep.nonEmpty, "'lineSep' cannot be an empty string.") // Intentionally allow it up to 2 for Window's CRLF although multiple // characters have an issue with quotes. This is intentionally undocumented. diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala index f26f05cbe1c..468d58974ed 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala @@ -45,6 +45,7 @@ class TextOptions(@transient private val parameters: CaseInsensitiveMap[String]) val encoding: Option[String] = parameters.get(ENCODING) val lineSeparator: Option[String] = parameters.get(LINE_SEP).map { lineSep => +require(lineSep != null, s"'$LINE_SEP' cannot be a null value.") require(lineSep.nonEmpty, s"'$LINE_SEP' cannot be an empty string.") lineSep - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
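The added guard follows the existing `require`-based validation in the options classes. A standalone sketch of the pattern (a hypothetical helper, not the actual `CSVOptions`/`TextOptions` code):

```scala
// Hypothetical stand-in for the lineSep option parsing. A null map value
// previously slipped past the nonEmpty check and surfaced later as a
// NullPointerException; checking for null first fails fast with a clear message.
def parseLineSeparator(parameters: Map[String, String]): Option[String] =
  parameters.get("lineSep").map { sep =>
    require(sep != null, "'lineSep' cannot be a null value.")
    require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
    sep
  }

val absent = parseLineSeparator(Map.empty)                // None: option not set
val crlf   = parseLineSeparator(Map("lineSep" -> "\r\n")) // Some("\r\n")
// An explicit null now raises IllegalArgumentException from require,
// instead of an NPE from sep.nonEmpty.
val rejected =
  try { parseLineSeparator(Map("lineSep" -> null)); false }
  catch { case _: IllegalArgumentException => true }
```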
[spark] branch master updated: [SPARK-44332][CORE][WEBUI] Fix the sorting error of Executor ID Column on Executors UI Page
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9717d74d072 [SPARK-44332][CORE][WEBUI] Fix the sorting error of Executor ID Column on Executors UI Page 9717d74d072 is described below commit 9717d74d0726bd177b8d0f0cc2c9b0404f82dafc Author: panbingkun AuthorDate: Mon Jul 10 19:15:56 2023 -0500 [SPARK-44332][CORE][WEBUI] Fix the sorting error of Executor ID Column on Executors UI Page ### What changes were proposed in this pull request? The pr aims to fix the sorting error of `Executor ID` Column on `Executor Page`. ### Why are the changes needed? Fix UI Sort bug. PS: Can be reproduced using: sh bin/spark-shell --master "local-cluster[12,1,1024]" - Before patch Before - asc: https://github.com/apache/spark/assets/15246973/83648087-804a-4a62-8f3e-c748f46b95d7";> Before - desc: https://github.com/apache/spark/assets/15246973/b68547f3-af36-4e97-b922-7c3ffa3cbb30";> - After patch After - asc: https://github.com/apache/spark/assets/15246973/9fd40fc7-9b72-4a08-8e16-a89d9625a1a0";> After - desc: https://github.com/apache/spark/assets/15246973/11921083-30cc-46e9-a9f6-1fe9aecde1a7";> ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Pass GA. - Manually test. Closes #41887 from panbingkun/align_executor_id. 
Authored-by: panbingkun Signed-off-by: Sean Owen --- .../org/apache/spark/ui/static/executorspage.js| 31 +++--- 1 file changed, 28 insertions(+), 3 deletions(-) diff --git a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js index 520efbd6def..38dc446eaac 100644 --- a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js +++ b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js @@ -96,6 +96,32 @@ jQuery.extend(jQuery.fn.dataTableExt.oSort, { } }); +jQuery.extend( jQuery.fn.dataTableExt.oSort, { + "executor-id-asc": function ( a, b ) { +if ($.isNumeric(a) && $.isNumeric(b)) { + return parseFloat(a) - parseFloat(b); +} else if (!$.isNumeric(a) && $.isNumeric(b)) { + return -1; +} else if ($.isNumeric(a) && !$.isNumeric(b)) { + return 1; +} else { + return a.localeCompare(b); +} + }, + + "executor-id-desc": function ( a, b ) { +if ($.isNumeric(a) && $.isNumeric(b)) { + return parseFloat(b) - parseFloat(a); +} else if (!$.isNumeric(a) && $.isNumeric(b)) { + return 1; +} else if ($.isNumeric(a) && !$.isNumeric(b)) { + return -1; +} else { + return b.localeCompare(a); +} + } +}); + $(document).ajaxStop($.unblockUI); $(document).ajaxStart(function () { $.blockUI({message: 'Loading Executors Page...'}); @@ -403,9 +429,8 @@ $(document).ready(function () { "data": response, "columns": [ { - data: function (row, type) { -return type !== 'display' ? (isNaN(row.id) ? 0 : row.id ) : row.id; - } + data: "id", + type: "executor-id" }, {data: 'hostPort'}, { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
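The comparator added above sorts purely numeric executor IDs numerically, places non-numeric IDs (such as "driver") before every numeric one in ascending order, and falls back to lexicographic comparison among non-numeric IDs. The same ordering can be expressed as a sort key; this Python sketch is illustrative, not the page's actual JavaScript:

```python
def executor_id_sort_key(exec_id):
    """Sort key mirroring the new "executor-id" DataTables comparator:
    numeric IDs compare as numbers, non-numeric IDs sort before all
    numeric ones, and non-numeric IDs compare as strings."""
    try:
        # Numeric IDs: bucket 1, ordered by numeric value.
        return (1, float(exec_id), "")
    except ValueError:
        # Non-numeric IDs (e.g. "driver"): bucket 0, ordered as strings.
        return (0, 0.0, exec_id)

ids = ["10", "driver", "2", "1"]
ordered = sorted(ids, key=executor_id_sort_key)
```

Without the fix, a plain string sort yields the broken order "1", "10", "2"; with it, ascending order is "driver", "1", "2", "10".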
[spark] branch master updated: [SPARK-44350][BUILD] Upgrade sbt to 1.9.2
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f1ec99b10ca [SPARK-44350][BUILD] Upgrade sbt to 1.9.2 f1ec99b10ca is described below commit f1ec99b10caf85e95aec2ed4f1e0b55cc0bd6f11 Author: panbingkun AuthorDate: Mon Jul 10 13:29:33 2023 -0500 [SPARK-44350][BUILD] Upgrade sbt to 1.9.2 ### What changes were proposed in this pull request? The pr aims to upgrade sbt from 1.9.1 to 1.9.2. ### Why are the changes needed? 1.The new version brings bug fixed: - Let ++ fall back to a bincompat Scala version by eed3si9n in https://github.com/sbt/sbt/pull/7328 2.v1.9.1 VS v1.9.2 https://github.com/sbt/sbt/compare/v1.9.1...v1.9.2 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #41916 from panbingkun/upgrade_sbt_192. Authored-by: panbingkun Signed-off-by: Sean Owen --- dev/appveyor-install-dependencies.ps1 | 2 +- project/build.properties | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/appveyor-install-dependencies.ps1 b/dev/appveyor-install-dependencies.ps1 index 6848d3af43d..3737382eb86 100644 --- a/dev/appveyor-install-dependencies.ps1 +++ b/dev/appveyor-install-dependencies.ps1 @@ -97,7 +97,7 @@ if (!(Test-Path $tools)) { # == SBT Push-Location $tools -$sbtVer = "1.9.1" +$sbtVer = "1.9.2" Start-FileDownload "https://github.com/sbt/sbt/releases/download/v$sbtVer/sbt-$sbtVer.zip"; "sbt.zip" # extract diff --git a/project/build.properties b/project/build.properties index f27c9c4c8cc..3eb34b94744 100644 --- a/project/build.properties +++ b/project/build.properties @@ -15,4 +15,4 @@ # limitations under the License. # # Please update the version in appveyor-install-dependencies.ps1 together. 
-sbt.version=1.9.1 +sbt.version=1.9.2
[spark] branch master updated: [SPARK-44257][BUILD] Update some maven plugins & scalafmt to newest version
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 05f5dccbd34 [SPARK-44257][BUILD] Update some maven plugins & scalafmt to newest version 05f5dccbd34 is described below commit 05f5dccbd34218c7d399228529853bdb1595f3a2 Author: panbingkun AuthorDate: Fri Jun 30 09:14:22 2023 -0500 [SPARK-44257][BUILD] Update some maven plugins & scalafmt to newest version ### What changes were proposed in this pull request? The pr aims to update some maven plugins & scalafmt to newest version, include: - maven-clean-plugin from 3.2.0 to 3.3.1 - maven-shade-plugin from 3.4.1 to 3.5.0 - scalafmt from 3.7.4 to 3.7.5 ### Why are the changes needed? 1.maven-clean-plugin https://github.com/apache/maven-clean-plugin/releases/tag/maven-clean-plugin-3.3.1 2.maven-shade-plugin https://github.com/apache/maven-shade-plugin/releases/tag/maven-shade-plugin-3.5.0 3.scalafmt https://github.com/scalameta/scalafmt/releases/tag/v3.7.5 Router: make sure to indent comments after lambda (https://github.com/scalameta/scalafmt/pull/3556) kitbellew Fix proposed version syntax (https://github.com/scalameta/scalafmt/pull/3555) JD557 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #41803 from panbingkun/SPARK-44257. 
Authored-by: panbingkun Signed-off-by: Sean Owen --- .../src/main/scala/org/apache/spark/sql/Dataset.scala| 16 +++- .../scala/org/apache/spark/sql/catalog/Catalog.scala | 7 +++ .../org/apache/spark/sql/internal/CatalogImpl.scala | 7 +++ dev/.scalafmt.conf | 2 +- pom.xml | 4 ++-- 5 files changed, 16 insertions(+), 20 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala index eba425ce127..b959974dc30 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala @@ -535,7 +535,7 @@ class Dataset[T] private[sql] ( assert(result.schema.size == 1) // scalastyle:off println println(result.toArray.head) -// scalastyle:on println + // scalastyle:on println } } @@ -2214,10 +2214,9 @@ class Dataset[T] private[sql] ( * tied to this Spark application. * * Global temporary view is cross-session. Its lifetime is the lifetime of the Spark - * application, - * i.e. it will be automatically dropped when the application terminates. It's tied to a system - * preserved database `global_temp`, and we must use the qualified name to refer a global temp - * view, e.g. `SELECT * FROM global_temp.view1`. + * application, i.e. it will be automatically dropped when the application terminates. It's tied + * to a system preserved database `global_temp`, and we must use the qualified name to refer a + * global temp view, e.g. `SELECT * FROM global_temp.view1`. * * @throws AnalysisException * if the view name is invalid or already exists @@ -2235,10 +2234,9 @@ class Dataset[T] private[sql] ( * temporary view is tied to this Spark application. * * Global temporary view is cross-session. Its lifetime is the lifetime of the Spark - * application, - * i.e. it will be automatically dropped when the application terminates. 
It's tied to a system - * preserved database `global_temp`, and we must use the qualified name to refer a global temp - * view, e.g. `SELECT * FROM global_temp.view1`. + * application, i.e. it will be automatically dropped when the application terminates. It's tied + * to a system preserved database `global_temp`, and we must use the qualified name to refer a + * global temp view, e.g. `SELECT * FROM global_temp.view1`. * * @group basic * @since 3.4.0 diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala index 268f162cbfa..11c3f4e3d18 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala @@ -543,10 +543,9 @@ abstract class Catalog { * cached before, then it will also be uncached. * * Global temporary view is cross-session. Its lifetime is the lifetime of the Spark - * application, - * i.e. it will be automatically dropped when the application te
[spark] branch master updated: [SPARK-41599] Memory leak in FileSystem.CACHE when submitting apps to secure cluster using InProcessLauncher
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7971e1c6a7c [SPARK-41599] Memory leak in FileSystem.CACHE when submitting apps to secure cluster using InProcessLauncher 7971e1c6a7c is described below commit 7971e1c6a7c074c65829c2bdfad857a33e0a7a5d Author: Xieming LI AuthorDate: Fri Jun 30 08:20:04 2023 -0500 [SPARK-41599] Memory leak in FileSystem.CACHE when submitting apps to secure cluster using InProcessLauncher ### What changes were proposed in this pull request? Using `FileSystem.closeAllForUGI` to close the cache and prevent the memory leak. ### Why are the changes needed? There seems to be a memory leak in FileSystem.CACHE when submitting apps to a secure cluster using InProcessLauncher. For more details, see [SPARK-41599](https://issues.apache.org/jira/browse/SPARK-41599) ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? I have tested the patch with my code, which uses InProcessLauncher. Confirmed that the memory leak issue is mitigated. https://github.com/apache/spark/assets/4378066/cfdef4d3-cb43-464c-bb46-de60f3b91622 It will be very helpful if I can have some feedback, and I will add some test cases if required. Closes #41692 from risyomei/fix-SPARK-41599.
Authored-by: Xieming LI Signed-off-by: Sean Owen --- core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala | 2 ++ .../apache/spark/deploy/security/HadoopDelegationTokenManager.scala | 4 2 files changed, 6 insertions(+) diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala index 8f9477385e7..60253ed5fda 100644 --- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala +++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala @@ -186,6 +186,8 @@ private[spark] class SparkSubmit extends Logging { } else { throw e } + } finally { +FileSystem.closeAllForUGI(proxyUser) } } } else { diff --git a/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala b/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala index 6ce195b6c7a..54a24927ded 100644 --- a/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala +++ b/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala @@ -26,6 +26,7 @@ import java.util.concurrent.{ScheduledExecutorService, TimeUnit} import scala.collection.mutable import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.FileSystem import org.apache.hadoop.security.{Credentials, UserGroupInformation} import org.apache.spark.SparkConf @@ -149,6 +150,9 @@ private[spark] class HadoopDelegationTokenManager( creds.addAll(newTokens) } }) + if(!currentUser.equals(freshUGI)) { +FileSystem.closeAllForUGI(freshUGI) + } } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
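The shape of the fix above is a try/finally cleanup: each proxy-user submission adds entries to a cache keyed by user (Hadoop's FileSystem.CACHE, keyed by UGI), and the patch guarantees those entries are evicted even when the submission throws. A toy Python model of that pattern (class and method names here are illustrative stand-ins, not the real Hadoop API):

```python
class UgiFileSystemCache:
    """Toy stand-in for Hadoop's FileSystem.CACHE, keyed by user (UGI)."""

    def __init__(self):
        self._cache = {}

    def get(self, user, path):
        # Each distinct proxy user accumulates its own cached entries;
        # without cleanup, every submission leaks one set of entries.
        return self._cache.setdefault(user, {}).setdefault(path, object())

    def close_all_for_ugi(self, user):
        # Mirrors FileSystem.closeAllForUGI: evict every entry for a user.
        self._cache.pop(user, None)


def submit_as_proxy_user(cache, proxy_user, action):
    """Run a submission as a proxy user, always evicting that user's
    cached filesystems afterwards -- the essence of the patch."""
    try:
        return action()
    finally:
        cache.close_all_for_ugi(proxy_user)
```

Because the eviction sits in `finally`, the cache stays bounded whether the in-process submission succeeds or fails.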
[spark] branch master updated (6590e7db521 -> a8ea35f7c2f)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 6590e7db521 [SPARK-44158][K8S] Remove unused `spark.kubernetes.executor.lostCheckmaxAttempts` add a8ea35f7c2f [SPARK-39740][UI] Upgrade vis timeline to 7.7.2 to fix CVE-2020-28487 No new revisions were added by this update. Summary of changes: .../org/apache/spark/ui/static/timeline-view.js| 40 ++- .../spark/ui/static/vis-timeline-graph2d.min.css | 3 +- .../ui/static/vis-timeline-graph2d.min.css.map | 1 + .../spark/ui/static/vis-timeline-graph2d.min.js| 57 ++ .../ui/static/vis-timeline-graph2d.min.js.map | 1 + dev/.rat-excludes | 2 + licenses-binary/LICENSE-vis-timeline.txt | 29 +-- licenses/LICENSE-vis-timeline.txt | 29 +-- 8 files changed, 100 insertions(+), 62 deletions(-) create mode 100644 core/src/main/resources/org/apache/spark/ui/static/vis-timeline-graph2d.min.css.map create mode 100644 core/src/main/resources/org/apache/spark/ui/static/vis-timeline-graph2d.min.js.map
[spark] branch master updated: [SPARK-44024][SQL] Change to use `map` when `unzip` only used to extract a single element
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ad6cd60ca74 [SPARK-44024][SQL] Change to use `map` when `unzip` only used to extract a single element ad6cd60ca74 is described below commit ad6cd60ca7408018d8c6259597456e9c2fe8b376 Author: yangjie01 AuthorDate: Sun Jun 18 07:19:56 2023 -0500 [SPARK-44024][SQL] Change to use `map` when `unzip` only used to extract a single element ### What changes were proposed in this pull request? A minor code simplification, use `map` instead of `unzip` when `unzip` only used to extract a single element. ### Why are the changes needed? Code simplification ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions Closes #41548 from LuciferYang/SPARK-44024. Lead-authored-by: yangjie01 Co-authored-by: YangJie Signed-off-by: Sean Owen --- .../scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala | 2 +- .../apache/spark/sql/execution/datasources/v2/CreateIndexExec.scala | 2 +- .../spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index 568e3d30e34..c70dba01808 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -861,7 +861,7 @@ object ColumnPruning extends Rule[LogicalPlan] { val newProjects = e.projections.map { proj => proj.zip(e.output).filter { case (_, a) => newOutput.contains(a) -}.unzip._1 +}.map(_._1) } a.copy(child = Expand(newProjects, newOutput, grandChild)) diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateIndexExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateIndexExec.scala index 20ccf991af6..8dac6737334 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateIndexExec.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateIndexExec.scala @@ -52,7 +52,7 @@ case class CreateIndexExec( } try { table.createIndex( -indexName, columns.unzip._1.toArray, colProperties, propertiesWithIndexType.asJava) +indexName, columns.map(_._1).toArray, colProperties, propertiesWithIndexType.asJava) } catch { case _: IndexAlreadyExistsException if ignoreIfExists => logWarning(s"Index $indexName already exists in table ${table.name}. Ignoring.") diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala index 49a6c7232ec..e58fe7844ab 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala @@ -192,11 +192,11 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper { val groupOutputMap = normalizedGroupingExpr.zipWithIndex.map { case (e, i) => AttributeReference(s"group_col_$i", e.dataType)() -> e } - val groupOutput = groupOutputMap.unzip._1 + val groupOutput = groupOutputMap.map(_._1) val aggOutputMap = finalAggExprs.zipWithIndex.map { case (e, i) => AttributeReference(s"agg_func_$i", e.dataType)() -> e } - val aggOutput = aggOutputMap.unzip._1 + val aggOutput = aggOutputMap.map(_._1) val newOutput = groupOutput ++ aggOutput val groupByExprToOutputOrdinal = mutable.HashMap.empty[Expression, Int] normalizedGroupingExpr.zipWithIndex.foreach { case (expr, ordinal) => - To 
unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
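The simplification in the diff above replaces `pairs.unzip._1` with `pairs.map(_._1)`: `unzip` materialises both component collections and then one is thrown away, while `map` builds only the one that is needed. The same contrast in Python (illustrative data; the pair names echo the diff's generated column names):

```python
pairs = [("group_col_0", "exprA"), ("group_col_1", "exprB")]

# unzip-style: transposes the pairs into BOTH component sequences,
# then discards the second one.
firsts_via_unzip = list(list(zip(*pairs))[0])

# map-style: builds only the sequence that is actually needed.
firsts_via_map = [first for first, _ in pairs]
```

Both yield the same result; the `map` form simply avoids constructing and discarding the unused second collection.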
[spark] branch master updated: [SPARK-43179][FOLLOW-UP] Use the secret ByteBuffer instead of the String
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 45ad044042f [SPARK-43179][FOLLOW-UP] Use the secret ByteBuffer instead of the String 45ad044042f is described below commit 45ad044042f7f376c4c0234807a62179b680edae Author: Chandni Singh AuthorDate: Sun Jun 11 07:59:35 2023 -0500 [SPARK-43179][FOLLOW-UP] Use the secret ByteBuffer instead of the String ### What changes were proposed in this pull request? Introduced a bug with this change: https://github.com/apache/spark/pull/40843. To get the value that is persisted in db, we used to use `mapper.writeValueAsString(ByteBuffer)`. We changed it to `mapper.writeValueAsString(String)`. However, when we load from the db, it still uses `ByteBuffer secret = mapper.readValue(e.getValue(), ByteBuffer.class);` causing exceptions when the shuffle service is unable to recover the apps: ``` ERROR org.apache.spark.network.server.TransportRequestHandler: Error while invoking RpcHandler#receive() on RPC id 5764589675121231159 java.lang.RuntimeException: javax.security.sasl.SaslException: DIGEST-MD5: digest response format violation. Mismatched response. at org.sparkproject.guava.base.Throwables.propagate(Throwables.java:160) at org.apache.spark.network.sasl.SparkSaslServer.response(SparkSaslServer.java:121) at org.apache.spark.network.sasl.SaslRpcHandler.doAuthChallenge(Sas [...] ``` ### Why are the changes needed? It fixes the bug that was introduced with SPARK-43179 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? The existing UTs in the `YarnShuffleServiceSuite` were using empty password which masked the issue. Changed it to use a non-empty password. Closes #41502 from otterc/SPARK-43179-followup. 
Authored-by: Chandni Singh Signed-off-by: Sean Owen --- .../spark/network/yarn/YarnShuffleService.java | 4 +++- .../network/yarn/YarnShuffleServiceSuite.scala | 25 +- 2 files changed, 18 insertions(+), 11 deletions(-) diff --git a/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java b/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java index 578c1a19c40..b34ebf6e29b 100644 --- a/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java +++ b/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java @@ -440,7 +440,9 @@ public class YarnShuffleService extends AuxiliaryService { if (db != null && AppsWithRecoveryDisabled.isRecoveryEnabledForApp(appId)) { AppId fullId = new AppId(appId); byte[] key = dbAppKey(fullId); - byte[] value = mapper.writeValueAsString(shuffleSecret).getBytes(StandardCharsets.UTF_8); + ByteBuffer dbVal = metaInfo != null ? + JavaUtils.stringToBytes(shuffleSecret) : appServiceData; + byte[] value = mapper.writeValueAsString(dbVal).getBytes(StandardCharsets.UTF_8); db.put(key, value); } secretManager.registerApp(appId, shuffleSecret); diff --git a/resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala b/resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala index 3e78262a765..552cc98311e 100644 --- a/resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala +++ b/resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala @@ -71,6 +71,8 @@ abstract class YarnShuffleServiceSuite extends SparkFunSuite with Matchers { private[yarn] val SORT_MANAGER_WITH_MERGE_SHUFFLE_META_WithNoAttemptID = "org.apache.spark.shuffle.sort.SortShuffleManager:{\"mergeDir\": \"merge_manager\"}" private val DUMMY_BLOCK_DATA = "dummyBlockData".getBytes(StandardCharsets.UTF_8) + 
private val DUMMY_PASSWORD = "dummyPassword" + private val EMPTY_PASSWORD = "" private var recoveryLocalDir: File = _ protected var tempDir: File = _ @@ -191,7 +193,8 @@ abstract class YarnShuffleServiceSuite extends SparkFunSuite with Matchers { val app3Data = makeAppInfo("user", app3Id) s1.initializeApplication(app3Data) val app4Id = ApplicationId.newInstance(0, 4) -val app4Data = makeAppInfo("user", app4Id) +val app4Data = makeAppInfo("user", app4Id, metadataStorageDisabled = false, +authEnabled = true, DUMMY_PASSWORD) s1.initializeApplication(app4Data) val execStateFile = s1.registeredExecutorFile @@ -1038,15 +1041,15 @@ abstract class YarnShuffleServiceSuite exte
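The regression described above is a write/read asymmetry: the value was persisted in one serialised shape (a plain String) while recovery still deserialised the old shape (a ByteBuffer), so restarted shuffle services failed SASL authentication. A heavily simplified Python model of the invariant being restored (this toy uses JSON over a dict; the real code uses Jackson against a LevelDB-backed store):

```python
import json


def save_secret(db, app_id, secret):
    """Write side: persist the secret in one concrete serialised shape."""
    db[app_id] = json.dumps({"secret": secret})


def load_secret(db, app_id):
    """Read side: MUST deserialise exactly the shape that was written.
    The bug was writing one shape while recovery still read another,
    which surfaced as 'Mismatched response' SASL failures on restart."""
    return json.loads(db[app_id])["secret"]
```

A round trip through the same pair of functions recovers the original secret; mixing serialisation shapes between the two sides is what broke app recovery.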
[spark] branch master updated (595ad30e625 -> 8ae95724721)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 595ad30e625 [SPARK-43911][SQL] Use toSet to deduplicate the iterator data to prevent the creation of large Array add 8ae95724721 [SPARK-43955][BUILD] Upgrade `scalafmt` from 3.7.3 to 3.7.4 No new revisions were added by this update. Summary of changes: dev/.scalafmt.conf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)