(spark) branch master updated (b8e7d99d417a -> 11247d804cd3)

2024-03-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from b8e7d99d417a [SPARK-47490][SS] Fix RocksDB Logger constructor use to avoid deprecation warning
 add 11247d804cd3 [SPARK-47494][DOC] Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md | 2 ++
 1 file changed, 2 insertions(+)





(spark) branch master updated: [SPARK-47490][SS] Fix RocksDB Logger constructor use to avoid deprecation warning

2024-03-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b8e7d99d417a [SPARK-47490][SS] Fix RocksDB Logger constructor use to avoid deprecation warning
b8e7d99d417a is described below

commit b8e7d99d417ab4bcc3e69d11a0eee5864cb083e3
Author: Anish Shrigondekar 
AuthorDate: Wed Mar 20 15:11:51 2024 -0700

[SPARK-47490][SS] Fix RocksDB Logger constructor use to avoid deprecation warning

### What changes were proposed in this pull request?
Fix RocksDB Logger constructor use to avoid deprecation warning

### Why are the changes needed?
With the latest RocksDB upgrade, the Logger constructor we were using was deprecated, which produced a compiler warning.
```
[warn] val dbLogger = new Logger(dbOptions) {
[warn]^
[warn] one warning found
[warn] two warnings found
[info] compiling 36 Scala sources and 16 Java sources to 
/Users/anish.shrigondekar/spark/spark/sql/core/target/scala-2.13/classes ...
[warn] -target is deprecated: Use -release instead to compile against the 
correct platform API.
[warn] Applicable -Wconf / nowarn filters for this warning: msg=, cat=deprecation
[warn] 
/Users/anish.shrigondekar/spark/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala:851:24:
 constructor Logger in class Logger is deprecated
[warn] Applicable -Wconf / nowarn filters for this warning: msg=, cat=deprecation, 
site=org.apache.spark.sql.execution.streaming.state.RocksDB.createLogger.dbLogger,
 origin=org.rocksdb.Logger.
```

Updated to use the new recommendation as mentioned here - 
https://javadoc.io/doc/org.rocksdb/rocksdbjni/latest/org/rocksdb/Logger.html

Recommendation:
```

Logger(DBOptions dboptions)
  (https://javadoc.io/static/org.rocksdb/rocksdbjni/8.11.3/org/rocksdb/Logger.html#Logger-org.rocksdb.DBOptions-)
Deprecated.
Use Logger(InfoLogLevel) instead, e.g. new Logger(dbOptions.infoLogLevel()).
  (https://javadoc.io/static/org.rocksdb/rocksdbjni/8.11.3/org/rocksdb/Logger.html#Logger-org.rocksdb.InfoLogLevel-)
```

After the fix, the warning is not seen.
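
For reference, a minimal self-contained Scala sketch of the non-deprecated pattern. This is an illustration based only on the RocksDB JNI API quoted above; it is not the exact Spark code, which forwards the messages to log4j.

```
import org.rocksdb.{DBOptions, InfoLogLevel, Logger}

// Sketch only: construct the native logger with the non-deprecated
// Logger(InfoLogLevel) constructor instead of Logger(DBOptions).
def createLogger(dbOptions: DBOptions): Logger = {
  new Logger(dbOptions.infoLogLevel()) {
    override def log(infoLogLevel: InfoLogLevel, logMsg: String): Unit = {
      // Spark maps RocksDB levels to log4j levels here (demoting WARN to INFO
      // because RocksDB warnings are verbose); this sketch just prints.
      println(s"[rocksdb:$infoLogLevel] $logMsg")
    }
  }
}
```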

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing unit tests

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45616 from anishshri-db/task/SPARK-47490.

Authored-by: Anish Shrigondekar 
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
index 950baba9031b..8fad5ce7bd6a 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
@@ -848,7 +848,7 @@ class RocksDB(
 
   /** Create a native RocksDB logger that forwards native logs to log4j with 
correct log levels. */
   private def createLogger(): Logger = {
-val dbLogger = new Logger(dbOptions) {
+val dbLogger = new Logger(dbOptions.infoLogLevel()) {
   override def log(infoLogLevel: InfoLogLevel, logMsg: String) = {
 // Map DB log level to log4j levels
 // Warn is mapped to info because RocksDB warn is too verbose





(spark) branch master updated: [SPARK-47486][CONNECT] Remove unused private `ArrowDeserializers.getString` method

2024-03-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f66274e92d1c [SPARK-47486][CONNECT] Remove unused private `ArrowDeserializers.getString` method
f66274e92d1c is described below

commit f66274e92d1ce6e65fecd45711da59eb08a9d296
Author: yangjie01 
AuthorDate: Wed Mar 20 15:10:49 2024 -0700

[SPARK-47486][CONNECT] Remove unused private `ArrowDeserializers.getString` method

### What changes were proposed in this pull request?
The private method `getString` in `ArrowDeserializers` is no longer used after SPARK-9 (https://github.com/apache/spark/pull/42076), so this PR removes it.

### Why are the changes needed?
Code clean up.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45610 from LuciferYang/SPARK-47486.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/connect/client/arrow/ArrowDeserializer.scala  | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git 
a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala
 
b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala
index ac9619487f02..eaf2927863ec 100644
--- 
a/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala
+++ 
b/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowDeserializer.scala
@@ -29,10 +29,9 @@ import scala.collection.mutable
 import scala.reflect.ClassTag
 
 import org.apache.arrow.memory.BufferAllocator
-import org.apache.arrow.vector.{FieldVector, VarCharVector, VectorSchemaRoot}
+import org.apache.arrow.vector.{FieldVector, VectorSchemaRoot}
 import org.apache.arrow.vector.complex.{ListVector, MapVector, StructVector}
 import org.apache.arrow.vector.ipc.ArrowReader
-import org.apache.arrow.vector.util.Text
 
 import org.apache.spark.sql.catalyst.ScalaReflection
 import org.apache.spark.sql.catalyst.encoders.AgnosticEncoder
@@ -468,16 +467,6 @@ object ArrowDeserializers {
 
   private def isTuple(cls: Class[_]): Boolean = 
cls.getName.startsWith("scala.Tuple")
 
-  private def getString(v: VarCharVector, i: Int): String = {
-// This is currently a bit heavy on allocations:
-// - byte array created in VarCharVector.get
-// - CharBuffer created CharSetEncoder
-// - char array in String
-// By using direct buffers and reusing the char buffer
-// we could get rid of the first two allocations.
-Text.decode(v.get(i))
-  }
-
   private def loadListIntoBuilder(
   v: ListVector,
   i: Int,





(spark) branch master updated: [SPARK-45393][BUILD] Upgrade Hadoop to 3.4.0

2024-03-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 49b4c3bc9c09 [SPARK-45393][BUILD] Upgrade Hadoop to 3.4.0
49b4c3bc9c09 is described below

commit 49b4c3bc9c09325de941dfaf41e4fd3a4a4c345f
Author: Dongjoon Hyun 
AuthorDate: Wed Mar 20 10:37:51 2024 -0700

[SPARK-45393][BUILD] Upgrade Hadoop to 3.4.0

### What changes were proposed in this pull request?

This PR aims to upgrade to Apache Hadoop 3.4.0 for Apache Spark 4.0.0.

### Why are the changes needed?

To bring the new features like the following
- https://hadoop.apache.org/docs/r3.4.0
- [HADOOP-18995](https://issues.apache.org/jira/browse/HADOOP-18995) 
Upgrade AWS SDK version to 2.21.33 for `S3 Express One Zone`
- [HADOOP-18328](https://issues.apache.org/jira/browse/HADOOP-18328) 
Supports `S3 on Outposts`

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45583 from dongjoon-hyun/SPARK-45393.

Lead-authored-by: Dongjoon Hyun 
Co-authored-by: YangJie 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3  | 27 --
 pom.xml|  2 +-
 .../spark/deploy/yarn/YarnClusterSuite.scala   |  3 ++-
 3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 86da61d89149..903c7a245af3 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -9,7 +9,7 @@ algebra_2.13/2.8.0//algebra_2.13-2.8.0.jar
 aliyun-java-sdk-core/4.5.10//aliyun-java-sdk-core-4.5.10.jar
 aliyun-java-sdk-kms/2.11.0//aliyun-java-sdk-kms-2.11.0.jar
 aliyun-java-sdk-ram/3.1.0//aliyun-java-sdk-ram-3.1.0.jar
-aliyun-sdk-oss/3.13.0//aliyun-sdk-oss-3.13.0.jar
+aliyun-sdk-oss/3.13.2//aliyun-sdk-oss-3.13.2.jar
 annotations/17.0.0//annotations-17.0.0.jar
 antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar
 antlr4-runtime/4.13.1//antlr4-runtime-4.13.1.jar
@@ -24,7 +24,6 @@ audience-annotations/0.12.0//audience-annotations-0.12.0.jar
 avro-ipc/1.11.3//avro-ipc-1.11.3.jar
 avro-mapred/1.11.3//avro-mapred-1.11.3.jar
 avro/1.11.3//avro-1.11.3.jar
-aws-java-sdk-bundle/1.12.367//aws-java-sdk-bundle-1.12.367.jar
 azure-data-lake-store-sdk/2.3.9//azure-data-lake-store-sdk-2.3.9.jar
 azure-keyvault-core/1.0.0//azure-keyvault-core-1.0.0.jar
 azure-storage/7.0.1//azure-storage-7.0.1.jar
@@ -32,6 +31,7 @@ blas/3.0.3//blas-3.0.3.jar
 bonecp/0.8.0.RELEASE//bonecp-0.8.0.RELEASE.jar
 breeze-macros_2.13/2.1.0//breeze-macros_2.13-2.1.0.jar
 breeze_2.13/2.1.0//breeze_2.13-2.1.0.jar
+bundle/2.23.19//bundle-2.23.19.jar
 cats-kernel_2.13/2.8.0//cats-kernel_2.13-2.8.0.jar
 chill-java/0.10.0//chill-java-0.10.0.jar
 chill_2.13/0.10.0//chill_2.13-0.10.0.jar
@@ -65,21 +65,23 @@ derbytools/10.16.1.1//derbytools-10.16.1.1.jar
 
dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar
 eclipse-collections-api/11.1.0//eclipse-collections-api-11.1.0.jar
 eclipse-collections/11.1.0//eclipse-collections-11.1.0.jar
+esdk-obs-java/3.20.4.2//esdk-obs-java-3.20.4.2.jar
 flatbuffers-java/23.5.26//flatbuffers-java-23.5.26.jar
 gcs-connector/hadoop3-2.2.20/shaded/gcs-connector-hadoop3-2.2.20-shaded.jar
 gmetric4j/1.0.10//gmetric4j-1.0.10.jar
 gson/2.2.4//gson-2.2.4.jar
 guava/14.0.1//guava-14.0.1.jar
-hadoop-aliyun/3.3.6//hadoop-aliyun-3.3.6.jar
-hadoop-annotations/3.3.6//hadoop-annotations-3.3.6.jar
-hadoop-aws/3.3.6//hadoop-aws-3.3.6.jar
-hadoop-azure-datalake/3.3.6//hadoop-azure-datalake-3.3.6.jar
-hadoop-azure/3.3.6//hadoop-azure-3.3.6.jar
-hadoop-client-api/3.3.6//hadoop-client-api-3.3.6.jar
-hadoop-client-runtime/3.3.6//hadoop-client-runtime-3.3.6.jar
-hadoop-cloud-storage/3.3.6//hadoop-cloud-storage-3.3.6.jar
-hadoop-shaded-guava/1.1.1//hadoop-shaded-guava-1.1.1.jar
-hadoop-yarn-server-web-proxy/3.3.6//hadoop-yarn-server-web-proxy-3.3.6.jar
+hadoop-aliyun/3.4.0//hadoop-aliyun-3.4.0.jar
+hadoop-annotations/3.4.0//hadoop-annotations-3.4.0.jar
+hadoop-aws/3.4.0//hadoop-aws-3.4.0.jar
+hadoop-azure-datalake/3.4.0//hadoop-azure-datalake-3.4.0.jar
+hadoop-azure/3.4.0//hadoop-azure-3.4.0.jar
+hadoop-client-api/3.4.0//hadoop-client-api-3.4.0.jar
+hadoop-client-runtime/3.4.0//hadoop-client-runtime-3.4.0.jar
+hadoop-cloud-storage/3.4.0//hadoop-cloud-storage-3.4.0.jar
+hadoop-huaweicloud/3.4.0//hadoop-huaweicloud-3.4.0.jar
+hadoop-shaded-guava/1.2.0//hadoop-shaded-guava-1.2.0.jar
+hadoop-yarn-server-web-proxy/3.4.0//hadoop-yarn-server-web-proxy-3.4.0.jar
 hive-beeline/2.3.9//hive-beeline-2.3.9.jar
 hive-cli/2.3.9//hive

(spark) branch master updated: [SPARK-47462][SQL] Align mappings of other unsigned numeric types with TINYINT in MySQLDialect

2024-03-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a34c8ceb19bd [SPARK-47462][SQL] Align mappings of other unsigned numeric types with TINYINT in MySQLDialect
a34c8ceb19bd is described below

commit a34c8ceb19bd1c1548a60bb144d1c587a2861cd8
Author: Kent Yao 
AuthorDate: Wed Mar 20 09:31:26 2024 -0700

[SPARK-47462][SQL] Align mappings of other unsigned numeric types with TINYINT in MySQLDialect

### What changes were proposed in this pull request?

Align the mappings of other unsigned numeric types with TINYINT in MySQLDialect: TINYINT maps to ByteType and TINYINT UNSIGNED maps to ShortType.

In this PR, we
- map SMALLINT to ShortType and SMALLINT UNSIGNED to IntegerType; without this, both map to IntegerType
- map MEDIUMINT UNSIGNED to IntegerType and leave MEDIUMINT as-is; without this, MEDIUMINT UNSIGNED uses LongType

Other signed/unsigned types remain unchanged; we only improve their test coverage.
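
For illustration only, here is a hedged Scala sketch of what such a dialect-level mapping can look like. This is a hypothetical dialect, not the actual MySQLDialect change; detecting UNSIGNED via the JDBC type name is an assumption made for the example, and the real dialect may rely on driver metadata instead.

```
import java.sql.Types

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types._

object UnsignedAwareDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:mysql")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
    // Assumption for the example: the driver reports "SMALLINT UNSIGNED" as the type name.
    val unsigned = typeName.toUpperCase.contains("UNSIGNED")
    sqlType match {
      case Types.SMALLINT => Some(if (unsigned) IntegerType else ShortType)
      case _ => None // defer to the default JDBC mappings
    }
  }
}

// A custom dialect like this would be registered with:
//   JdbcDialects.registerDialect(UnsignedAwareDialect)
```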

### Why are the changes needed?

Consistency and efficiency while reading MySQL numeric values

### Does this PR introduce _any_ user-facing change?

Yes, the mapping changes described in the first section.

### How was this patch tested?

new tests

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #45588 from yaooqinn/SPARK-47462.

Authored-by: Kent Yao 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/jdbc/MySQLIntegrationSuite.scala | 39 ++
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala   | 10 ++
 2 files changed, 42 insertions(+), 7 deletions(-)

diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
index 3d65b4f305b3..5b2214f2efd6 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
@@ -53,11 +53,19 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 
 conn.prepareStatement("CREATE TABLE numbers (onebit BIT(1), tenbits 
BIT(10), "
   + "small SMALLINT, med MEDIUMINT, nor INT, big BIGINT, deci 
DECIMAL(40,20), flt FLOAT, "
-  + "dbl DOUBLE, tiny TINYINT, u_tiny TINYINT UNSIGNED)").executeUpdate()
+  + "dbl DOUBLE, tiny TINYINT)").executeUpdate()
 
 conn.prepareStatement("INSERT INTO numbers VALUES (b'0', b'1000100101', "
   + "17, 7, 123456789, 123456789012345, 
123456789012345.123456789012345, "
-  + "42.75, 1.0002, -128, 255)").executeUpdate()
+  + "42.75, 1.0002, -128)").executeUpdate()
+
+conn.prepareStatement("CREATE TABLE unsigned_numbers (" +
+  "tiny TINYINT UNSIGNED, small SMALLINT UNSIGNED, med MEDIUMINT 
UNSIGNED," +
+  "nor INT UNSIGNED, big BIGINT UNSIGNED, deci DECIMAL(40,20) UNSIGNED," +
+  "dbl DOUBLE UNSIGNED)").executeUpdate()
+
+conn.prepareStatement("INSERT INTO unsigned_numbers VALUES (255, 65535, 
16777215, 4294967295," +
+  "9223372036854775808, 123456789012345.123456789012345, 
1.0002)").executeUpdate()
 
 conn.prepareStatement("CREATE TABLE dates (d DATE, t TIME, dt DATETIME, ts 
TIMESTAMP, "
   + "yr YEAR)").executeUpdate()
@@ -87,10 +95,10 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 val rows = df.collect()
 assert(rows.length == 1)
 val types = rows(0).toSeq.map(x => x.getClass.toString)
-assert(types.length == 11)
+assert(types.length == 10)
 assert(types(0).equals("class java.lang.Boolean"))
 assert(types(1).equals("class java.lang.Long"))
-assert(types(2).equals("class java.lang.Integer"))
+assert(types(2).equals("class java.lang.Short"))
 assert(types(3).equals("class java.lang.Integer"))
 assert(types(4).equals("class java.lang.Integer"))
 assert(types(5).equals("class java.lang.Long"))
@@ -98,10 +106,9 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 assert(types(7).equals("class java.lang.Double"))
 assert(types(8).equals("class java.lang.Double"))
 assert(types(9).equals("class java.lang.Byte"))
-assert(types(10).equals("class java.lang.Short"))
 assert(rows(0).getBoolean(0) == false)
 assert(rows

(spark) branch branch-3.5 updated: [SPARK-47481][INFRA][3.5] Fix Python linter

2024-03-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 9baf82b1c97a [SPARK-47481][INFRA][3.5] Fix Python linter
9baf82b1c97a is described below

commit 9baf82b1c97a792a3733dedccf1c03737b592bbd
Author: panbingkun 
AuthorDate: Wed Mar 20 07:19:29 2024 -0700

[SPARK-47481][INFRA][3.5] Fix Python linter

### What changes were proposed in this pull request?
The PR aims to fix the `python linter issue` on `branch-3.5` by pinning `matplotlib==3.7.2`.

### Why are the changes needed?
Fix `python linter issue` on `branch-3.5`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45550 from panbingkun/branch-3.5_scheduled_job.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 dev/infra/Dockerfile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index d3fcd7ab3622..f0b88666c040 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -65,10 +65,10 @@ RUN Rscript -e "devtools::install_version('roxygen2', 
version='7.2.0', repos='ht
 ENV R_LIBS_SITE 
"/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"
 
 RUN pypy3 -m pip install numpy 'pandas<=2.0.3' scipy coverage matplotlib
-RUN python3.9 -m pip install numpy 'pyarrow==12.0.1' 'pandas<=2.0.3' scipy 
unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 
'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
+RUN python3.9 -m pip install 'numpy==1.25.1' 'pyarrow==12.0.1' 'pandas<=2.0.3' 
scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage 
'matplotlib==3.7.2' openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
 
 # Add Python deps for Spark Connect.
 RUN python3.9 -m pip install 'grpcio>=1.48,<1.57' 'grpcio-status>=1.48,<1.57' 
'protobuf==3.20.3' 'googleapis-common-protos==1.56.4'
 
 # Add torch as a testing dependency for TorchDistributor
-RUN python3.9 -m pip install torch torchvision torcheval
+RUN python3.9 -m pip install 'torch==2.0.1' 'torchvision==0.15.2' torcheval





(spark) branch branch-3.4 updated: [SPARK-47481][INFRA][3.4] Pin `matplotlib<3.3.0` to fix Python linter failure

2024-03-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 4de8000f21a4 [SPARK-47481][INFRA][3.4] Pin `matplotlib<3.3.0` to fix Python linter failure
4de8000f21a4 is described below

commit 4de8000f21a48796d30af37bc57269395792a254
Author: panbingkun 
AuthorDate: Wed Mar 20 07:15:32 2024 -0700

[SPARK-47481][INFRA][3.4] Pin `matplotlib<3.3.0` to fix Python linter failure

### What changes were proposed in this pull request?
The PR aims to fix the `python linter issue` on branch-3.4 by pinning `matplotlib<3.3.0`.

### Why are the changes needed?
- Through this PR https://github.com/apache/spark/pull/45600, we found that the version of `matplotlib` in our Docker image was `3.8.2`, which clearly did not meet the original requirements for `branch-3.4`.
  https://github.com/panbingkun/spark/actions/runs/8354370179/job/22869580038
  https://github.com/apache/spark/blob/branch-3.4/dev/requirements.txt#L12
  [screenshots omitted]

- Fix as follows:
  [screenshot omitted]

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45608 from panbingkun/branch_3.4_pin_matplotlib.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 dev/infra/Dockerfile | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 68d27052437b..5ebd10339be9 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -37,6 +37,7 @@ RUN add-apt-repository ppa:pypy/ppa
 RUN apt update
 RUN $APT_INSTALL gfortran libopenblas-dev liblapack-dev
 RUN $APT_INSTALL build-essential
+RUN $APT_INSTALL python3-matplotlib
 
 RUN mkdir -p /usr/local/pypy/pypy3.7 && \
 curl -sqL https://downloads.python.org/pypy/pypy3.7-v7.3.7-linux64.tar.bz2 
| tar xjf - -C /usr/local/pypy/pypy3.7 --strip-components=1 && \
@@ -64,8 +65,8 @@ RUN Rscript -e "devtools::install_version('roxygen2', 
version='7.2.0', repos='ht
 # See more in SPARK-39735
 ENV R_LIBS_SITE 
"/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"
 
-RUN pypy3 -m pip install numpy 'pandas<=1.5.3' scipy coverage matplotlib
-RUN python3.9 -m pip install 'numpy==1.23.5' 'pyarrow==12.0.1' 'pandas<=1.5.3' 
scipy unittest-xml-reporting plotly>=4.8 scikit-learn 'mlflow>=1.0' coverage 
matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
+RUN pypy3 -m pip install numpy 'pandas<=1.5.3' scipy coverage 
'matplotlib<3.3.0'
+RUN python3.9 -m pip install 'numpy==1.23.5' 'pyarrow==12.0.1' 'pandas<=1.5.3' 
scipy unittest-xml-reporting plotly>=4.8 scikit-learn 'mlflow>=1.0' coverage 
'matplotlib<3.3.0' openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
 
 # Add Python deps for Spark Connect.
 RUN python3.9 -m pip install grpcio protobuf googleapis-common-protos 
grpcio-status





(spark) branch branch-3.4 updated: [SPARK-47472][INFRA][3.4] Pin `numpy` to 1.23.5 in `dev/infra/Dockerfile`

2024-03-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new d25f49a14733 [SPARK-47472][INFRA][3.4] Pin `numpy` to 1.23.5 in `dev/infra/Dockerfile`
d25f49a14733 is described below

commit d25f49a14733c5a0e872498cab40a30a5ebc28b4
Author: Dongjoon Hyun 
AuthorDate: Tue Mar 19 20:53:45 2024 -0700

[SPARK-47472][INFRA][3.4] Pin `numpy` to 1.23.5 in `dev/infra/Dockerfile`

### What changes were proposed in this pull request?

This PR aims to pin `numpy` to 1.23.5 in `dev/infra/Dockerfile` to recover 
the following test failure.

### Why are the changes needed?

`numpy==1.23.5` was the version of the last successful run.
- https://github.com/apache/spark/actions/runs/8276453417/job/22725387782

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

Closes #45595 from dongjoon-hyun/pin-numpy.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/infra/Dockerfile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 93d8793826ff..68d27052437b 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -65,7 +65,7 @@ RUN Rscript -e "devtools::install_version('roxygen2', 
version='7.2.0', repos='ht
 ENV R_LIBS_SITE 
"/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"
 
 RUN pypy3 -m pip install numpy 'pandas<=1.5.3' scipy coverage matplotlib
-RUN python3.9 -m pip install numpy 'pyarrow==12.0.1' 'pandas<=1.5.3' scipy 
unittest-xml-reporting plotly>=4.8 scikit-learn 'mlflow>=1.0' coverage 
matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
+RUN python3.9 -m pip install 'numpy==1.23.5' 'pyarrow==12.0.1' 'pandas<=1.5.3' 
scipy unittest-xml-reporting plotly>=4.8 scikit-learn 'mlflow>=1.0' coverage 
matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
 
 # Add Python deps for Spark Connect.
 RUN python3.9 -m pip install grpcio protobuf googleapis-common-protos 
grpcio-status





(spark) branch master updated (bc378f4ff5e2 -> 61d7b0f24fc9)

2024-03-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from bc378f4ff5e2 [SPARK-47330][SQL][TESTS] XML: Added XmlExpressionsSuite
 add 61d7b0f24fc9 [SPARK-47470][SQL][TESTS] Ignore `IntentionallyFaultyConnectionProvider` error in `CliSuite`

No new revisions were added by this update.

Summary of changes:
 .../test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)





(spark) branch master updated: [SPARK-47468][BUILD] Exclude `logback` dependency from SBT like Maven

2024-03-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c32d27850e2e [SPARK-47468][BUILD] Exclude `logback` dependency from SBT like Maven
c32d27850e2e is described below

commit c32d27850e2ea5f8cb36099ab8453b09f4c70861
Author: Dongjoon Hyun 
AuthorDate: Tue Mar 19 17:52:38 2024 -0700

[SPARK-47468][BUILD] Exclude `logback` dependency from SBT like Maven

### What changes were proposed in this pull request?

This PR aims to exclude `logback` from the SBT dependencies, as Maven already does, to fix the following SBT issue.

```
[info]   stderr> SLF4J: Class path contains multiple SLF4J bindings.
[info]   stderr> SLF4J: Found binding in 
[jar:file:/home/runner/work/spark/spark/assembly/target/scala-2.13/jars/logback-classic-1.2.13.jar!/org/slf4j/impl/StaticLoggerBinder.class]
[info]   stderr> SLF4J: Found binding in 
[jar:file:/home/runner/.cache/coursier/v1/https/maven-central.storage-download.googleapis.com/maven2/ch/qos/logback/logback-classic/1.2.13/logback-classic-1.2.13.jar!/org/slf4j/impl/StaticLoggerBinder.class]
[info]   stderr> SLF4J: See 
http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
[info]   stderr> SLF4J: Actual binding is of type 
[ch.qos.logback.classic.util.ContextSelectorStaticBinder]
```

### Why are the changes needed?

**Maven**
```
$ build/mvn dependency:tree --pl core | grep logback
Using `mvn` from path: /opt/homebrew/bin/mvn
Using SPARK_LOCAL_IP=localhost
```

**SBT (BEFORE)**
```
$ build/sbt "core/test:dependencyTree" | grep logback
Using SPARK_LOCAL_IP=localhost
[info]   |   +-ch.qos.logback:logback-classic:1.2.13
[info]   |   | +-ch.qos.logback:logback-core:1.2.13
[info]   |   +-ch.qos.logback:logback-core:1.2.13
[info]   | | +-ch.qos.logback:logback-classic:1.2.13
[info]   | | | +-ch.qos.logback:logback-core:1.2.13
[info]   | | +-ch.qos.logback:logback-core:1.2.13
[info]   | +-ch.qos.logback:logback-classic:1.2.13
[info]   | | +-ch.qos.logback:logback-core:1.2.13
[info]   | +-ch.qos.logback:logback-core:1.2.13
```

**SBT (AFTER)**
```
$ build/sbt "core/test:dependencyTree" | grep logback
Using SPARK_LOCAL_IP=localhost
```

### Does this PR introduce _any_ user-facing change?

No. This only fixes developer and CI issues.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

    No.
    
Closes #45594 from dongjoon-hyun/SPARK-47468.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 project/SparkBuild.scala | 1 +
 1 file changed, 1 insertion(+)

diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index b7b9589568e1..3d89af2aa7b4 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -1078,6 +1078,7 @@ object ExcludedDependencies {
 // purpose only. Here we exclude them from the whole project scope and add 
them w/ yarn only.
 excludeDependencies ++= Seq(
   ExclusionRule(organization = "com.sun.jersey"),
+  ExclusionRule(organization = "ch.qos.logback"),
   ExclusionRule("javax.ws.rs", "jsr311-api"))
   )
 }





(spark) branch master updated: [SPARK-47464][INFRA] Update `labeler.yml` for module `common/sketch` and `common/variant`

2024-03-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 32ee2d7936a5 [SPARK-47464][INFRA] Update `labeler.yml` for module `common/sketch` and `common/variant`
32ee2d7936a5 is described below

commit 32ee2d7936a50a653e8ea599d622fbc550fa5eac
Author: panbingkun 
AuthorDate: Tue Mar 19 16:27:15 2024 -0700

[SPARK-47464][INFRA] Update `labeler.yml` for module `common/sketch` and `common/variant`

### What changes were proposed in this pull request?
The PR aims to update `labeler.yml` for the `common/sketch` and `common/variant` modules.

### Why are the changes needed?
Currently, the above modules are not classified in `labeler.yml`, so the GitHub Actions labeler cannot automatically tag PRs that touch them.

### Does this PR introduce _any_ user-facing change?
Yes, only for dev.

### How was this patch tested?
Manual test: after this PR is merged, continue to observe.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45590 from panbingkun/SPARK-47464.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 .github/labeler.yml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.github/labeler.yml b/.github/labeler.yml
index 7d24390f2968..104eac99ec4d 100644
--- a/.github/labeler.yml
+++ b/.github/labeler.yml
@@ -101,6 +101,8 @@ SQL:
 ]
 - any-glob-to-any-file: [
  'common/unsafe/**/*',
+ 'common/sketch/**/*',
+ 'common/variant/**/*',
  'bin/spark-sql*',
  'bin/beeline*',
  'sbin/*thriftserver*.sh',





(spark) branch master updated (90560dce85b0 -> db531c6ee719)

2024-03-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 90560dce85b0 [SPARK-47458][CORE] Fix the problem with calculating the maximum concurrent tasks for the barrier stage
 add db531c6ee719 [SPARK-47461][CORE] Remove private function `totalRunningTasksPerResourceProfile` from `ExecutorAllocationManager`

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala  | 4 
 .../test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala  | 4 +---
 2 files changed, 1 insertion(+), 7 deletions(-)





(spark) branch master updated (b6a836946311 -> a6bffcc3e5f0)

2024-03-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from b6a836946311 [SPARK-47454][PYTHON][CONNECT][TESTS] Split `pyspark.sql.tests.test_dataframe`
 add a6bffcc3e5f0 [SPARK-47457][SQL] Fix `IsolatedClientLoader.supportsHadoopShadedClient` to handle Hadoop 3.4+

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala | 2 ++
 .../org/apache/spark/sql/hive/client/HadoopVersionInfoSuite.scala | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)





(spark) branch master updated: [SPARK-47452][INFRA] Use `Ubuntu 22.04` in `dev/infra/Dockerfile`

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ef94f7094989 [SPARK-47452][INFRA] Use `Ubuntu 22.04` in `dev/infra/Dockerfile`
ef94f7094989 is described below

commit ef94f709498974cb31e805541e0803270cd5c39e
Author: Dongjoon Hyun 
AuthorDate: Mon Mar 18 23:15:32 2024 -0700

[SPARK-47452][INFRA] Use `Ubuntu 22.04` in `dev/infra/Dockerfile`

### What changes were proposed in this pull request?

This PR aims to use `Ubuntu 22.04` in `dev/infra/Dockerfile` for Apache 
Spark 4.0.0.

| Installed SW | BEFORE  | AFTER   |
| ------------ | ------- | ------- |
| Ubuntu LTS   | 20.04.5 | 22.04.4 |
| Java         | 17.0.10 | 17.0.10 |
| PyPy 3.8     | 3.8.16  | 3.8.16  |
| Python 3.9   | 3.9.5   | 3.9.18  |
| Python 3.10  | 3.10.13 | 3.10.12 |
| Python 3.11  | 3.11.8  | 3.11.8  |
| Python 3.12  | 3.12.2  | 3.12.2  |
| R            | 3.6.3   | 4.1.2   |

### Why are the changes needed?

- Since Apache Spark 3.4.0, we have used `Ubuntu 20.04` via SPARK-39522.
- From Apache Spark 4.0.0, this PR aims to use `Ubuntu 22.04` as the main base image.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45576 from dongjoon-hyun/SPARK-47452.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/infra/Dockerfile | 52 +---
 1 file changed, 25 insertions(+), 27 deletions(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 64adf33e6742..f17ee58c9d90 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -15,11 +15,11 @@
 # limitations under the License.
 #
 
-# Image for building and testing Spark branches. Based on Ubuntu 20.04.
+# Image for building and testing Spark branches. Based on Ubuntu 22.04.
 # See also in https://hub.docker.com/_/ubuntu
-FROM ubuntu:focal-20221019
+FROM ubuntu:jammy-20240227
 
-ENV FULL_REFRESH_DATE 20240117
+ENV FULL_REFRESH_DATE 20240318
 
 ENV DEBIAN_FRONTEND noninteractive
 ENV DEBCONF_NONINTERACTIVE_SEEN true
@@ -50,10 +50,8 @@ RUN apt-get update && apt-get install -y \
 openjdk-17-jdk-headless \
 pandoc \
 pkg-config \
-python3-pip \
-python3-setuptools \
-python3.8 \
-python3.9 \
+python3.10 \
+python3-psutil \
 qpdf \
 r-base \
 ruby \
@@ -64,10 +62,10 @@ RUN apt-get update && apt-get install -y \
 && rm -rf /var/lib/apt/lists/*
 
 
-RUN echo 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' >> 
/etc/apt/sources.list
+RUN echo 'deb https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/' >> 
/etc/apt/sources.list
 RUN gpg --keyserver hkps://keyserver.ubuntu.com --recv-key 
E298A3A825C0D65DFD57CBB651716619E084DAB9
 RUN gpg -a --export E084DAB9 | apt-key add -
-RUN add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu 
focal-cran40/'
+RUN add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu 
jammy-cran40/'
 
 # See more in SPARK-39959, roxygen2 < 7.2.1
 RUN Rscript -e "install.packages(c('devtools', 'knitr', 'markdown',  \
@@ -82,9 +80,6 @@ RUN Rscript -e "install.packages(c('devtools', 'knitr', 
'markdown',  \
 ENV R_LIBS_SITE 
"/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"
 
 
-RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9
-
-
 RUN add-apt-repository ppa:pypy/ppa
 RUN mkdir -p /usr/local/pypy/pypy3.8 && \
 curl -sqL 
https://downloads.python.org/pypy/pypy3.8-v7.3.11-linux64.tar.bz2 | tar xjf - 
-C /usr/local/pypy/pypy3.8 --strip-components=1 && \
@@ -98,41 +93,44 @@ ARG BASIC_PIP_PKGS="numpy pyarrow>=15.0.0 six==1.16.0 
pandas<=2.2.1 scipy plotly
 # Python deps for Spark Connect
 ARG CONNECT_PIP_PKGS="grpcio==1.62.0 grpcio-status==1.62.0 protobuf==4.25.1 
googleapis-common-protos==1.56.4"
 
-# Add torch as a testing dependency for TorchDistributor and 
DeepspeedTorchDistributor
-RUN python3.9 -m pip install $BASIC_PIP_PKGS unittest-xml-reporting 
$CONNECT_PIP_PKGS && \
-python3.9 -m pip install torch torchvision --index-url 
https://download.pytorch.org/whl/cpu && \
-python3.9 -m pip install deepspeed torcheval && \
-python3.9 -m pip cache purge
-
-# Install Python 3.10 at the last stage to avoid breaking Python 3.9
-RUN add-apt-repository ppa:deadsnakes/ppa
-RUN apt-get update && apt-get install -y \
-python3.10 python3.10-distut

(spark) branch master updated (5f48931fcdf7 -> 5e42ecc8163a)

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 5f48931fcdf7 [SPARK-47453][SQL][DOCKER][BUILD][TESTS] Upgrade MySQL docker image version to 8.3.0
 add 5e42ecc8163a [SPARK-47456][SQL] Support ORC Brotli codec

No new revisions were added by this update.

Summary of changes:
 docs/sql-data-sources-orc.md | 2 +-
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala   | 4 ++--
 .../spark/sql/execution/datasources/orc/OrcCompressionCodec.java | 3 ++-
 .../org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala| 3 ++-
 .../spark/sql/execution/datasources/FileSourceCodecSuite.scala   | 5 -
 5 files changed, 11 insertions(+), 6 deletions(-)





(spark) branch master updated (681b41f0808e -> 5f48931fcdf7)

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 681b41f0808e [SPARK-47422][SQL] Support collated strings in array operations
 add 5f48931fcdf7 [SPARK-47453][SQL][DOCKER][BUILD][TESTS] Upgrade MySQL docker image version to 8.3.0

No new revisions were added by this update.

Summary of changes:
 ...baseOnDocker.scala => MySQLDatabaseOnDocker.scala} | 17 +++--
 .../apache/spark/sql/jdbc/MySQLIntegrationSuite.scala | 15 +++
 .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 19 ---
 .../spark/sql/jdbc/v2/MySQLNamespaceSuite.scala   | 19 ---
 4 files changed, 18 insertions(+), 52 deletions(-)
 copy 
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/{DB2DatabaseOnDocker.scala
 => MySQLDatabaseOnDocker.scala} (66%)





(spark) branch master updated (9f8147c2a8d2 -> e01ed0da22f2)

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 9f8147c2a8d2 [SPARK-47329][SS][DOCS] Add note to persist dataframe while using foreachbatch and stateful streaming query to prevent state from being re-loaded in each batch
 add e01ed0da22f2 [SPARK-47345][SQL][TESTS][FOLLOW-UP] Rename JSON to XML within XmlFunctionsSuite

No new revisions were added by this update.

Summary of changes:
 sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)





(spark) branch master updated (cb20fcae951d -> acf17fd67217)

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from cb20fcae951d [SPARK-47448][CORE] Enable `spark.shuffle.service.removeShuffle` by default
 add acf17fd67217 [SPARK-47450][INFRA][R] Use R 4.3.3 in `windows` R GitHub Action job

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_sparkr_window.yml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)





(spark) branch master updated (51e8634a5883 -> cb20fcae951d)

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 51e8634a5883 [SPARK-47380][CONNECT] Ensure on the server side that the SparkSession is the same
 add cb20fcae951d [SPARK-47448][CORE] Enable `spark.shuffle.service.removeShuffle` by default

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +-
 core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala   | 1 +
 docs/configuration.md  | 2 +-
 docs/core-migration-guide.md   | 2 ++
 4 files changed, 5 insertions(+), 2 deletions(-)





(spark) branch master updated: [SPARK-47446][CORE] Make `BlockManager` warn before `removeBlockInternal`

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a40940a0bc6d [SPARK-47446][CORE] Make `BlockManager` warn before `removeBlockInternal`
a40940a0bc6d is described below

commit a40940a0bc6de58b5c56b8ad918f338c6e70572f
Author: Dongjoon Hyun 
AuthorDate: Mon Mar 18 12:39:44 2024 -0700

[SPARK-47446][CORE] Make `BlockManager` warn before `removeBlockInternal`

### What changes were proposed in this pull request?

This PR aims to make `BlockManager` warn before invoking 
`removeBlockInternal` by switching the log position. To be clear,
1. For the case where `removeBlockInternal` succeeds, the log messages are 
identical before and after this PR.
2. For the case where `removeBlockInternal` fails, the user will see one additional warning message like the following, which was hidden from users before this PR.
```
logWarning(s"Putting block $blockId failed")
```

### Why are the changes needed?

When a `Put` operation fails, Apache Spark currently tries `removeBlockInternal` before logging.


https://github.com/apache/spark/blob/ce93c9fd86715e2479552628398f6fc11e83b2af/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1554-L1567

On top of that, if `removeBlockInternal` fails consecutively, Spark shows 
the warning like the following and fails the job.
```
24/03/18 18:40:46 WARN BlockManager: Putting block broadcast_0 failed due 
to exception java.nio.file.NoSuchFileException: 
/data/spark/blockmgr-56a6c418-90be-4d89-9707-ef45f7eaf74c/0e.
24/03/18 18:40:46 WARN BlockManager: Block broadcast_0 was not removed 
normally.
24/03/18 18:40:46 INFO TaskSchedulerImpl: Cancelling stage 0
24/03/18 18:40:46 INFO TaskSchedulerImpl: Killing all running tasks in 
stage 0: Stage cancelled
24/03/18 18:40:46 INFO DAGScheduler: ResultStage 0 (reduce at 
SparkPi.scala:38) failed in 0.264 s due to Job aborted due to stage failure: 
Task serialization failed: java.nio.file.NoSuchFileException: 
/data/spark/blockmgr-56a6c418-90be-4d89-9707-ef45f7eaf74c/0e
java.nio.file.NoSuchFileException: 
/data/spark/blockmgr-56a6c418-90be-4d89-9707-ef45f7eaf74c/0e
```

This is misleading, although the failures might share the same root cause. Since the `Put` operation fails before the above failure, we should log the WARN message first to make that clear.

### Does this PR introduce _any_ user-facing change?

No. This is a warning message change only.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45570 from dongjoon-hyun/SPARK-47446.

    Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 core/src/main/scala/org/apache/spark/storage/BlockManager.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
index 228ec5752e1b..89b3914e94af 100644
--- a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
+++ b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
@@ -1561,8 +1561,8 @@ private[spark] class BlockManager(
   blockInfoManager.unlock(blockId)
 }
   } else {
-removeBlockInternal(blockId, tellMaster = false)
 logWarning(s"Putting block $blockId failed")
+removeBlockInternal(blockId, tellMaster = false)
   }
   res
 } catch {





(spark) branch master updated: [SPARK-47383][CORE] Support `spark.shutdown.timeout` config

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ce93c9fd8671 [SPARK-47383][CORE] Support `spark.shutdown.timeout` config
ce93c9fd8671 is described below

commit ce93c9fd86715e2479552628398f6fc11e83b2af
Author: Rob Reeves 
AuthorDate: Mon Mar 18 10:36:38 2024 -0700

[SPARK-47383][CORE] Support `spark.shutdown.timeout` config

### What changes were proposed in this pull request?
Make the shutdown hook timeout configurable. If it is not defined, the behavior falls back to the existing default timeout of 30 seconds, or to whatever is defined in core-site.xml for the hadoop.service.shutdown.timeout property.

### Why are the changes needed?
Spark sometimes times out during the shutdown process. This can result in data left in the queues being dropped and causes metadata loss (e.g. event logs, anything written by custom listeners).

This is not easily configurable before this change. The underlying 
`org.apache.hadoop.util.ShutdownHookManager` has a default timeout of 30 
seconds.  It can be configured by setting hadoop.service.shutdown.timeout, but 
this must be done in the core-site.xml/core-default.xml because a new hadoop 
conf object is created and there is no opportunity to modify it.
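
As a hedged illustration of why a JVM system property works here (not part of the patch itself): SparkConf(loadDefaults = true) picks up every "spark."-prefixed system property, so a -Dspark.shutdown.timeout=60s flag passed via spark.driver.extraJavaOptions is visible even before the application's own configuration is registered.

```
import org.apache.spark.SparkConf

// Sketch only: assumes the JVM was launched with -Dspark.shutdown.timeout=60s.
val shutdownTimeout: Option[String] =
  new SparkConf(true).getOption("spark.shutdown.timeout")

shutdownTimeout match {
  case Some(t) => println(s"Custom shutdown timeout: $t")
  case None    => println("No override; the existing 30s default applies")
}
```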

### Does this PR introduce _any_ user-facing change?
Yes, a new config `spark.shutdown.timeout` is added.

### How was this patch tested?
Manual testing in spark-shell. This behavior is not practical to write a 
unit test for.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45504 from robreeves/sc_shutdown_timeout.

Authored-by: Rob Reeves 
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/internal/config/package.scala| 10 ++
 .../org/apache/spark/util/ShutdownHookManager.scala   | 19 +--
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala 
b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index aa240b5cc5b5..e72b9cb694eb 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -2683,4 +2683,14 @@ package object config {
   .version("4.0.0")
   .booleanConf
   .createWithDefault(false)
+
+  private[spark] val SPARK_SHUTDOWN_TIMEOUT_MS =
+ConfigBuilder("spark.shutdown.timeout")
+  .internal()
+  .doc("Defines the timeout period to wait for all shutdown hooks to be 
executed. " +
+"This must be passed as a system property argument in the Java 
options, for example " +
+"spark.driver.extraJavaOptions=\"-Dspark.shutdown.timeout=60s\".")
+  .version("4.0.0")
+  .timeConf(TimeUnit.MILLISECONDS)
+  .createOptional
 }
diff --git 
a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala 
b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala
index 4db268604a3e..c6cad9440168 100644
--- a/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala
+++ b/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala
@@ -19,12 +19,16 @@ package org.apache.spark.util
 
 import java.io.File
 import java.util.PriorityQueue
+import java.util.concurrent.TimeUnit
 
 import scala.util.Try
 
 import org.apache.hadoop.fs.FileSystem
 
+import org.apache.spark.SparkConf
 import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config.SPARK_SHUTDOWN_TIMEOUT_MS
+
 
 /**
  * Various utility methods used by Spark.
@@ -177,8 +181,19 @@ private [util] class SparkShutdownHookManager {
 val hookTask = new Runnable() {
   override def run(): Unit = runAll()
 }
-org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook(
-  hookTask, FileSystem.SHUTDOWN_HOOK_PRIORITY + 30)
+val priority = FileSystem.SHUTDOWN_HOOK_PRIORITY + 30
+// The timeout property must be passed as a Java system property because 
this
+// is initialized before Spark configurations are registered as system
+// properties later in initialization.
+val timeout = new SparkConf().get(SPARK_SHUTDOWN_TIMEOUT_MS)
+
+timeout.fold {
+  org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook(
+hookTask, priority)
+} { t =>
+  org.apache.hadoop.util.ShutdownHookManager.get().addShutdownHook(
+hookTask, priority, t, TimeUnit.MILLISECONDS)
+}
   }
 
   def runAll(): Unit = {





(spark) branch master updated: [SPARK-47435][SQL] Fix overflow issue of MySQL UNSIGNED TINYINT caused by SPARK-45561

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8bd42cbdb6bf [SPARK-47435][SQL] Fix overflow issue of MySQL UNSIGNED TINYINT caused by SPARK-45561
8bd42cbdb6bf is described below

commit 8bd42cbdb6bfa40aead94570b06e926f8e8aa9e1
Author: Kent Yao 
AuthorDate: Mon Mar 18 08:56:55 2024 -0700

[SPARK-47435][SQL] Fix overflow issue of MySQL UNSIGNED TINYINT caused by SPARK-45561

### What changes were proposed in this pull request?

SPARK-45561 mapped java.sql.Types.TINYINT to ByteType in MySQLDialect, which caused unsigned TINYINT values to overflow, because java.sql.Types.TINYINT is reported regardless of whether the column is signed or unsigned.

In this PR, we put the signed/unsigned information into the column metadata so that TINYINT can be mapped to ByteType or ShortType accordingly.
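
As a hedged illustration of that idea (a hypothetical helper, not the actual JdbcUtils/MySQLDialect code; the metadata key name "isSigned" is an assumption made for the example):

```
import java.sql.ResultSetMetaData

import org.apache.spark.sql.types._

// java.sql.Types.TINYINT is reported for both signed and unsigned columns,
// so record the signedness and choose the Catalyst type from it.
def tinyIntCatalystType(rsmd: ResultSetMetaData, column: Int): (DataType, Metadata) = {
  val signed = rsmd.isSigned(column)
  val metadata = new MetadataBuilder().putBoolean("isSigned", signed).build()
  val dataType =
    if (signed) ByteType  // -128..127 fits in a Byte
    else ShortType        // 0..255 would overflow a Byte, so widen to Short
  (dataType, metadata)
}
```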

### Why are the changes needed?

bugfix

### Does this PR introduce _any_ user-facing change?

After this PR, users can read MySQL UNSIGNED TINYINT values again, as in versions before 3.5.0; this has been broken since 3.5.1.

### How was this patch tested?

new tests

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #45556 from yaooqinn/SPARK-47435.

Authored-by: Kent Yao 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/sql/jdbc/MySQLIntegrationSuite.scala |  9 ++--
 .../spark/sql/jdbc/v2/DB2IntegrationSuite.scala|  9 ++--
 .../sql/jdbc/v2/MsSqlServerIntegrationSuite.scala  |  6 ++-
 .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala  | 15 --
 .../spark/sql/jdbc/v2/OracleIntegrationSuite.scala |  9 ++--
 .../sql/jdbc/v2/PostgresIntegrationSuite.scala |  9 ++--
 .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala  | 26 ++
 .../sql/execution/datasources/jdbc/JdbcUtils.scala |  5 +-
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala   | 10 ++--
 .../v2/jdbc/JDBCTableCatalogSuite.scala| 60 --
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala  | 24 +
 11 files changed, 114 insertions(+), 68 deletions(-)

diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
index b1d239337aa0..79e88f109534 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala
@@ -57,10 +57,11 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 
 conn.prepareStatement("CREATE TABLE numbers (onebit BIT(1), tenbits 
BIT(10), "
   + "small SMALLINT, med MEDIUMINT, nor INT, big BIGINT, deci 
DECIMAL(40,20), flt FLOAT, "
-  + "dbl DOUBLE, tiny TINYINT)").executeUpdate()
+  + "dbl DOUBLE, tiny TINYINT, u_tiny TINYINT UNSIGNED)").executeUpdate()
+
 conn.prepareStatement("INSERT INTO numbers VALUES (b'0', b'1000100101', "
   + "17, 7, 123456789, 123456789012345, 
123456789012345.123456789012345, "
-  + "42.75, 1.0002, -128)").executeUpdate()
+  + "42.75, 1.0002, -128, 255)").executeUpdate()
 
 conn.prepareStatement("CREATE TABLE dates (d DATE, t TIME, dt DATETIME, ts 
TIMESTAMP, "
   + "yr YEAR)").executeUpdate()
@@ -90,7 +91,7 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 val rows = df.collect()
 assert(rows.length == 1)
 val types = rows(0).toSeq.map(x => x.getClass.toString)
-assert(types.length == 10)
+assert(types.length == 11)
 assert(types(0).equals("class java.lang.Boolean"))
 assert(types(1).equals("class java.lang.Long"))
 assert(types(2).equals("class java.lang.Integer"))
@@ -101,6 +102,7 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 assert(types(7).equals("class java.lang.Double"))
 assert(types(8).equals("class java.lang.Double"))
 assert(types(9).equals("class java.lang.Byte"))
+assert(types(10).equals("class java.lang.Short"))
 assert(rows(0).getBoolean(0) == false)
 assert(rows(0).getLong(1) == 0x225)
 assert(rows(0).getInt(2) == 17)
@@ -112,6 +114,7 @@ class MySQLIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 assert(rows(0).getDouble(7) == 42.75)
 assert(rows(0).getDouble(8) == 1.0002)
 assert(rows(0).getByte(9) == 0x80.toByte)
+assert(rows(0).getShort(10) == 0xff.toShort)
   }
 
   test("Date types") {
diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apa

(spark) branch master updated (4dc362dbc6c0 -> 1aafe60b3e76)

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 4dc362dbc6c0 [SPARK-47438][BUILD] Upgrade jackson to 2.17.0
 add 1aafe60b3e76 [SPARK-47442][CORE][TEST] Use port 0 to start worker servers in MasterSuite

No new revisions were added by this update.

Summary of changes:
 .../test/scala/org/apache/spark/deploy/master/MasterSuiteBase.scala| 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)





(spark) branch master updated: [SPARK-47438][BUILD] Upgrade jackson to 2.17.0

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4dc362dbc6c0 [SPARK-47438][BUILD] Upgrade jackson to 2.17.0
4dc362dbc6c0 is described below

commit 4dc362dbc6c039d955e4dceb87e53dfc76ef2a5c
Author: panbingkun 
AuthorDate: Mon Mar 18 08:25:16 2024 -0700

[SPARK-47438][BUILD] Upgrade jackson to 2.17.0

### What changes were proposed in this pull request?
This PR aims to upgrade Jackson from `2.16.1` to `2.17.0`.

### Why are the changes needed?
The full release notes: 
https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.17

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45562 from panbingkun/SPARK-47438.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 14 +++---
 pom.xml   |  4 ++--
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index d4b7d38aea22..86da61d89149 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -103,15 +103,15 @@ icu4j/72.1//icu4j-72.1.jar
 ini4j/0.5.4//ini4j-0.5.4.jar
 istack-commons-runtime/3.0.8//istack-commons-runtime-3.0.8.jar
 ivy/2.5.2//ivy-2.5.2.jar
-jackson-annotations/2.16.1//jackson-annotations-2.16.1.jar
+jackson-annotations/2.17.0//jackson-annotations-2.17.0.jar
 jackson-core-asl/1.9.13//jackson-core-asl-1.9.13.jar
-jackson-core/2.16.1//jackson-core-2.16.1.jar
-jackson-databind/2.16.1//jackson-databind-2.16.1.jar
-jackson-dataformat-cbor/2.16.1//jackson-dataformat-cbor-2.16.1.jar
-jackson-dataformat-yaml/2.16.1//jackson-dataformat-yaml-2.16.1.jar
-jackson-datatype-jsr310/2.16.1//jackson-datatype-jsr310-2.16.1.jar
+jackson-core/2.17.0//jackson-core-2.17.0.jar
+jackson-databind/2.17.0//jackson-databind-2.17.0.jar
+jackson-dataformat-cbor/2.17.0//jackson-dataformat-cbor-2.17.0.jar
+jackson-dataformat-yaml/2.17.0//jackson-dataformat-yaml-2.17.0.jar
+jackson-datatype-jsr310/2.17.0//jackson-datatype-jsr310-2.17.0.jar
 jackson-mapper-asl/1.9.13//jackson-mapper-asl-1.9.13.jar
-jackson-module-scala_2.13/2.16.1//jackson-module-scala_2.13-2.16.1.jar
+jackson-module-scala_2.13/2.17.0//jackson-module-scala_2.13-2.17.0.jar
 jakarta.annotation-api/2.0.0//jakarta.annotation-api-2.0.0.jar
 jakarta.inject-api/2.0.1//jakarta.inject-api-2.0.1.jar
 jakarta.servlet-api/5.0.0//jakarta.servlet-api-5.0.0.jar
diff --git a/pom.xml b/pom.xml
index 757d911c1229..5cc56a92999d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -184,8 +184,8 @@
 true
 true
 1.9.13
-2.16.1
-
2.16.1
+2.17.0
+
2.17.0
 2.3.1
 3.0.2
 1.1.10.5


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [MINOR][DOCS] Add `Web UI` link to `Other Documents` section of index.md

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 57424b92c5b5 [MINOR][DOCS] Add `Web UI` link to `Other Documents` 
section of index.md
57424b92c5b5 is described below

commit 57424b92c5b5e7c3de680a7d8a6b137911f45666
Author: Matt Braymer-Hayes 
AuthorDate: Mon Mar 18 07:53:11 2024 -0700

[MINOR][DOCS] Add `Web UI` link to `Other Documents` section of index.md

### What changes were proposed in this pull request?

Adds the Web UI to the `Other Documents` list on the main page.

### Why are the changes needed?

I found it difficult to find the Web UI docs: it's only linked inside the 
Monitoring docs. Adding it to the main page will make it easier for people to 
find and use the docs.

### Does this PR introduce _any_ user-facing change?

Yes: adds another cross-reference on the main page.

### How was this patch tested?

Visually verified that Markdown still rendered properly.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45534 from mattayes/patch-2.

Authored-by: Matt Braymer-Hayes 
Signed-off-by: Dongjoon Hyun 
---
 docs/index.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/index.md b/docs/index.md
index 5f3858bec86b..12c53c40c8f7 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -138,6 +138,7 @@ options for deployment:
 
 * [Configuration](configuration.html): customize Spark via its configuration 
system
 * [Monitoring](monitoring.html): track the behavior of your applications
+* [Web UI](web-ui.html): view useful information about your applications
 * [Tuning Guide](tuning.html): best practices to optimize performance and 
memory use
 * [Job Scheduling](job-scheduling.html): scheduling resources across and 
within Spark applications
 * [Security](security.html): Spark security support
@@ -145,7 +146,7 @@ options for deployment:
 * Integration with other storage systems:
   * [Cloud Infrastructures](cloud-integration.html)
   * [OpenStack Swift](storage-openstack-swift.html)
-* [Migration Guide](migration-guide.html): Migration guides for Spark 
components
+* [Migration Guide](migration-guide.html): migration guides for Spark 
components
 * [Building Spark](building-spark.html): build Spark using the Maven system
 * [Contributing to Spark](https://spark.apache.org/contributing.html)
 * [Third Party Projects](https://spark.apache.org/third-party-projects.html): 
related third party Spark projects


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.4 updated: [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage`

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 7a899e219f5a [SPARK-47434][WEBUI] Fix `statistics` link in 
`StreamingQueryPage`
7a899e219f5a is described below

commit 7a899e219f5a17ab12aeb8d67738025b7e2b9d9c
Author: Huw Campbell 
AuthorDate: Mon Mar 18 07:38:10 2024 -0700

[SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage`

### What changes were proposed in this pull request?

Like SPARK-24553, this PR aims to fix redirect issues (incorrect 302) when 
one is using proxy settings. Change the generated link to be consistent with 
other links and include a trailing slash

### Why are the changes needed?

When using a proxy, an invalid redirect is issued if this is not included

### Does this PR introduce _any_ user-facing change?

Only that people will be able to use these links if they are using a proxy

### How was this patch tested?

With a proxy installed, I navigated to the location this link would generate and could reach the page, whereas the link as it currently exists redirects and fails.

Edit: further tested by building a version of our application with this patch applied; the links work now.

### Was this patch authored or co-authored using generative AI tooling?

No.

Page with working link
https://github.com/apache/spark/assets/5205457/dbcd1ffc-b7e6-4f84-8ca7-602c41202bf3

Goes correctly to
https://github.com/apache/spark/assets/5205457/89111c82-b24a-4b33-895f-9c0131e8acb5

Before it would redirect and we'd get a 404.

https://github.com/apache/spark/assets/5205457/1adfeba1-a1f6-4c35-9c39-e077c680baef

Closes #45527 from HuwCampbell/patch-1.

Authored-by: Huw Campbell 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 9b466d329c3c75e89b80109755a41c2d271b8acc)
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
index 7cd7db4088ac..ce3e7cde01b7 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
@@ -174,7 +174,7 @@ private[ui] class StreamingQueryPagedTable(
 
   override def row(query: StructuredStreamingRow): Seq[Node] = {
 val streamingQuery = query.streamingUIData
-val statisticsLink = "%s/%s/statistics?id=%s"
+val statisticsLink = "%s/%s/statistics/?id=%s"
   .format(SparkUIUtils.prependBaseUri(request, parent.basePath), 
parent.prefix,
 streamingQuery.summary.runId)
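
To make the fix concrete, a small illustration of how the trailing slash changes the generated URL (values made up for the example; the real code derives the base from SparkUIUtils.prependBaseUri):

// Illustration only; base path, prefix and run id are placeholders.
val base   = "/proxy/app-123"
val prefix = "StreamingQuery"
val runId  = "b1f9a1f0-0000-0000-0000-000000000000"

val before = "%s/%s/statistics?id=%s".format(base, prefix, runId)
// => /proxy/app-123/StreamingQuery/statistics?id=...   (behind a proxy this triggers the bad redirect)
val after  = "%s/%s/statistics/?id=%s".format(base, prefix, runId)
// => /proxy/app-123/StreamingQuery/statistics/?id=...  (resolves directly, no redirect)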
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage`

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new bb7a6138b827 [SPARK-47434][WEBUI] Fix `statistics` link in 
`StreamingQueryPage`
bb7a6138b827 is described below

commit bb7a6138b827975fc827813ab42a2b9074bf8d5e
Author: Huw Campbell 
AuthorDate: Mon Mar 18 07:38:10 2024 -0700

[SPARK-47434][WEBUI] Fix `statistics` link in `StreamingQueryPage`

### What changes were proposed in this pull request?

Like SPARK-24553, this PR aims to fix redirect issues (incorrect 302) when 
one is using proxy settings. Change the generated link to be consistent with 
other links and include a trailing slash

### Why are the changes needed?

When using a proxy, an invalid redirect is issued if this is not included

### Does this PR introduce _any_ user-facing change?

Only that people will be able to use these links if they are using a proxy

### How was this patch tested?

With a proxy installed, I navigated to the location this link would generate and could reach the page, whereas the link as it currently exists redirects and fails.

Edit: further tested by building a version of our application with this patch applied; the links work now.

### Was this patch authored or co-authored using generative AI tooling?

No.

Page with working link
https://github.com/apache/spark/assets/5205457/dbcd1ffc-b7e6-4f84-8ca7-602c41202bf3

Goes correctly to
https://github.com/apache/spark/assets/5205457/89111c82-b24a-4b33-895f-9c0131e8acb5

Before it would redirect and we'd get a 404.

https://github.com/apache/spark/assets/5205457/1adfeba1-a1f6-4c35-9c39-e077c680baef

Closes #45527 from HuwCampbell/patch-1.

Authored-by: Huw Campbell 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 9b466d329c3c75e89b80109755a41c2d271b8acc)
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
index 7cd7db4088ac..ce3e7cde01b7 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala
@@ -174,7 +174,7 @@ private[ui] class StreamingQueryPagedTable(
 
   override def row(query: StructuredStreamingRow): Seq[Node] = {
 val streamingQuery = query.streamingUIData
-val statisticsLink = "%s/%s/statistics?id=%s"
+val statisticsLink = "%s/%s/statistics/?id=%s"
   .format(SparkUIUtils.prependBaseUri(request, parent.basePath), 
parent.prefix,
 streamingQuery.summary.runId)
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (d3f12df6e09e -> 9b466d329c3c)

2024-03-18 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from d3f12df6e09e [SPARK-47437][PYTHON][CONNECT] Correct the error class 
for `DataFrame.sort*`
 add 9b466d329c3c [SPARK-47434][WEBUI] Fix `statistics` link in 
`StreamingQueryPage`

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.4 updated (be0e44e59b3e -> b4e2c6750cb3)

2024-03-17 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


from be0e44e59b3e [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` 
in CI
 add b4e2c6750cb3 [SPARK-47433][PYTHON][DOCS][INFRA][3.4] Update PySpark 
package dependency with version ranges

No new revisions were added by this update.

Summary of changes:
 dev/requirements.txt   |  2 +-
 python/docs/source/getting_started/install.rst | 16 
 2 files changed, 9 insertions(+), 9 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-47432][PYTHON][CONNECT][DOCS][3.5] Add `pyarrow` upper bound requirement, `<13.0.0`

2024-03-17 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new cc6912ec612c [SPARK-47432][PYTHON][CONNECT][DOCS][3.5] Add `pyarrow` 
upper bound requirement, `<13.0.0`
cc6912ec612c is described below

commit cc6912ec612c30e46e1595860a5519bb1caa221b
Author: Dongjoon Hyun 
AuthorDate: Sun Mar 17 15:15:50 2024 -0700

[SPARK-47432][PYTHON][CONNECT][DOCS][3.5] Add `pyarrow` upper bound 
requirement, `<13.0.0`

### What changes were proposed in this pull request?

This PR aims to add `pyarrow` upper bound requirement, `<13.0.0`, to Apache 
Spark 3.5.x.

### Why are the changes needed?

PyArrow 13.0.0 has breaking changes mentioned by #42920 which is a part of 
Apache Spark 4.0.0.

### Does this PR introduce _any_ user-facing change?

No, this only clarifies the upper bound.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45553 from dongjoon-hyun/SPARK-47432.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/requirements.txt   | 2 +-
 python/docs/source/getting_started/install.rst | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/requirements.txt b/dev/requirements.txt
index 597417aba1f3..0749af75aa4b 100644
--- a/dev/requirements.txt
+++ b/dev/requirements.txt
@@ -3,7 +3,7 @@ py4j
 
 # PySpark dependencies (optional)
 numpy
-pyarrow
+pyarrow<13.0.0
 pandas
 scipy
 plotly
diff --git a/python/docs/source/getting_started/install.rst 
b/python/docs/source/getting_started/install.rst
index 6822285e9617..e97632a8b384 100644
--- a/python/docs/source/getting_started/install.rst
+++ b/python/docs/source/getting_started/install.rst
@@ -157,7 +157,7 @@ PackageSupported version Note
 == = 
==
 `py4j` >=0.10.9.7Required
 `pandas`   >=1.0.5   Required for pandas API 
on Spark and Spark Connect; Optional for Spark SQL
-`pyarrow`  >=4.0.0   Required for pandas API 
on Spark and Spark Connect; Optional for Spark SQL
+`pyarrow`  >=4.0.0,<13.0.0   Required for pandas API 
on Spark and Spark Connect; Optional for Spark SQL
 `numpy`>=1.15Required for pandas API 
on Spark and MLLib DataFrame-based API; Optional for Spark SQL
 `grpcio`   >=1.48,<1.57  Required for Spark Connect
 `grpcio-status`>=1.48,<1.57  Required for Spark Connect


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.4 updated: [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI

2024-03-17 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new be0e44e59b3e [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` 
in CI
be0e44e59b3e is described below

commit be0e44e59b3e71cb11353e11f19146e0d1827432
Author: Ruifeng Zheng 
AuthorDate: Wed Sep 13 15:51:27 2023 +0800

[SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI

Pin `pyarrow==12.0.1` in CI

to fix test failure,  
https://github.com/apache/spark/actions/runs/6167186123/job/16738683632

```
==
FAIL [0.095s]: test_from_to_pandas 
(pyspark.pandas.tests.data_type_ops.test_datetime_ops.DatetimeOpsTests)
--
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 122, 
in _assert_pandas_equal
assert_series_equal(
  File 
"/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 
931, in assert_series_equal
assert_attr_equal("dtype", left, right, obj=f"Attributes of {obj}")
  File 
"/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 
415, in assert_attr_equal
raise_assert_detail(obj, msg, left_attr, right_attr)
  File 
"/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 
599, in raise_assert_detail
raise AssertionError(msg)
AssertionError: Attributes of Series are different

Attribute "dtype" are different
[left]:  datetime64[ns]
[right]: datetime64[us]
```

No

CI and manually test

No

Closes #42897 from zhengruifeng/pin_pyarrow.

Authored-by: Ruifeng Zheng 
Signed-off-by: Ruifeng Zheng 
(cherry picked from commit e3d2dfa8b514f9358823c3cb1ad6523da8a6646b)
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 8049a203b8c5f2f8045701916e66cfc786e16b57)
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_and_test.yml | 4 ++--
 dev/infra/Dockerfile | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 33747fb5b61d..2184577d5c44 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -252,7 +252,7 @@ jobs:
 - name: Install Python packages (Python 3.8)
   if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 
'sql-'))
   run: |
-python3.8 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy 
unittest-xml-reporting 'grpcio==1.48.1' 'protobuf==3.19.5'
+python3.8 -m pip install 'numpy>=1.20.0' 'pyarrow==12.0.1' pandas 
scipy unittest-xml-reporting 'grpcio==1.48.1' 'protobuf==3.19.5'
 python3.8 -m pip list
 # Run the tests.
 - name: Run tests
@@ -626,7 +626,7 @@ jobs:
 #   See also https://issues.apache.org/jira/browse/SPARK-38279.
 python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 
'sphinx-copybutton==0.5.2' nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1' 
'pyzmq<24.0.0' 'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' 
'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' 
'sphinxcontrib-serializinghtml==1.1.5' 'nest-asyncio==1.5.8' 'rpds-py==0.16.2' 
'alabaster==0.7.13'
 python3.9 -m pip install ipython_genutils # See SPARK-38517
-python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' 
pyarrow pandas 'plotly>=4.8'
+python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' 
'pyarrow==12.0.1' pandas 'plotly>=4.8'
 python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421
 apt-get update -y
 apt-get install -y ruby ruby-dev
diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 2e78f4af2144..93d8793826ff 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -65,7 +65,7 @@ RUN Rscript -e "devtools::install_version('roxygen2', 
version='7.2.0', repos='ht
 ENV R_LIBS_SITE 
"/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"
 
 RUN pypy3 -m pip install numpy 'pandas<=1.5.3' scipy coverage matplotlib
-RUN python3.9 -m pip install numpy pyarrow 'pandas<=1.5.3' scipy 
unittest-xml-reporting plotl

(spark) branch branch-3.5 updated: [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI

2024-03-17 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 8049a203b8c5 [SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` 
in CI
8049a203b8c5 is described below

commit 8049a203b8c5f2f8045701916e66cfc786e16b57
Author: Ruifeng Zheng 
AuthorDate: Wed Sep 13 15:51:27 2023 +0800

[SPARK-45141][PYTHON][INFRA][TESTS] Pin `pyarrow==12.0.1` in CI

### What changes were proposed in this pull request?
Pin `pyarrow==12.0.1` in CI

### Why are the changes needed?
to fix test failure,  
https://github.com/apache/spark/actions/runs/6167186123/job/16738683632

```
==
FAIL [0.095s]: test_from_to_pandas 
(pyspark.pandas.tests.data_type_ops.test_datetime_ops.DatetimeOpsTests)
--
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/testing/pandasutils.py", line 122, 
in _assert_pandas_equal
assert_series_equal(
  File 
"/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 
931, in assert_series_equal
assert_attr_equal("dtype", left, right, obj=f"Attributes of {obj}")
  File 
"/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 
415, in assert_attr_equal
raise_assert_detail(obj, msg, left_attr, right_attr)
  File 
"/usr/local/lib/python3.9/dist-packages/pandas/_testing/asserters.py", line 
599, in raise_assert_detail
raise AssertionError(msg)
AssertionError: Attributes of Series are different

Attribute "dtype" are different
[left]:  datetime64[ns]
[right]: datetime64[us]
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI and manually test

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #42897 from zhengruifeng/pin_pyarrow.

Authored-by: Ruifeng Zheng 
Signed-off-by: Ruifeng Zheng 
(cherry picked from commit e3d2dfa8b514f9358823c3cb1ad6523da8a6646b)
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_and_test.yml | 4 ++--
 dev/infra/Dockerfile | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index b0760a955342..8488540b415d 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -258,7 +258,7 @@ jobs:
 - name: Install Python packages (Python 3.8)
   if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 
'sql-'))
   run: |
-python3.8 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy 
unittest-xml-reporting 'grpcio==1.56.0' 'protobuf==3.20.3'
+python3.8 -m pip install 'numpy>=1.20.0' 'pyarrow==12.0.1' pandas 
scipy unittest-xml-reporting 'grpcio==1.56.0' 'protobuf==3.20.3'
 python3.8 -m pip list
 # Run the tests.
 - name: Run tests
@@ -684,7 +684,7 @@ jobs:
 #   See also https://issues.apache.org/jira/browse/SPARK-38279.
 python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 
'sphinx-copybutton==0.5.2' nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1' 
'pyzmq<24.0.0' 'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' 
'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' 
'sphinxcontrib-serializinghtml==1.1.5' 'nest-asyncio==1.5.8' 'rpds-py==0.16.2' 
'alabaster==0.7.13'
 python3.9 -m pip install ipython_genutils # See SPARK-38517
-python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' 
pyarrow pandas 'plotly>=4.8'
+python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' 
'pyarrow==12.0.1' pandas 'plotly>=4.8'
 python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421
 apt-get update -y
 apt-get install -y ruby ruby-dev
diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index d3bae836cc63..d3fcd7ab3622 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -65,7 +65,7 @@ RUN Rscript -e "devtools::install_version('roxygen2', 
version='7.2.0', repos='ht
 ENV R_LIBS_SITE 
"/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"
 
 RUN pypy3 -m pip install numpy 'pandas

(spark) branch master updated: [SPARK-47426][BUILD] Upgrade Guava used by the connect module to `33.1.0-jre`

2024-03-17 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2dba72100e03 [SPARK-47426][BUILD] Upgrade Guava used by the connect 
module to `33.1.0-jre`
2dba72100e03 is described below

commit 2dba72100e0326f1889ff0be2dc576b1e712ad15
Author: panbingkun 
AuthorDate: Sun Mar 17 13:52:14 2024 -0700

[SPARK-47426][BUILD] Upgrade Guava used by the connect module to 
`33.1.0-jre`

### What changes were proposed in this pull request?
This PR aims to upgrade the Guava used by the `connect` module to `33.1.0-jre`.

### Why are the changes needed?
- The new version brings some bug fixes and optimizations:
  - cache: fixed a bug; see https://github.com/google/guava/pull/6851#issuecomment-1931276822.
  - hash: optimized Checksum-based hash functions for Java 9+.

- The full release notes:
https://github.com/google/guava/releases/tag/v33.1.0

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45540 from panbingkun/SPARK-47426.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index d67ab1c01273..757d911c1229 100644
--- a/pom.xml
+++ b/pom.xml
@@ -288,7 +288,7 @@
 
true
 
 
-33.0.0-jre
+33.1.0-jre
 1.0.2
 1.62.2
 1.1.3


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark-website) branch asf-site updated: Update the organization in committers.md (#509)

2024-03-16 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 3eae7010b9 Update the organization in committers.md (#509)
3eae7010b9 is described below

commit 3eae7010b9f3cc01ceabe5036c0bd8910ccb8c67
Author: Jerry Shao 
AuthorDate: Sat Mar 16 20:53:28 2024 -0700

Update the organization in committers.md (#509)
---
 committers.md| 2 +-
 site/committers.html | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/committers.md b/committers.md
index 58aedb94fd..17530a2411 100644
--- a/committers.md
+++ b/committers.md
@@ -73,7 +73,7 @@ navigation:
 |Josh Rosen|Stripe|
 |Sandy Ryza|Remix|
 |Kousuke Saruta|NTT Data|
-|Saisai Shao|Tencent|
+|Saisai Shao|Datastrato|
 |Prashant Sharma|IBM|
 |Gabor Somogyi|Apple|
 |Ram Sriharsha|Databricks|
diff --git a/site/committers.html b/site/committers.html
index 8a9839aa91..22e2f4c481 100644
--- a/site/committers.html
+++ b/site/committers.html
@@ -403,7 +403,7 @@
 
 
   Saisai Shao
-  Tencent
+  Datastrato
 
 
   Prashant Sharma


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.4 updated: [SPARK-47428][BUILD][3.4] Upgrade Jetty to 9.4.54.v20240208

2024-03-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 3c41b1d97e1f [SPARK-47428][BUILD][3.4] Upgrade Jetty to 
9.4.54.v20240208
3c41b1d97e1f is described below

commit 3c41b1d97e1f5ff9f74f9ea72f7ea92dcbca2122
Author: Dongjoon Hyun 
AuthorDate: Fri Mar 15 22:42:17 2024 -0700

[SPARK-47428][BUILD][3.4] Upgrade Jetty to 9.4.54.v20240208

### What changes were proposed in this pull request?

This PR aims to upgrade Jetty to 9.4.54.v20240208 for Apache Spark 3.4.3.

### Why are the changes needed?

To bring the latest bug fixes.
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.54.v20240208
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.53.v20231009
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.52.v20230823
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.51.v20230217

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45544 from dongjoon-hyun/SPARK-47428-3.4.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++--
 pom.xml   | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 
b/dev/deps/spark-deps-hadoop-2-hive-2.3
index 691c83632b38..a94fbcd0ca77 100644
--- a/dev/deps/spark-deps-hadoop-2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -143,7 +143,7 @@ jersey-hk2/2.36//jersey-hk2-2.36.jar
 jersey-server/2.36//jersey-server-2.36.jar
 jetty-sslengine/6.1.26//jetty-sslengine-6.1.26.jar
 jetty-util/6.1.26//jetty-util-6.1.26.jar
-jetty-util/9.4.50.v20221201//jetty-util-9.4.50.v20221201.jar
+jetty-util/9.4.54.v20240208//jetty-util-9.4.54.v20240208.jar
 jetty/6.1.26//jetty-6.1.26.jar
 jline/2.14.6//jline-2.14.6.jar
 joda-time/2.12.2//joda-time-2.12.2.jar
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 4d94cb5c699e..99665da7d16a 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -128,8 +128,8 @@ 
jersey-container-servlet/2.36//jersey-container-servlet-2.36.jar
 jersey-hk2/2.36//jersey-hk2-2.36.jar
 jersey-server/2.36//jersey-server-2.36.jar
 jettison/1.1//jettison-1.1.jar
-jetty-util-ajax/9.4.50.v20221201//jetty-util-ajax-9.4.50.v20221201.jar
-jetty-util/9.4.50.v20221201//jetty-util-9.4.50.v20221201.jar
+jetty-util-ajax/9.4.54.v20240208//jetty-util-ajax-9.4.54.v20240208.jar
+jetty-util/9.4.54.v20240208//jetty-util-9.4.54.v20240208.jar
 jline/2.14.6//jline-2.14.6.jar
 joda-time/2.12.2//joda-time-2.12.2.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
diff --git a/pom.xml b/pom.xml
index 373d17b76c09..77218d162c41 100644
--- a/pom.xml
+++ b/pom.xml
@@ -143,7 +143,7 @@
 1.12.3
 1.8.6
 shaded-protobuf
-9.4.50.v20221201
+9.4.54.v20240208
 4.0.3
 0.10.0
 2.5.1


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.4 updated: [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job

2024-03-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 210e80e8b7ba [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` 
GitHub Action job
210e80e8b7ba is described below

commit 210e80e8b7baa5fc1e6462615bc8134a4c90647c
Author: Dongjoon Hyun 
AuthorDate: Tue Oct 17 23:38:56 2023 -0700

[SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job

### What changes were proposed in this pull request?

This PR aims to skip the `Unidoc` and `MIMA` phases in many general test pipelines. The `mima` test is moved to the `lint` job.

### Why are the changes needed?

By having an independent documentation-generation and MIMA-checking GitHub Action job, we can skip those phases in the many jobs listed below.


https://github.com/apache/spark/blob/73f9f5296e36541db78ab10c4c01a56fbc17cca8/.github/workflows/build_and_test.yml#L142-L190

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually check the GitHub action logs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43422 from dongjoon-hyun/SPARK-45587.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 8c6eeb8ab0180368cc60de8b2dbae7457bee5794)
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_and_test.yml | 4 
 1 file changed, 4 insertions(+)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 13527119e51a..33747fb5b61d 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -198,6 +198,8 @@ jobs:
   HIVE_PROFILE: ${{ matrix.hive }}
   GITHUB_PREV_SHA: ${{ github.event.before }}
   SPARK_LOCAL_IP: localhost
+  SKIP_UNIDOC: true
+  SKIP_MIMA: true
   SKIP_PACKAGING: true
 steps:
 - name: Checkout Spark repository
@@ -578,6 +580,8 @@ jobs:
   run: ./dev/check-license
 - name: Dependencies test
   run: ./dev/test-dependencies.sh
+- name: MIMA test
+  run: ./dev/mima
 - name: Scala linter
   run: ./dev/lint-scala
 - name: Java linter


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job

2024-03-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 8c6eeb8ab018 [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` 
GitHub Action job
8c6eeb8ab018 is described below

commit 8c6eeb8ab0180368cc60de8b2dbae7457bee5794
Author: Dongjoon Hyun 
AuthorDate: Tue Oct 17 23:38:56 2023 -0700

[SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` GitHub Action job

### What changes were proposed in this pull request?

This PR aims to skip the `Unidoc` and `MIMA` phases in many general test pipelines. The `mima` test is moved to the `lint` job.

### Why are the changes needed?

By having an independent documentation-generation and MIMA-checking GitHub Action job, we can skip those phases in the many jobs listed below.


https://github.com/apache/spark/blob/73f9f5296e36541db78ab10c4c01a56fbc17cca8/.github/workflows/build_and_test.yml#L142-L190

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually check the GitHub action logs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43422 from dongjoon-hyun/SPARK-45587.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_and_test.yml | 4 
 1 file changed, 4 insertions(+)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index ad8685754b31..b0760a955342 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -204,6 +204,8 @@ jobs:
   HIVE_PROFILE: ${{ matrix.hive }}
   GITHUB_PREV_SHA: ${{ github.event.before }}
   SPARK_LOCAL_IP: localhost
+  SKIP_UNIDOC: true
+  SKIP_MIMA: true
   SKIP_PACKAGING: true
 steps:
 - name: Checkout Spark repository
@@ -627,6 +629,8 @@ jobs:
   run: ./dev/check-license
 - name: Dependencies test
   run: ./dev/test-dependencies.sh
+- name: MIMA test
+  run: ./dev/mima
 - name: Scala linter
   run: ./dev/lint-scala
 - name: Java linter


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-47428][BUILD][3.5] Upgrade Jetty to 9.4.54.v20240208

2024-03-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new d59425275cdd [SPARK-47428][BUILD][3.5] Upgrade Jetty to 
9.4.54.v20240208
d59425275cdd is described below

commit d59425275cdd0ff678a5bcccef4c7b74fe8170cb
Author: Dongjoon Hyun 
AuthorDate: Fri Mar 15 22:28:45 2024 -0700

[SPARK-47428][BUILD][3.5] Upgrade Jetty to 9.4.54.v20240208

### What changes were proposed in this pull request?

This PR aims to upgrade Jetty to 9.4.54.v20240208

### Why are the changes needed?

To bring the latest bug fixes.
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.54.v20240208
- https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.53.v20231009

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45543 from dongjoon-hyun/SPARK-47428.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++--
 pom.xml   | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index c76702cd0af0..8ecf931bf513 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -130,8 +130,8 @@ 
jersey-container-servlet/2.40//jersey-container-servlet-2.40.jar
 jersey-hk2/2.40//jersey-hk2-2.40.jar
 jersey-server/2.40//jersey-server-2.40.jar
 jettison/1.1//jettison-1.1.jar
-jetty-util-ajax/9.4.52.v20230823//jetty-util-ajax-9.4.52.v20230823.jar
-jetty-util/9.4.52.v20230823//jetty-util-9.4.52.v20230823.jar
+jetty-util-ajax/9.4.54.v20240208//jetty-util-ajax-9.4.54.v20240208.jar
+jetty-util/9.4.54.v20240208//jetty-util-9.4.54.v20240208.jar
 jline/2.14.6//jline-2.14.6.jar
 joda-time/2.12.5//joda-time-2.12.5.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
diff --git a/pom.xml b/pom.xml
index 5db3c78e00eb..fb6208777d3f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -143,7 +143,7 @@
 1.13.1
 1.9.2
 shaded-protobuf
-9.4.52.v20230823
+9.4.54.v20240208
 4.0.3
 0.10.0
 

(spark) branch master updated (4437e6e21237 -> 6bf031796c8c)

2024-03-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 4437e6e21237 [SPARK-47419][CONNECT] Move `log4j2-defaults.properties` 
to `common/utils`
 add 6bf031796c8c [SPARK-44740][CONNECT][TESTS][FOLLOWUP] Deduplicate 
`test_metadata`

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/tests/connect/test_connect_session.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (b7aa9740249b -> 4437e6e21237)

2024-03-15 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from b7aa9740249b [SPARK-47407][SQL] Support java.sql.Types.NULL map to 
NullType
 add 4437e6e21237 [SPARK-47419][CONNECT] Move `log4j2-defaults.properties` 
to `common/utils`

No new revisions were added by this update.

Summary of changes:
 .../utils}/src/main/resources/org/apache/spark/log4j2-defaults.properties | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename {core => 
common/utils}/src/main/resources/org/apache/spark/log4j2-defaults.properties 
(100%)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47234][BUILD] Upgrade Scala to 2.13.13

2024-03-14 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 56cfc89e8f15 [SPARK-47234][BUILD] Upgrade Scala to 2.13.13
56cfc89e8f15 is described below

commit 56cfc89e8f1599fe859db1bd6628a9b07d53bed4
Author: panbingkun 
AuthorDate: Thu Mar 14 22:40:54 2024 -0700

[SPARK-47234][BUILD] Upgrade Scala to 2.13.13

### What changes were proposed in this pull request?
This PR aims to upgrade Scala from `2.13.12` to `2.13.13`.

### Why are the changes needed?
- The new version brings some bug fixes:
  https://github.com/scala/scala/pull/10525
  https://github.com/scala/scala/pull/10528

- The release notes as follows: 
https://github.com/scala/scala/releases/tag/v2.13.13

### Does this PR introduce _any_ user-facing change?
Yes, The `scala` version is changed from `2.13.12` to `2.13.13`.

### How was this patch tested?
- Pass GA.
- After master is upgraded to `2.13.13`, we need to continue observing.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45342 from panbingkun/SPARK-47234.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 8 
 docs/_config.yml  | 2 +-
 pom.xml   | 4 ++--
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 2e091cb3638e..d4b7d38aea22 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -139,7 +139,7 @@ jettison/1.5.4//jettison-1.5.4.jar
 jetty-util-ajax/11.0.20//jetty-util-ajax-11.0.20.jar
 jetty-util/11.0.20//jetty-util-11.0.20.jar
 jline/2.14.6//jline-2.14.6.jar
-jline/3.22.0//jline-3.22.0.jar
+jline/3.24.1//jline-3.24.1.jar
 jna/5.13.0//jna-5.13.0.jar
 joda-time/2.12.7//joda-time-2.12.7.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
@@ -245,11 +245,11 @@ py4j/0.10.9.7//py4j-0.10.9.7.jar
 remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar
 rocksdbjni/8.11.3//rocksdbjni-8.11.3.jar
 scala-collection-compat_2.13/2.7.0//scala-collection-compat_2.13-2.7.0.jar
-scala-compiler/2.13.12//scala-compiler-2.13.12.jar
-scala-library/2.13.12//scala-library-2.13.12.jar
+scala-compiler/2.13.13//scala-compiler-2.13.13.jar
+scala-library/2.13.13//scala-library-2.13.13.jar
 
scala-parallel-collections_2.13/1.0.4//scala-parallel-collections_2.13-1.0.4.jar
 scala-parser-combinators_2.13/2.3.0//scala-parser-combinators_2.13-2.3.0.jar
-scala-reflect/2.13.12//scala-reflect-2.13.12.jar
+scala-reflect/2.13.13//scala-reflect-2.13.13.jar
 scala-xml_2.13/2.2.0//scala-xml_2.13-2.2.0.jar
 slf4j-api/2.0.12//slf4j-api-2.0.12.jar
 snakeyaml-engine/2.7//snakeyaml-engine-2.7.jar
diff --git a/docs/_config.yml b/docs/_config.yml
index 7a305ceea67b..19183f85df23 100644
--- a/docs/_config.yml
+++ b/docs/_config.yml
@@ -22,7 +22,7 @@ include:
 SPARK_VERSION: 4.0.0-SNAPSHOT
 SPARK_VERSION_SHORT: 4.0.0
 SCALA_BINARY_VERSION: "2.13"
-SCALA_VERSION: "2.13.12"
+SCALA_VERSION: "2.13.13"
 SPARK_ISSUE_TRACKER_URL: https://issues.apache.org/jira/browse/SPARK
 SPARK_GITHUB_URL: https://github.com/apache/spark
 # Before a new release, we should:
diff --git a/pom.xml b/pom.xml
index 6a811e74e7f8..d67ab1c01273 100644
--- a/pom.xml
+++ b/pom.xml
@@ -172,7 +172,7 @@
 
 3.2.2
 4.4
-2.13.12
+2.13.13
 2.13
 2.2.0
 
@@ -226,7 +226,7 @@
 ./python/pyspark/sql/pandas/utils.py, and ./python/setup.py too.
 -->
 15.0.0
-2.5.11
+3.0.0-M1
 
 
 org.fusesource.leveldbjni


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (213399b61de5 -> fe0aa1edff04)

2024-03-14 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 213399b61de5 [SPARK-47396][SQL] Add a general mapping for TIME WITHOUT 
TIME ZONE to TimestampNTZType
 add fe0aa1edff04 [SPARK-47402][BUILD] Upgrade `ZooKeeper` to 3.9.2

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++--
 pom.xml   | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (7b4ab4fa452d -> 213399b61de5)

2024-03-14 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 7b4ab4fa452d [SPARK-47387][SQL] Remove some unused error classes
 add 213399b61de5 [SPARK-47396][SQL] Add a general mapping for TIME WITHOUT 
TIME ZONE to TimestampNTZType

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/datasources/jdbc/JdbcUtils.scala   |  1 +
 .../src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala   | 10 ++
 2 files changed, 11 insertions(+)
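
For readers who want the shape of such a mapping, a sketch expressed through the public JdbcDialect extension point (the actual change lives in the generic JdbcUtils fallback; the dialect name and URL prefix below are hypothetical):

import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types.{DataType, MetadataBuilder, TimestampNTZType}

object TimeAsNtzDialect extends JdbcDialect {
  // Hypothetical URL prefix; a real dialect matches its own JDBC scheme.
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:mydb")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
    sqlType match {
      // TIME WITHOUT TIME ZONE carries no zone information, so TimestampNTZType is the closer fit.
      case Types.TIME => Some(TimestampNTZType)
      case _ => None
    }
}

// Registering makes the mapping apply to matching JDBC URLs:
// JdbcDialects.registerDialect(TimeAsNtzDialect)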


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47390][SQL][FOLLOWUP] Fix TIME_WITH_TIMEZONE mapping in PostgresDialect

2024-03-14 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d41d5ecda8c1 [SPARK-47390][SQL][FOLLOWUP] Fix TIME_WITH_TIMEZONE 
mapping in PostgresDialect
d41d5ecda8c1 is described below

commit d41d5ecda8c11d7e8f6a1fafa1d2be97c0f49f04
Author: Kent Yao 
AuthorDate: Thu Mar 14 10:30:48 2024 -0700

[SPARK-47390][SQL][FOLLOWUP] Fix TIME_WITH_TIMEZONE mapping in 
PostgresDialect

### What changes were proposed in this pull request?

This PR fixes a bug in SPARK-47390: TIME shall be handled in a case-match branch separate from TIMESTAMP.

### Why are the changes needed?

bugfix

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

local test with #45519 merged together

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #45522 from yaooqinn/SPARK-47390-F.

Authored-by: Kent Yao 
Signed-off-by: Dongjoon Hyun 
---
 .../src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala  | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala
index 7d8ed70b2bd1..9b286620a140 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala
@@ -58,10 +58,14 @@ private object PostgresDialect extends JdbcDialect with 
SQLConfHelper {
 // See SPARK-34333 and https://github.com/pgjdbc/pgjdbc/issues/100
 Some(StringType)
   case Types.TIMESTAMP
-if "timestamptz".equalsIgnoreCase(typeName) || 
"timetz".equalsIgnoreCase(typeName) =>
+if "timestamptz".equalsIgnoreCase(typeName) =>
 // timestamptz represents timestamp with time zone, currently it maps 
to Types.TIMESTAMP.
 // We need to change to Types.TIMESTAMP_WITH_TIMEZONE if the upstream 
changes.
 Some(TimestampType)
+  case Types.TIME if "timetz".equalsIgnoreCase(typeName) =>
+// timetz represents time with time zone, currently it maps to 
Types.TIME.
+// We need to change to Types.TIME_WITH_TIMEZONE if the upstream 
changes.
+Some(TimestampType)
   case Types.OTHER => Some(StringType)
   case _ if "text".equalsIgnoreCase(typeName) => Some(StringType) // 
sqlType is Types.VARCHAR
   case Types.ARRAY =>


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (481597cd2d79 -> b98accd9d931)

2024-03-14 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 481597cd2d79 [SPARK-47400][BUILD] Upgrade `gcs-connector` to 2.2.20
 add b98accd9d931 [SPARK-47401][K8S][DOCS] Update `YuniKorn` docs with v1.5

No new revisions were added by this update.

Summary of changes:
 docs/running-on-kubernetes.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47400][BUILD] Upgrade `gcs-connector` to 2.2.20

2024-03-14 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 481597cd2d79 [SPARK-47400][BUILD] Upgrade `gcs-connector` to 2.2.20
481597cd2d79 is described below

commit 481597cd2d790e168cde113bf13b34fdb471f377
Author: Dongjoon Hyun 
AuthorDate: Thu Mar 14 09:41:03 2024 -0700

[SPARK-47400][BUILD] Upgrade `gcs-connector` to 2.2.20

### What changes were proposed in this pull request?

This PR aims to upgrade `gcs-connector` to 2.2.20.

### Why are the changes needed?

To bring the latest updates.
- 
https://github.com/GoogleCloudDataproc/hadoop-connectors/releases/tag/v2.2.20
- Add support for renaming folders using rename backend API for 
Hierarchical namespace buckets
- Upgrade java-storage to 2.32.1 and upgrade the version of related 
dependencies
- 
https://github.com/GoogleCloudDataproc/hadoop-connectors/releases/tag/v2.2.10

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

```
$ dev/make-distribution.sh -Phadoop-cloud
$ cd dist
$ export KEYFILE=~/.ssh/apache-spark.json
$ export EMAIL=$(jq -r '.client_email' < $KEYFILE)
$ export PRIVATE_KEY_ID=$(jq -r '.private_key_id' < $KEYFILE)
$ export PRIVATE_KEY="$(jq -r '.private_key' < $KEYFILE)"
$ bin/spark-shell \
-c spark.hadoop.fs.gs.auth.service.account.email=$EMAIL \
-c 
spark.hadoop.fs.gs.auth.service.account.private.key.id=$PRIVATE_KEY_ID \
-c 
spark.hadoop.fs.gs.auth.service.account.private.key="$PRIVATE_KEY"
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Scala version 2.13.12 (OpenJDK 64-Bit Server VM, Java 21.0.2)
Type in expressions to have them evaluated.
Type :help for more information.
24/03/14 09:33:41 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = 
local-1710434021996).
Spark session available as 'spark'.

scala> spark.read.text("gs://apache-spark-bucket/README.md").count()
val res0: Long = 124

scala> 
spark.read.orc("examples/src/main/resources/users.orc").write.mode("overwrite").orc("gs://apache-spark-bucket/users.orc")

scala> spark.read.orc("gs://apache-spark-bucket/users.orc").show()
+------+--------------+----------------+
|  name|favorite_color|favorite_numbers|
+------+--------------+----------------+
|Alyssa|          NULL|  [3, 9, 15, 20]|
|   Ben|           red|              []|
+------+--------------+----------------+
    
scala>
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45521 from dongjoon-hyun/SPARK-47400.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 52d91d938ffb..1f915789e3ea 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -66,7 +66,7 @@ 
dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-met
 eclipse-collections-api/11.1.0//eclipse-collections-api-11.1.0.jar
 eclipse-collections/11.1.0//eclipse-collections-11.1.0.jar
 flatbuffers-java/23.5.26//flatbuffers-java-23.5.26.jar
-gcs-connector/hadoop3-2.2.18/shaded/gcs-connector-hadoop3-2.2.18-shaded.jar
+gcs-connector/hadoop3-2.2.20/shaded/gcs-connector-hadoop3-2.2.20-shaded.jar
 gmetric4j/1.0.10//gmetric4j-1.0.10.jar
 gson/2.2.4//gson-2.2.4.jar
 guava/14.0.1//guava-14.0.1.jar
diff --git a/pom.xml b/pom.xml
index 3f82f6321d5a..ecb0c3891e4e 100644
--- a/pom.xml
+++ b/pom.xml
@@ -163,7 +163,7 @@
 2.20.160
 
 0.12.8
-hadoop3-2.2.18
+hadoop3-2.2.20
 
 4.5.14
 4.4.16


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47384][BUILD] Upgrade RoaringBitmap to 1.0.5

2024-03-14 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 205e826e7052 [SPARK-47384][BUILD] Upgrade RoaringBitmap to 1.0.5
205e826e7052 is described below

commit 205e826e7052f59f90673d8f1388e727136b5ff7
Author: panbingkun 
AuthorDate: Thu Mar 14 08:48:44 2024 -0700

[SPARK-47384][BUILD] Upgrade RoaringBitmap to 1.0.5

### What changes were proposed in this pull request?
This PR aims to upgrade `RoaringBitmap` from `1.0.1` to `1.0.5`.

### Why are the changes needed?
Release notes: 
https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/1.0.5
This version includes some bug fixes, e.g.:
- fix RoaringBitmap BatchIterator's advanceIfNeeded to handle run lengths of zero
- fix RangeBitmap#between bug in full section after empty section

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45507 from panbingkun/SPARK-47384.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt | 8 
 core/benchmarks/MapStatusesConvertBenchmark-results.txt   | 8 
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt 
b/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt
index 4ca3c17fa45e..76c3a2ad6fb9 100644
--- a/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt
+++ b/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt
@@ -2,12 +2,12 @@
 MapStatuses Convert Benchmark
 

 
-OpenJDK 64-Bit Server VM 21.0.1+12-LTS on Linux 5.15.0-1053-azure
+OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
 AMD EPYC 7763 64-Core Processor
 MapStatuses Convert:  Best Time(ms)   Avg Time(ms)   
Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
 

-Num Maps: 5 Fetch partitions:500689694 
  5  0.0   688848614.0   1.0X
-Num Maps: 5 Fetch partitions:1000  1511   1517 
  7  0.0  1511337028.0   0.5X
-Num Maps: 5 Fetch partitions:1500  2279   2298 
 20  0.0  2278703144.0   0.3X
+Num Maps: 5 Fetch partitions:500696699 
  3  0.0   695980122.0   1.0X
+Num Maps: 5 Fetch partitions:1000  1593   1615 
 19  0.0  1592993119.0   0.4X
+Num Maps: 5 Fetch partitions:1500  2455   2476 
 22  0.0  2454771901.0   0.3X
 
 
diff --git a/core/benchmarks/MapStatusesConvertBenchmark-results.txt 
b/core/benchmarks/MapStatusesConvertBenchmark-results.txt
index a5cd0cf9b05b..eafd72dbe8b8 100644
--- a/core/benchmarks/MapStatusesConvertBenchmark-results.txt
+++ b/core/benchmarks/MapStatusesConvertBenchmark-results.txt
@@ -2,12 +2,12 @@
 MapStatuses Convert Benchmark
 

 
-OpenJDK 64-Bit Server VM 17.0.9+9-LTS on Linux 5.15.0-1053-azure
+OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1016-azure
 AMD EPYC 7763 64-Core Processor
 MapStatuses Convert:  Best Time(ms)   Avg Time(ms)   
Stdev(ms)Rate(M/s)   Per Row(ns)   Relative
 

-Num Maps: 5 Fetch partitions:500646655 
 13  0.0   645845205.0   1.0X
-Num Maps: 5 Fetch partitions:1000  1175   1195 
 18  0.0  1174727440.0   0.5X
-Num Maps: 5 Fetch partitions:1500  1767   1830 
 55  0.0  1767363076.0   0.4X
+Num Maps: 5 Fetch partitions:500714716 
  2  0.0   713899011.0   1.0X
+Num Maps: 5 Fetch partitions:1000  1602   1647 
 59  0.0  1602358288.0   0.4X
+Num Maps: 5 Fetch partitions:1500  2517   2538 
 22  0.0  2517027078.0   0.3X
 
 
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 6b357b3

(spark) branch master updated (5ce150735bc5 -> 168346f93303)

2024-03-14 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 5ce150735bc5 [SPARK-47394][SQL] Support TIMESTAMP WITH TIME ZONE for 
H2Dialect
 add 168346f93303 [SPARK-47391][SQL] Remove the test case workaround for 
JDK 8

No new revisions were added by this update.

Summary of changes:
 .../catalyst/encoders/ExpressionEncoderSuite.scala | 71 --
 1 file changed, 71 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47394][SQL] Support TIMESTAMP WITH TIME ZONE for H2Dialect

2024-03-14 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5ce150735bc5 [SPARK-47394][SQL] Support TIMESTAMP WITH TIME ZONE for 
H2Dialect
5ce150735bc5 is described below

commit 5ce150735bc57f482f18fa5a04d16caae0e24041
Author: Kent Yao 
AuthorDate: Thu Mar 14 07:49:41 2024 -0700

[SPARK-47394][SQL] Support TIMESTAMP WITH TIME ZONE for H2Dialect

### What changes were proposed in this pull request?

Following the guidelines of SPARK-47375, this PR supports TIMESTAMP WITH 
TIME ZONE for H2Dialect and maps it to TimestampType regardless of the option 
`preferTimestampNTZ`

https://www.h2database.com/html/datatypes.html#timestamp_with_time_zone_type
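
For illustration, a minimal sketch (the in-memory H2 URL, table name, and the `spark` session are assumptions) of reading such a column after this change; the column is expected to map to `TimestampType` whether or not `preferTimestampNTZ` is set:

```scala
// Hypothetical sketch: URL and table are placeholders for a real H2 database.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:h2:mem:testdb")  // assumed in-memory H2 URL
  .option("dbtable", "tstz_table")      // assumed table with a TIMESTAMP WITH TIME ZONE column
  .option("preferTimestampNTZ", "true") // ignored for this type with this change
  .load()

df.printSchema()  // the TIMESTAMP WITH TIME ZONE column shows up as TimestampType
```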

### Why are the changes needed?

H2Dialect improvement: we currently don't have a default mapping for 
`java.sql.Types.TIME_WITH_TIMEZONE` and `java.sql.Types.TIMESTAMP_WITH_TIMEZONE`.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

new tests

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #45516 from yaooqinn/SPARK-47394.

Authored-by: Kent Yao 
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/sql/jdbc/H2Dialect.scala|  3 ++-
 .../scala/org/apache/spark/sql/jdbc/JDBCSuite.scala| 18 +++---
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala
index 74eca7e48577..f4a1650b3e8c 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala
@@ -35,7 +35,7 @@ import 
org.apache.spark.sql.connector.catalog.functions.UnboundFunction
 import org.apache.spark.sql.connector.catalog.index.TableIndex
 import org.apache.spark.sql.connector.expressions.{Expression, FieldReference, 
NamedReference}
 import org.apache.spark.sql.execution.datasources.jdbc.{JDBCOptions, JdbcUtils}
-import org.apache.spark.sql.types.{BooleanType, ByteType, DataType, 
DecimalType, MetadataBuilder, ShortType, StringType}
+import org.apache.spark.sql.types.{BooleanType, ByteType, DataType, 
DecimalType, MetadataBuilder, ShortType, StringType, TimestampType}
 
 private[sql] object H2Dialect extends JdbcDialect {
   override def canHandle(url: String): Boolean =
@@ -68,6 +68,7 @@ private[sql] object H2Dialect extends JdbcDialect {
 val scale = if (null != md) md.build().getLong("scale") else 0L
 val selectedScale = (DecimalType.MAX_PRECISION * (scale.toDouble / 
size.toDouble)).toInt
 Option(DecimalType(DecimalType.MAX_PRECISION, selectedScale))
+  case Types.TIMESTAMP_WITH_TIMEZONE | Types.TIME_WITH_TIMEZONE => 
Some(TimestampType)
   case _ => None
 }
   }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
index b8ca70e0b175..8f286eaa2c54 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
@@ -1467,13 +1467,6 @@ class JDBCSuite extends QueryTest with 
SharedSparkSession {
   }
 
   test("unsupported types") {
-checkError(
-  exception = intercept[SparkSQLException] {
-spark.read.jdbc(urlWithUserAndPass, "TEST.TIMEZONE", new 
Properties()).collect()
-  },
-  errorClass = "UNRECOGNIZED_SQL_TYPE",
-  parameters =
-Map("typeName" -> "TIMESTAMP WITH TIME ZONE", "jdbcType" -> 
"TIMESTAMP_WITH_TIMEZONE"))
 checkError(
   exception = intercept[SparkSQLException] {
 spark.read.jdbc(urlWithUserAndPass, "TEST.ARRAY_TABLE", new 
Properties()).collect()
@@ -1482,6 +1475,17 @@ class JDBCSuite extends QueryTest with 
SharedSparkSession {
   parameters = Map("typeName" -> "INTEGER ARRAY", "jdbcType" -> "ARRAY"))
   }
 
+
+  test("SPARK-47394: Convert TIMESTAMP WITH TIME ZONE to TimestampType") {
+Seq(true, false).foreach { prefer =>
+  val df = spark.read
+.option("preferTimestampNTZ", prefer)
+.jdbc(urlWithUserAndPass, "TEST.TIMEZONE", new Properties())
+  val expected = sql("select timestamp'1999-01-08 04:05:06.543544-08:00'")
+  checkAnswer(df, expected)
+}
+  }
+
   test("SPARK-19318: Connection properties keys should be case-sensitive.") {
 def testJdbcOptions(options: JDBCOptions): Unit = {
   // Spark JDBC data source options are case-insensitive


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47378][PROTOBUF][TESTS] Make the related Protobuf UT run well in IDE

2024-03-13 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b75325ccefa6 [SPARK-47378][PROTOBUF][TESTS] Make the related Protobuf 
UT run well in IDE
b75325ccefa6 is described below

commit b75325ccefa67b0c2daee317264808c67d76854f
Author: panbingkun 
AuthorDate: Wed Mar 13 09:56:13 2024 -0700

[SPARK-47378][PROTOBUF][TESTS] Make the related Protobuf UT run well in IDE

### What changes were proposed in this pull request?
The PR aims to make the related Protobuf `UT` run well in the IDE (IntelliJ IDEA).

### Why are the changes needed?
Make it easier for developers to debug the related Protobuf `UT`.

Before:
https://github.com/apache/spark/assets/15246973/c00781b2-3477-4b2c-b871-ead997fda697

After:
https://github.com/apache/spark/assets/15246973/665fc67d-c69e-45c7-b37d-bb4ef8e72930

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manually test.
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45498 from panbingkun/SPARK-47378.

Authored-by: panbingkun 
    Signed-off-by: Dongjoon Hyun 
---
 .../test/scala/org/apache/spark/sql/protobuf/ProtobufTestBase.scala  | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git 
a/connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufTestBase.scala
 
b/connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufTestBase.scala
index e3add49f2b80..b53ba947216a 100644
--- 
a/connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufTestBase.scala
+++ 
b/connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufTestBase.scala
@@ -28,6 +28,9 @@ import org.apache.spark.sql.types.{DataType, StructType}
 
 trait ProtobufTestBase extends SQLTestUtils {
 
+  private val descriptorDir = getWorkspaceFilePath(
+"connector", "protobuf", "target", "generated-test-sources")
+
   /**
* Returns path for a Protobuf descriptor file used in the tests. These 
files are generated
* during the build. Maven and SBT create the descriptor files differently. 
Maven creates one
@@ -35,7 +38,7 @@ trait ProtobufTestBase extends SQLTestUtils {
* all the Protobuf files. As a result actual file path returned in each 
case is different.
*/
   protected def protobufDescriptorFile(fileName: String): String = {
-val dir = "target/generated-test-sources"
+val dir = descriptorDir.toFile.getCanonicalPath
 if (new File(s"$dir/$fileName").exists) {
   s"$dir/$fileName"
 } else { // sbt test


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47373][SQL] Match FileSourceScanLike to get metadata instead of FileSourceScanExec

2024-03-13 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7d0f0832ae1a [SPARK-47373][SQL] Match FileSourceScanLike to get 
metadata instead of FileSourceScanExec
7d0f0832ae1a is described below

commit 7d0f0832ae1a222bd9c2492587b37fc1939a51e5
Author: zwangsheng 
AuthorDate: Wed Mar 13 00:28:26 2024 -0700

[SPARK-47373][SQL] Match FileSourceScanLike to get metadata instead of 
FileSourceScanExec

### What changes were proposed in this pull request?

When getting Spark plan info, we should match the base trait `FileSourceScanLike` 
to get metadata instead of matching the subclass `FileSourceScanExec`.

This way, user-defined file scan operators (which extend `FileSourceScanLike`) 
can be matched as well, as sketched below.
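
As a rough sketch of the idea (the helper name is hypothetical), matching on the trait covers both the built-in `FileSourceScanExec` and any custom scan node that extends `FileSourceScanLike`:

```scala
import org.apache.spark.sql.execution.{FileSourceScanLike, SparkPlan}

// Hypothetical helper: mirrors the pattern used when dumping scan metadata.
def fileScanMetadata(plan: SparkPlan): Map[String, String] = plan match {
  case fileScan: FileSourceScanLike => fileScan.metadata // built-in and user-defined scans
  case _                            => Map.empty[String, String]
}
```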

### Why are the changes needed?

Match user-defined file scan operators.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing unit tests.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45491 from zwangsheng/SPARK-47373.

Authored-by: zwangsheng 
Signed-off-by: Dongjoon Hyun 
---
 .../src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala
index 9b699801c97a..7c45b02ee846 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala
@@ -66,7 +66,7 @@ private[execution] object SparkPlanInfo {
 
 // dump the file scan metadata (e.g file path) to event log
 val metadata = plan match {
-  case fileScan: FileSourceScanExec => fileScan.metadata
+  case fileScan: FileSourceScanLike => fileScan.metadata
   case _ => Map[String, String]()
 }
 new SparkPlanInfo(


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47349][SQL][TESTS] Refactor string function `startsWith` and `endsWith` tests

2024-03-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 806f8e0466b9 [SPARK-47349][SQL][TESTS] Refactor string function 
`startsWith` and `endsWith` tests
806f8e0466b9 is described below

commit 806f8e0466b968d3fe87c7bbe3326bdf5458677a
Author: Stevo Mitric 
AuthorDate: Tue Mar 12 16:54:55 2024 -0700

[SPARK-47349][SQL][TESTS] Refactor string function `startsWith` and 
`endsWith` tests

### What changes were proposed in this pull request?
Refactored tests inside `CollationSuite` by migrating the `startsWith` and 
`endsWith` tests into a new `UTF8StringWithCollationSuite` suite that does 
string-level unit tests. Changes were originally proposed in [this 
PR](https://github.com/apache/spark/pull/45421#discussion_r1519451854).

### Why are the changes needed?
Reduces clutter in `CollationSuite`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Test suite proposed in this PR

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45477 from stevomitric/stevomitric/string-function-tests.

Authored-by: Stevo Mitric 
Signed-off-by: Dongjoon Hyun 
---
 .../unsafe/types/UTF8StringWithCollationSuite.java | 103 +
 .../org/apache/spark/sql/CollationSuite.scala  |  60 +---
 2 files changed, 105 insertions(+), 58 deletions(-)

diff --git 
a/common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringWithCollationSuite.java
 
b/common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringWithCollationSuite.java
new file mode 100644
index ..b60da7b945a4
--- /dev/null
+++ 
b/common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringWithCollationSuite.java
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.unsafe.types;
+
+import org.apache.spark.SparkException;
+import org.apache.spark.sql.catalyst.util.CollationFactory;
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.*;
+
+
+public class UTF8StringWithCollationSuite {
+
+  private void assertStartsWith(String pattern, String prefix, String 
collationName, boolean value)
+  throws SparkException {
+
assertEquals(UTF8String.fromString(pattern).startsWith(UTF8String.fromString(prefix),
+CollationFactory.collationNameToId(collationName)), value);
+  }
+
+  private void assertEndsWith(String pattern, String suffix, String 
collationName, boolean value)
+  throws SparkException {
+
assertEquals(UTF8String.fromString(pattern).endsWith(UTF8String.fromString(suffix),
+CollationFactory.collationNameToId(collationName)), value);
+  }
+
+  @Test
+  public void startsWithTest() throws SparkException {
+assertStartsWith("", "", "UTF8_BINARY", true);
+assertStartsWith("c", "", "UTF8_BINARY", true);
+assertStartsWith("", "c", "UTF8_BINARY", false);
+assertStartsWith("abcde", "a", "UTF8_BINARY", true);
+assertStartsWith("abcde", "A", "UTF8_BINARY", false);
+assertStartsWith("abcde", "bcd", "UTF8_BINARY", false);
+assertStartsWith("abcde", "BCD", "UTF8_BINARY", false);
+assertStartsWith("", "", "UNICODE", true);
+assertStartsWith("c", "", "UNICODE", true);
+assertStartsWith("", "c", "UNICODE", false);
+assertStartsWith("abcde", "a", "UNICODE", true);
+assertStartsWith("abcde", "A", "UNICODE", false);
+assertStartsWith("abcde", "bcd", "UNICODE", false);
+assertStartsWith("abcde", "BCD", "UNICODE", false);
+assertStartsWi

(spark) branch master updated: [SPARK-47364][CORE] Make `PluginEndpoint` warn when plugins reply for one-way message

2024-03-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8fcef1657a02 [SPARK-47364][CORE] Make `PluginEndpoint` warn when 
plugins reply for one-way message
8fcef1657a02 is described below

commit 8fcef1657a02189f91d5485eabb5b165706cdce9
Author: Dongjoon Hyun 
AuthorDate: Tue Mar 12 12:44:01 2024 -0700

[SPARK-47364][CORE] Make `PluginEndpoint` warn when plugins reply for 
one-way message

### What changes were proposed in this pull request?

This PR aims to make `PluginEndpoint` warn when plugins reply for one-way 
message.

### Why are the changes needed?

Previously, it logged `INFO`-level messages, which sometimes took up 66% of the 
driver INFO logs. We had better increase the log level so that users notice and 
fix the issue.
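
For context, a hedged sketch (the class name and reply value are hypothetical) of a plugin whose driver-side `receive` returns a non-null reply to a one-way message, which is the case that now logs at WARN level:

```scala
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, SparkPlugin}

// Hypothetical plugin: replying ("ack") to a message sent via one-way send()
// is what triggers the "returned reply for one-way message" log on the driver.
class MyNoisyPlugin extends SparkPlugin {
  override def driverPlugin(): DriverPlugin = new DriverPlugin {
    override def receive(message: AnyRef): AnyRef = "ack"
  }
  override def executorPlugin(): ExecutorPlugin = null
}
```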

### Does this PR introduce _any_ user-facing change?

No. Only a log level.

### How was this patch tested?

Manually.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45478 from dongjoon-hyun/SPARK-47364.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../main/scala/org/apache/spark/internal/plugin/PluginEndpoint.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/core/src/main/scala/org/apache/spark/internal/plugin/PluginEndpoint.scala 
b/core/src/main/scala/org/apache/spark/internal/plugin/PluginEndpoint.scala
index 989ef8f2edf2..bc45aefa560e 100644
--- a/core/src/main/scala/org/apache/spark/internal/plugin/PluginEndpoint.scala
+++ b/core/src/main/scala/org/apache/spark/internal/plugin/PluginEndpoint.scala
@@ -35,7 +35,7 @@ private class PluginEndpoint(
   try {
 val reply = plugin.receive(message)
 if (reply != null) {
-  logInfo(
+  logWarning(
 s"Plugin $pluginName returned reply for one-way message of 
type " +
 s"${message.getClass().getName()}.")
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-46919][BUILD][CONNECT] Upgrade `grpcio*` and `grpc-java` to 1.62.x

2024-03-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6e5d1db9058d [SPARK-46919][BUILD][CONNECT] Upgrade `grpcio*` and 
`grpc-java` to 1.62.x
6e5d1db9058d is described below

commit 6e5d1db9058de62a45f35d3f41e028a72f688b70
Author: yangjie01 
AuthorDate: Tue Mar 12 08:00:37 2024 -0700

[SPARK-46919][BUILD][CONNECT] Upgrade `grpcio*` and `grpc-java` to 1.62.x

### What changes were proposed in this pull request?
This PR aims to upgrade `grpcio*` from 1.59.3 to 
[1.62.0](https://pypi.org/project/grpcio/1.62.0/) and `grpc-java` from 1.59.0 to 
1.62.2 for Apache Spark 4.0.0.

### Why are the changes needed?
grpc 1.60.0 started to support dualstack IPv4 and IPv6 backends:

- Implemented dualstack IPv4 and IPv6 backend support, as per draft gRFC 
A61. xDS support currently guarded by GRPC_EXPERIMENTAL_XDS_DUALSTACK_ENDPOINTS 
env var.

Note that in `grpc-java` 1.61.0, since the dependency scope of 
`grpc-protobuf` on `grpc-protobuf-lite` has been changed from `compile` to 
`runtime`, we need to manually configure the dependency of the `connect` module 
on `grpc-protobuf-lite` and explicitly exclude the dependency on 
`protobuf-javalite` because `SparkConnectService` uses 
`io.grpc.protobuf.lite.ProtoLiteUtils`

- https://github.com/grpc/grpc-java/pull/10756/files

The relevant release notes are as follows:
- https://github.com/grpc/grpc/releases/tag/v1.60.0
- https://github.com/grpc/grpc/releases/tag/v1.60.1
- https://github.com/grpc/grpc/releases/tag/v1.61.0
- https://github.com/grpc/grpc/releases/tag/v1.61.1
- https://github.com/grpc/grpc/releases/tag/v1.62.0
- https://github.com/grpc/grpc-java/releases/tag/v1.60.0
- https://github.com/grpc/grpc-java/releases/tag/v1.60.1
- https://github.com/grpc/grpc-java/releases/tag/v1.61.0
- https://github.com/grpc/grpc-java/releases/tag/v1.61.1
- https://github.com/grpc/grpc-java/releases/tag/v1.62.2

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44929 from LuciferYang/grpc-16.

Lead-authored-by: yangjie01 
Co-authored-by: YangJie 
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_and_test.yml   |  4 ++--
 .github/workflows/maven_test.yml   |  2 +-
 connector/connect/common/src/main/buf.gen.yaml |  4 ++--
 connector/connect/server/pom.xml   | 11 +++
 dev/create-release/spark-rm/Dockerfile |  2 +-
 dev/infra/Dockerfile   |  2 +-
 dev/requirements.txt   |  4 ++--
 pom.xml|  2 +-
 project/SparkBuild.scala   |  2 +-
 python/docs/source/getting_started/install.rst |  4 ++--
 python/setup.py|  2 +-
 11 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 4f2be1c04f98..faa495fe5dfc 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -252,7 +252,7 @@ jobs:
 - name: Install Python packages (Python 3.9)
   if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 
'sql-')) || contains(matrix.modules, 'connect')
   run: |
-python3.9 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy 
unittest-xml-reporting 'lxml==4.9.4' 'grpcio==1.59.3' 'grpcio-status==1.59.3' 
'protobuf==4.25.1'
+python3.9 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy 
unittest-xml-reporting 'lxml==4.9.4' 'grpcio==1.62.0' 'grpcio-status==1.62.0' 
'protobuf==4.25.1'
 python3.9 -m pip list
 # Run the tests.
 - name: Run tests
@@ -702,7 +702,7 @@ jobs:
 python3.9 -m pip install 'sphinx==4.5.0' mkdocs 
'pydata_sphinx_theme>=0.13' sphinx-copybutton nbsphinx numpydoc jinja2 
markupsafe 'pyzmq<24.0.0' \
   ipython ipython_genutils sphinx_plotly_directive 'numpy>=1.20.0' 
pyarrow pandas 'plotly>=4.8' 'docutils<0.18.0' \
   'flake8==3.9.0' 'mypy==1.8.0' 'pytest==7.1.3' 
'pytest-mypy-plugins==1.9.3' 'black==23.9.1' \
-  'pandas-stubs==1.2.0.53' 'grpcio==1.59.3' 'grpc-stubs==1.24.11' 
'googleapis-common-protos-stubs==2.2.0' \
+  'pandas-stubs==1.2.0.53' 'grpcio==1.62.0' &#

(spark) branch master updated: [SPARK-47342][SQL] Support TimestampNTZ for DB2 TIMESTAMP WITH TIME ZONE

2024-03-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e8bc176e6fd1 [SPARK-47342][SQL] Support TimestampNTZ for DB2 TIMESTAMP 
WITH TIME ZONE
e8bc176e6fd1 is described below

commit e8bc176e6fd145bab4cde6bf38931a7ad4c7eecd
Author: Kent Yao 
AuthorDate: Tue Mar 12 07:33:24 2024 -0700

[SPARK-47342][SQL] Support TimestampNTZ for DB2 TIMESTAMP WITH TIME ZONE

### What changes were proposed in this pull request?

This PR supports TimestampNTZ for DB2 TIMESTAMP WITH TIME ZONE when the 
`preferTimestampNTZ` option is set to true by users.
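
A hedged sketch (the JDBC URL, table, and column are placeholders, mirroring the new integration test) of how the option now applies to a DB2 `TIMESTAMP WITH TIME ZONE` column:

```scala
// Hypothetical sketch: the URL and query are placeholders for a real DB2 instance.
val df = spark.read.format("jdbc")
  .option("url", "jdbc:db2://localhost:50000/testdb") // assumed DB2 JDBC URL
  .option("preferTimestampNTZ", "true")
  .option("query", "SELECT ts FROM dates")            // assumed table/column
  .load()

// With preferTimestampNTZ=true the column is expected to read back as TimestampNTZType.
```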

### Why are the changes needed?

improve DB2 connector

### Does this PR introduce _any_ user-facing change?

yes, preferTimestampNTZ works for DB2 TIMESTAMP WITH TIME ZONE
### How was this patch tested?

new tests

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #45471 from yaooqinn/SPARK-47342.

Authored-by: Kent Yao 
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala| 14 ++
 .../spark/sql/execution/datasources/jdbc/JdbcUtils.scala   | 10 --
 .../main/scala/org/apache/spark/sql/jdbc/DB2Dialect.scala  |  2 +-
 .../scala/org/apache/spark/sql/jdbc/JdbcDialects.scala |  7 +++
 .../scala/org/apache/spark/sql/jdbc/PostgresDialect.scala  | 13 +
 .../test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala   |  9 +++--
 6 files changed, 42 insertions(+), 13 deletions(-)

diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala
index cedb33d491fb..14776047cec4 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala
@@ -19,6 +19,7 @@ package org.apache.spark.sql.jdbc
 
 import java.math.BigDecimal
 import java.sql.{Connection, Date, Timestamp}
+import java.time.LocalDateTime
 import java.util.Properties
 
 import org.scalatest.time.SpanSugar._
@@ -224,4 +225,17 @@ class DB2IntegrationSuite extends 
DockerJDBCIntegrationSuite {
 
 assert(actual === expected)
   }
+
+  test("SPARK-47342:gi Support TimestampNTZ for DB2 TIMESTAMP WITH TIME ZONE") 
{
+// The test only covers TIMESTAMP WITHOUT TIME ZONE so far, we shall 
support
+// TIMESTAMP WITH TIME ZONE but I don't figure it out to mock a TSTZ value.
+withDefaultTimeZone(UTC) {
+  val df = spark.read.format("jdbc")
+.option("url", jdbcUrl)
+.option("preferTimestampNTZ", "true")
+.option("query", "select ts from dates")
+.load()
+  checkAnswer(df, Row(LocalDateTime.of(2009, 2, 13, 23, 31, 30)))
+}
+  }
 }
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
index a7bbb832a839..27c032471b57 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
@@ -212,8 +212,7 @@ object JdbcUtils extends Logging with SQLConfHelper {
 case java.sql.Types.SQLXML => StringType
 case java.sql.Types.STRUCT => StringType
 case java.sql.Types.TIME => TimestampType
-case java.sql.Types.TIMESTAMP if isTimestampNTZ => TimestampNTZType
-case java.sql.Types.TIMESTAMP => TimestampType
+case java.sql.Types.TIMESTAMP => getTimestampType(isTimestampNTZ)
 case java.sql.Types.TINYINT => IntegerType
 case java.sql.Types.VARBINARY => BinaryType
 case java.sql.Types.VARCHAR if conf.charVarcharAsString => StringType
@@ -229,6 +228,13 @@ object JdbcUtils extends Logging with SQLConfHelper {
   throw QueryExecutionErrors.unrecognizedSqlTypeError(jdbcType, typeName)
   }
 
+  /**
+   * Return TimestampNTZType if isTimestampNT; otherwise TimestampType.
+   */
+  def getTimestampType(isTimestampNTZ: Boolean): DataType = {
+if (isTimestampNTZ) TimestampNTZType else TimestampType
+  }
+
   /**
* Returns the schema if the table already exists in the JDBC database.
*/
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/DB2Dialect.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/DB2Dialect.scala
index 62c31b1c4c5d..ff3e74eae205 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/DB2Dialect.scala
+++ b/sql/core/src/main/scala/org/

(spark) branch master updated: [SPARK-47335][BUILD] Upgrade `mvn-scalafmt` to `1.1.1684076452.9f83818` & `scalafmt` to `3.8.0`

2024-03-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 72ecba538406 [SPARK-47335][BUILD] Upgrade `mvn-scalafmt` to 
`1.1.1684076452.9f83818` & `scalafmt` to `3.8.0`
72ecba538406 is described below

commit 72ecba5384060720b114037bec70ff4328625889
Author: panbingkun 
AuthorDate: Tue Mar 12 07:32:18 2024 -0700

[SPARK-47335][BUILD] Upgrade `mvn-scalafmt` to `1.1.1684076452.9f83818` & 
`scalafmt` to `3.8.0`

### What changes were proposed in this pull request?
The PR aims to upgrade `mvn-scalafmt` from `1.1.1640084764.9f463a9` to 
`1.1.1684076452.9f83818` and `scalafmt` from `3.7.17` to `3.8.0`.

### Why are the changes needed?
- mvn-scalafmt
  The last `mvn-scalafmt` upgrade occurred 1 year ago, 
https://github.com/apache/spark/pull/37727
  The latest version of `mvn-scalafmt`  release notes: 
https://github.com/SimonJPegg/mvn_scalafmt/releases/tag/2.13-1.1.1684076452.9f83818

- scalafmt
  The latest version of `scalafmt`  release notes: 
https://github.com/scalameta/scalafmt/releases/tag/v3.8.0

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manually test.
```
./build/mvn scalafmt:format -Dscalafmt.skip=false

...
[INFO] 

[INFO] BUILD SUCCESS
[INFO] 

[INFO] Total time:  2.216 s
[INFO] Finished at: 2024-03-10T20:30:11+08:00
[INFO] 


```

```
./dev/scalafmt

...
[INFO] 

[INFO] BUILD SUCCESS
[INFO] 

[INFO] Total time:  01:56 min
[INFO] Finished at: 2024-03-10T20:19:46+08:00
[INFO] 


```
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45452 from panbingkun/SPARK-47335.

Authored-by: panbingkun 
Signed-off-by: Dongjoon Hyun 
---
 dev/.scalafmt.conf | 2 +-
 pom.xml| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/.scalafmt.conf b/dev/.scalafmt.conf
index b3a43a03651a..6d1ab0243dc5 100644
--- a/dev/.scalafmt.conf
+++ b/dev/.scalafmt.conf
@@ -32,4 +32,4 @@ fileOverride {
 runner.dialect = scala213
   }
 }
-version = 3.7.17
+version = 3.8.0
diff --git a/pom.xml b/pom.xml
index 146ded53dd8d..49a951405408 100644
--- a/pom.xml
+++ b/pom.xml
@@ -3564,7 +3564,7 @@
   
 org.antipathy
 mvn-scalafmt_${scala.binary.version}
-1.1.1640084764.9f463a9
+1.1.1684076452.9f83818
 
   ${scalafmt.validateOnly} 
   ${scalafmt.skip}


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (f97da1638062 -> f40c693ad7fd)

2024-03-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from f97da1638062 [SPARK-45245][CONNECT][TESTS][FOLLOW-UP] Remove unneeded 
Matchers trait in the test
 add f40c693ad7fd [SPARK-47339][BUILD] Upgrade checkStyle to `10.14.0`

No new revisions were added by this update.

Summary of changes:
 launcher/src/main/java/org/apache/spark/launcher/SparkAppHandle.java | 2 +-
 pom.xml  | 2 +-
 project/plugins.sbt  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (610840e27e2e -> f97da1638062)

2024-03-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 610840e27e2e [SPARK-45827][SQL][FOLLOWUP] Fix for collation
 add f97da1638062 [SPARK-45245][CONNECT][TESTS][FOLLOW-UP] Remove unneeded 
Matchers trait in the test

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/api/python/PythonWorkerFactorySuite.scala  | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (10be03215775 -> 610840e27e2e)

2024-03-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 10be03215775 [SPARK-47255][SQL] Assign names to the error classes 
_LEGACY_ERROR_TEMP_323[6-7] and _LEGACY_ERROR_TEMP_324[7-9]
 add 610840e27e2e [SPARK-45827][SQL][FOLLOWUP] Fix for collation

No new revisions were added by this update.

Summary of changes:
 sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala  | 2 +-
 .../sql/execution/datasources/SaveIntoDataSourceCommandSuite.scala | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-44115][BUILD] Upgrade Apache ORC to 2.0.0

2024-03-08 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 76b1c122cb7d [SPARK-44115][BUILD] Upgrade Apache ORC to 2.0.0
76b1c122cb7d is described below

commit 76b1c122cb7d77e8f175b25b935b9296a669d5d8
Author: Dongjoon Hyun 
AuthorDate: Fri Mar 8 13:31:10 2024 -0800

[SPARK-44115][BUILD] Upgrade Apache ORC to 2.0.0

### What changes were proposed in this pull request?

This PR aims to Upgrade Apache ORC to 2.0.0 for Apache Spark 4.0.0.

The Apache ORC community has a 3-year support policy, which is longer than Apache 
Spark's. The versions are aligned as follows.
- Apache ORC 2.0.x <-> Apache Spark 4.0.x
- Apache ORC 1.9.x <-> Apache Spark 3.5.x
- Apache ORC 1.8.x <-> Apache Spark 3.4.x
- Apache ORC 1.7.x (Supported) <-> Apache Spark 3.3.x (End-Of-Support)

### Why are the changes needed?

**Release Note**
- https://github.com/apache/orc/releases/tag/v2.0.0

**Milestone**
- https://github.com/apache/orc/milestone/20?closed=1
  - https://github.com/apache/orc/pull/1728
  - https://github.com/apache/orc/issues/1801
  - https://github.com/apache/orc/issues/1498
  - https://github.com/apache/orc/pull/1627
  - https://github.com/apache/orc/issues/1497
  - https://github.com/apache/orc/pull/1509
  - https://github.com/apache/orc/pull/1554
  - https://github.com/apache/orc/pull/1708
  - https://github.com/apache/orc/pull/1733
  - https://github.com/apache/orc/pull/1760
  - https://github.com/apache/orc/pull/1743

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

    Closes #45443 from dongjoon-hyun/SPARK-44115.
    
Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 |  7 ---
 pom.xml   | 17 -
 sql/core/pom.xml  |  5 +
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 7e56e8914435..6b357b3e4b70 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -227,9 +227,10 @@ opencsv/2.3//opencsv-2.3.jar
 opentracing-api/0.33.0//opentracing-api-0.33.0.jar
 opentracing-noop/0.33.0//opentracing-noop-0.33.0.jar
 opentracing-util/0.33.0//opentracing-util-0.33.0.jar
-orc-core/1.9.2/shaded-protobuf/orc-core-1.9.2-shaded-protobuf.jar
-orc-mapreduce/1.9.2/shaded-protobuf/orc-mapreduce-1.9.2-shaded-protobuf.jar
-orc-shims/1.9.2//orc-shims-1.9.2.jar
+orc-core/2.0.0/shaded-protobuf/orc-core-2.0.0-shaded-protobuf.jar
+orc-format/1.0.0/shaded-protobuf/orc-format-1.0.0-shaded-protobuf.jar
+orc-mapreduce/2.0.0/shaded-protobuf/orc-mapreduce-2.0.0-shaded-protobuf.jar
+orc-shims/2.0.0//orc-shims-2.0.0.jar
 oro/2.0.8//oro-2.0.8.jar
 osgi-resource-locator/1.0.3//osgi-resource-locator-1.0.3.jar
 paranamer/2.8//paranamer-2.8.jar
diff --git a/pom.xml b/pom.xml
index 9f1c9ed13f23..404f37be1b5a 100644
--- a/pom.xml
+++ b/pom.xml
@@ -141,7 +141,7 @@
 
 10.16.1.1
 1.13.1
-1.9.2
+2.0.0
 shaded-protobuf
 11.0.20
 5.0.0
@@ -2593,6 +2593,13 @@
 
   
 
+  
+org.apache.orc
+orc-format
+1.0.0
+${orc.classifier}
+${orc.deps.scope}
+  
   
 org.apache.orc
 orc-core
@@ -2600,6 +2607,14 @@
 ${orc.classifier}
 ${orc.deps.scope}
 
+  
+org.apache.orc
+orc-format
+  
+  
+com.aayushatharva.brotli4j
+brotli4j
+  
   
 org.apache.hadoop
 hadoop-common
diff --git a/sql/core/pom.xml b/sql/core/pom.xml
index 0ad9e0f690c7..05f906206e5e 100644
--- a/sql/core/pom.xml
+++ b/sql/core/pom.xml
@@ -93,6 +93,11 @@
   org.scala-lang.modules
   
scala-parallel-collections_${scala.binary.version}
 
+
+  org.apache.orc
+  orc-format
+  ${orc.classifier}
+
 
   org.apache.orc
   orc-core


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (db0e5c7bc464 -> 35bced42474e)

2024-03-04 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from db0e5c7bc464 [SPARK-47269][BUILD] Upgrade jetty to 11.0.20
 add 35bced42474e [SPARK-47242][BUILD] Bump ap-loader 3.0(v8) to support 
for async-profiler 3.0

No new revisions were added by this update.

Summary of changes:
 connector/profiler/pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (6b5917beff30 -> db0e5c7bc464)

2024-03-04 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 6b5917beff30 [SPARK-46961][SS] Using ProcessorContext to store and 
retrieve handle
 add db0e5c7bc464 [SPARK-47269][BUILD] Upgrade jetty to 11.0.20

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 4 ++--
 pom.xml   | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (1cd7bab5c5c2 -> 22f9a5a25304)

2024-02-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 1cd7bab5c5c2 [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to 
skip non-existing file input
 add 22f9a5a25304 [SPARK-47235][CORE][TESTS] Disable 
`deleteRecursivelyUsingUnixNative` in Apple Silicon test env

No new revisions were added by this update.

Summary of changes:
 .../utils/src/main/java/org/apache/spark/network/util/JavaUtils.java  | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input

2024-02-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 9770016b180b [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to 
skip non-existing file input
9770016b180b is described below

commit 9770016b180b0477060777d3739a2bfaabc6fcb3
Author: Dongjoon Hyun 
AuthorDate: Thu Feb 29 19:08:15 2024 -0800

[SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing 
file input

### What changes were proposed in this pull request?

This PR aims to fix `deleteRecursivelyUsingJavaIO` to skip non-existing 
file input.

### Why are the changes needed?

`deleteRecursivelyUsingJavaIO` is a fallback of 
`deleteRecursivelyUsingUnixNative`.
We should have identical capability. Currently, it fails.

```
[info]   java.nio.file.NoSuchFileException: 
/Users/dongjoon/APACHE/spark-merge/target/tmp/spark-e264d853-42c0-44a2-9a30-22049522b04f
[info]   at 
java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
[info]   at 
java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
[info]   at 
java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
[info]   at 
java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
[info]   at 
java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148)
[info]   at java.base/java.nio.file.Files.readAttributes(Files.java:1851)
[info]   at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:126)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

It is difficult to test this `private static` Java method directly. I tested this 
with #45344 .

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45346 from dongjoon-hyun/SPARK-47236.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 1cd7bab5c5c2bd8d595b131c88e6576486dbf123)
Signed-off-by: Dongjoon Hyun 
---
 common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java | 1 +
 1 file changed, 1 insertion(+)

diff --git 
a/common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java 
b/common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java
index bbe764b8366c..d6603dcbee1a 100644
--- a/common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java
+++ b/common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java
@@ -120,6 +120,7 @@ public class JavaUtils {
   private static void deleteRecursivelyUsingJavaIO(
   File file,
   FilenameFilter filter) throws IOException {
+if (!file.exists()) return;
 BasicFileAttributes fileAttributes =
   Files.readAttributes(file.toPath(), BasicFileAttributes.class);
 if (fileAttributes.isDirectory() && !isSymlink(file)) {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.4 updated: [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing file input

2024-02-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 58a4a49389a5 [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to 
skip non-existing file input
58a4a49389a5 is described below

commit 58a4a49389a5f9979f7dabc5320116a212eb4bdb
Author: Dongjoon Hyun 
AuthorDate: Thu Feb 29 19:08:15 2024 -0800

[SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to skip non-existing 
file input

### What changes were proposed in this pull request?

This PR aims to fix `deleteRecursivelyUsingJavaIO` to skip non-existing 
file input.

### Why are the changes needed?

`deleteRecursivelyUsingJavaIO` is a fallback of 
`deleteRecursivelyUsingUnixNative`.
We should have identical capability. Currently, it fails.

```
[info]   java.nio.file.NoSuchFileException: 
/Users/dongjoon/APACHE/spark-merge/target/tmp/spark-e264d853-42c0-44a2-9a30-22049522b04f
[info]   at 
java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
[info]   at 
java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
[info]   at 
java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
[info]   at 
java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
[info]   at 
java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148)
[info]   at java.base/java.nio.file.Files.readAttributes(Files.java:1851)
[info]   at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:126)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

It is difficult to test this `private static` Java method directly. I tested this 
with #45344 .

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45346 from dongjoon-hyun/SPARK-47236.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 1cd7bab5c5c2bd8d595b131c88e6576486dbf123)
Signed-off-by: Dongjoon Hyun 
---
 .../src/main/java/org/apache/spark/network/util/JavaUtils.java   | 1 +
 1 file changed, 1 insertion(+)

diff --git 
a/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java
 
b/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java
index 7e410e9eab22..59744ec5748a 100644
--- 
a/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java
+++ 
b/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java
@@ -124,6 +124,7 @@ public class JavaUtils {
   private static void deleteRecursivelyUsingJavaIO(
   File file,
   FilenameFilter filter) throws IOException {
+if (!file.exists()) return;
 BasicFileAttributes fileAttributes =
   Files.readAttributes(file.toPath(), BasicFileAttributes.class);
 if (fileAttributes.isDirectory() && !isSymlink(file)) {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (cc0ea60d6eee -> 1cd7bab5c5c2)

2024-02-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from cc0ea60d6eee [SPARK-47218][SQL] XML: Ignore commented row tags in XML 
tokenizer
 add 1cd7bab5c5c2 [SPARK-47236][CORE] Fix `deleteRecursivelyUsingJavaIO` to 
skip non-existing file input

No new revisions were added by this update.

Summary of changes:
 common/utils/src/main/java/org/apache/spark/network/util/JavaUtils.java | 1 +
 1 file changed, 1 insertion(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (70007c59177a -> 9ce43c85a5d2)

2024-02-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 70007c59177a [SPARK-47186][DOCKER][FOLLOWUP] Reduce test time for 
docker ITs
 add 9ce43c85a5d2 [SPARK-47229][CORE][SQL][SS][YARN][CONNECT] Change the 
never changed `var` to `val`

No new revisions were added by this update.

Summary of changes:
 .../sql/connect/planner/SparkConnectServiceSuite.scala   |  2 +-
 .../scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala  |  2 +-
 .../spark/executor/CoarseGrainedExecutorBackend.scala|  2 +-
 core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala  |  2 +-
 .../org/apache/spark/resource/ResourceProfileSuite.scala |  2 +-
 .../apache/spark/scheduler/TaskSchedulerImplSuite.scala  |  2 +-
 .../apache/spark/shuffle/ShuffleBlockPusherSuite.scala   |  2 +-
 .../main/scala/org/apache/spark/deploy/yarn/Client.scala |  2 +-
 .../apache/spark/sql/catalyst/parser/AstBuilder.scala|  2 +-
 .../spark/sql/catalyst/analysis/AnalysisErrorSuite.scala |  8 
 .../catalyst/expressions/StringExpressionsSuite.scala| 16 
 .../spark/sql/execution/streaming/state/RocksDB.scala|  2 +-
 .../scala/org/apache/spark/sql/ConfigBehaviorSuite.scala |  2 +-
 .../datasources/parquet/ParquetVectorizedSuite.scala |  2 +-
 .../sql/hive/thriftserver/HiveThriftServer2Suites.scala  |  2 +-
 15 files changed, 25 insertions(+), 25 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (813934c69df6 -> 70007c59177a)

2024-02-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 813934c69df6 [SPARK-47015][SQL] Disable partitioning on collated 
columns
 add 70007c59177a [SPARK-47186][DOCKER][FOLLOWUP] Reduce test time for 
docker ITs

No new revisions were added by this update.

Summary of changes:
 .../sql/jdbc/DockerJDBCIntegrationSuite.scala  | 39 --
 1 file changed, 14 insertions(+), 25 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-41392][BUILD][TESTS] Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0

2024-02-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 28fd3de0fea0 [SPARK-41392][BUILD][TESTS] Add `bouncy-castle` test 
dependencies to `sql/core` module for Hadoop 3.4.0
28fd3de0fea0 is described below

commit 28fd3de0fea0e952aa1494838d00185613389277
Author: yangjie01 
AuthorDate: Thu Feb 29 07:56:29 2024 -0800

[SPARK-41392][BUILD][TESTS] Add `bouncy-castle` test dependencies to 
`sql/core` module for Hadoop 3.4.0

### What changes were proposed in this pull request?

Adds bouncy-castle jdk18 artifacts to test builds in spark-sql.

Based on #38974
* only applies the test import changes
* dependencies are those of #44359

### Why are the changes needed?

The forthcoming Hadoop 3.4.0 release doesn't export the bouncy-castle 
JARs, so Maven builds fail.

### Does this PR introduce _any_ user-facing change?

No: test time dependency declarations only.

### How was this patch tested?

This was done through the release build/test project
https://github.com/apache/hadoop-release-support

1. Latest RC2 artifacts pulled from apache maven staging
2. Spark maven build triggered with the hadoop-version passed down.
3. The 3.3.6 release template worked with spark master (as it should!)
4. With this change the 3.4.0 RC build worked with this change

Note: a Maven test run has not *yet* been done through this

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45317 from steveloughran/SPARK-41392-HADOOP-3.4.0.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 sql/core/pom.xml | 12 
 1 file changed, 12 insertions(+)

diff --git a/sql/core/pom.xml b/sql/core/pom.xml
index 8b1b51352a20..0ad9e0f690c7 100644
--- a/sql/core/pom.xml
+++ b/sql/core/pom.xml
@@ -223,6 +223,18 @@
   htmlunit3-driver
   test
 
+
+
+  org.bouncycastle
+  bcprov-jdk18on
+  test
+
+
+  org.bouncycastle
+  bcpkix-jdk18on
+  test
+
   
   
 
target/scala-${scala.binary.version}/classes


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (a7825f6e8907 -> 919c19c008b8)

2024-02-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from a7825f6e8907 [SPARK-47227][DOCS] Improve documentation for Spark 
Connect
 add 919c19c008b8 [SPARK-47231][CORE][TESTS] FakeTask should reference its 
TaskMetrics to avoid TaskMetrics accumulators being GCed before stage completion

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/scheduler/FakeTask.scala| 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (944a00db6f83 -> a7825f6e8907)

2024-02-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 944a00db6f83 [SPARK-47224][PS][TESTS] Split `test_split_apply_basic` 
and `test_split_apply_adv`
 add a7825f6e8907 [SPARK-47227][DOCS] Improve documentation for Spark 
Connect

No new revisions were added by this update.

Summary of changes:
 docs/spark-connect-overview.md | 20 
 1 file changed, 20 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47215][CORE][TESTS] Reduce the number of required threads in `MasterSuite`

2024-02-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new da2ae29c9cab [SPARK-47215][CORE][TESTS] Reduce the number of required 
threads in `MasterSuite`
da2ae29c9cab is described below

commit da2ae29c9cabe336f95ab3737e97aa8a5bd33ada
Author: Dongjoon Hyun 
AuthorDate: Wed Feb 28 16:21:26 2024 -0800

[SPARK-47215][CORE][TESTS] Reduce the number of required threads in 
`MasterSuite`

### What changes were proposed in this pull request?

This PR aims to reduce the number of required threads in the `MasterSuite` test.

### Why are the changes needed?

- 
https://github.com/apache/spark/actions/workflows/build_maven_java21_macos14.yml
  - https://github.com/apache/spark/actions/runs/8070641575/job/22086547398
```
- SPARK-46881: scheduling with workerSelectionPolicy - CORES_FREE_ASC (true)
- SPARK-46881: scheduling with workerSelectionPolicy - CORES_FREE_ASC 
(false)
Warning: [3943.730s][warning][os,thread] Failed to start thread "Unknown 
thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 4096k, 
guardsize: 16k, detached.
Warning: [3943.730s][warning][os,thread] Failed to start the native thread 
for java.lang.Thread "rpc-server-13566-3"
Warning: [3943.730s][warning][os,thread] Failed to start thread "Unknown 
thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 4096k, 
guardsize: 16k, detached.
Warning: [3943.730s][warning][os,thread] Failed to start the native thread 
for java.lang.Thread "globalEventExecutor-3-961"
*** RUN ABORTED ***
An exception or error caused a run to abort: unable to create native 
thread: possibly out of memory or process/resource limits reached
  java.lang.OutOfMemoryError: unable to create native thread: possibly out 
of memory or process/resource limits reached
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?
    
No.

Closes #45320 from dongjoon-hyun/SPARK-47215.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/deploy/master/MasterSuite.scala  | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git 
a/core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala 
b/core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala
index 9992c2020f27..b4981ca3d9c6 100644
--- a/core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala
+++ b/core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala
@@ -687,9 +687,9 @@ class MasterSuite extends SparkFunSuite
   private val workerSelectionPolicyTestCases = Seq(
 (CORES_FREE_ASC, true, List("10001", "10002")),
 (CORES_FREE_ASC, false, List("10001")),
-(CORES_FREE_DESC, true, List("10004", "10005")),
-(CORES_FREE_DESC, false, List("10005")),
-(MEMORY_FREE_ASC, true, List("10001", "10005")),
+(CORES_FREE_DESC, true, List("10002", "10003")),
+(CORES_FREE_DESC, false, List("10003")),
+(MEMORY_FREE_ASC, true, List("10001", "10003")),
 (MEMORY_FREE_ASC, false, List("10001")),
 (MEMORY_FREE_DESC, true, List("10002", "10003")),
 (MEMORY_FREE_DESC, false, Seq("10002")),
@@ -701,11 +701,14 @@ class MasterSuite extends SparkFunSuite
   val conf = new SparkConf()
 .set(WORKER_SELECTION_POLICY.key, policy.toString)
 .set(SPREAD_OUT_APPS.key, spreadOut.toString)
+.set(UI_ENABLED.key, "false")
+.set(Network.RPC_NETTY_DISPATCHER_NUM_THREADS, 1)
+.set(Network.RPC_IO_THREADS, 1)
   val master = makeAliveMaster(conf)
 
   // Use different core and memory values to simplify the tests
   MockWorker.counter.set(1)
-  (1 to 5).foreach { idx =>
+  (1 to 3).foreach { idx =>
 val worker = new MockWorker(master.self, conf)
 worker.rpcEnv.setupEndpoint(s"worker-$idx", worker)
 val workerReg = RegisterWorker(
@@ -713,7 +716,7 @@ class MasterSuite extends SparkFunSuite
   "localhost",
   worker.self.address.port,
   worker.self,
-  idx * 10,
+  4 + idx,
   10240 * (if (idx < 2) idx else (6 - idx)),
   "http://localhost:8080";,
   RpcAddress("localhost", 1))


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47207][CORE] Support `spark.driver.timeout` and `DriverTimeoutPlugin`

2024-02-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2a93e46eb062 [SPARK-47207][CORE] Support `spark.driver.timeout` and 
`DriverTimeoutPlugin`
2a93e46eb062 is described below

commit 2a93e46eb0627df9cd288156bffa0a0815906c3c
Author: Dongjoon Hyun 
AuthorDate: Wed Feb 28 09:27:53 2024 -0800

[SPARK-47207][CORE] Support `spark.driver.timeout` and `DriverTimeoutPlugin`

### What changes were proposed in this pull request?

This PR aims to support `spark.driver.timeout` and `DriverTimeoutPlugin`.

### Why are the changes needed?

Sometimes, Spark applications fall into an abnormal situation and hang.

We had better provide a standard way to guarantee termination after a
pre-defined timeout.
- spark.plugins=org.apache.spark.deploy.DriverTimeoutPlugin
- spark.driver.timeout=1min

```
$ bin/spark-shell -c 
spark.plugins=org.apache.spark.deploy.DriverTimeoutPlugin -c 
spark.driver.timeout=1min
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Scala version 2.13.12 (OpenJDK 64-Bit Server VM, Java 17.0.10)
Type in expressions to have them evaluated.
Type :help for more information.
24/02/28 06:53:34 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = 
local-1709132014477).
Spark session available as 'spark'.

scala> 24/02/28 06:54:34 WARN DriverTimeoutDriverPlugin: Terminate Driver 
JVM because it runs after 1 minute

$ echo $?
124
```
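
As a rough illustration of the mechanism described above, here is a
hypothetical sketch of how such a driver-timeout plugin can be built on
Spark's `SparkPlugin`/`DriverPlugin` API. The class name and the config key
in the sketch are made up; the actual plugin reads `spark.driver.timeout` as
shown in the examples.

```scala
// Hypothetical sketch, not the actual DriverTimeoutPlugin source: schedule a
// JVM exit on the driver after a configured number of minutes.
import java.util.{Collections, Map => JMap}
import java.util.concurrent.{Executors, TimeUnit}

import org.apache.spark.SparkContext
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

class TimeoutSketchPlugin extends SparkPlugin {
  override def driverPlugin(): DriverPlugin = new DriverPlugin {
    private val scheduler = Executors.newSingleThreadScheduledExecutor()

    override def init(sc: SparkContext, ctx: PluginContext): JMap[String, String] = {
      // "spark.sketch.driver.timeout.minutes" is a made-up key for this sketch only.
      val timeoutMinutes = sc.getConf.getLong("spark.sketch.driver.timeout.minutes", 0L)
      if (timeoutMinutes > 0) {
        scheduler.schedule(new Runnable {
          // 124 mirrors the exit code shown in the example session above.
          override def run(): Unit = System.exit(124)
        }, timeoutMinutes, TimeUnit.MINUTES)
      }
      Collections.emptyMap[String, String]()
    }

    override def shutdown(): Unit = scheduler.shutdownNow()
  }

  override def executorPlugin(): ExecutorPlugin = null
}
```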

### Does this PR introduce _any_ user-facing change?

No, this is a new feature and a built-in plugin.

### How was this patch tested?

Manually because this invokes `System.exit`.

1. Timeout with 1 minute
```
$ bin/spark-shell -c 
spark.plugins=org.apache.spark.deploy.DriverTimeoutPlugin -c 
spark.driver.timeout=1min
...
scala> 24/02/28 06:54:34 WARN DriverTimeoutDriverPlugin: Terminate Driver 
JVM because it runs after 1 minute

$ echo $?
124
```

2. `DriverTimeoutPlugin` will be ignored if the default value of 
`spark.driver.timeout` is used.
```
$ bin/spark-shell -c 
spark.plugins=org.apache.spark.deploy.DriverTimeoutPlugin
...
24/02/28 01:02:57 WARN DriverTimeoutDriverPlugin: Disabled with the timeout 
value 0.
...
scala>
```

3. `spark.driver.timeout` will be ignored if `DriverTimeoutPlugin` is not 
provided.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45313 from dongjoon-hyun/SPARK-47207.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../apache/spark/deploy/DriverTimeoutPlugin.scala  | 62 ++
 .../org/apache/spark/internal/config/package.scala |  9 
 .../org/apache/spark/util/SparkExitCode.scala  |  3 ++
 docs/configuration.md  | 11 
 4 files changed, 85 insertions(+)

diff --git 
a/core/src/main/scala/org/apache/spark/deploy/DriverTimeoutPlugin.scala 
b/core/src/main/scala/org/apache/spark/deploy/DriverTimeoutPlugin.scala
new file mode 100644
index ..9b141d607572
--- /dev/null
+++ b/core/src/main/scala/org/apache/spark/deploy/DriverTimeoutPlugin.scala
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy
+
+import java.util.{Map => JMap}
+import java.util.concurrent.{ScheduledExecutorService, TimeUnit}
+
+import scala.jdk.Col

(spark) branch master updated (7e7ba4eaf071 -> ea2587f695cf)

2024-02-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 7e7ba4eaf071 [MINOR][SQL] Remove out-of-dated comment in 
`CollectLimitExec`
 add ea2587f695cf [SPARK-47209][BUILD] Upgrade slf4j to 2.0.12

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 6 +++---
 pom.xml   | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47199][PYTHON][TESTS] Add prefix into TemporaryDirectory to avoid flakiness

2024-02-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6eed08cb0c12 [SPARK-47199][PYTHON][TESTS] Add prefix into 
TemporaryDirectory to avoid flakiness
6eed08cb0c12 is described below

commit 6eed08cb0c12c46b9de4665ab130cea1695b9a5b
Author: Hyukjin Kwon 
AuthorDate: Tue Feb 27 23:11:25 2024 -0800

[SPARK-47199][PYTHON][TESTS] Add prefix into TemporaryDirectory to avoid 
flakiness

### What changes were proposed in this pull request?

This PR proposes to set `prefix` for `TemporaryDirectory` to deflake the
tests. Sometimes the tests fail because the temporary directory names are the
same (https://github.com/apache/spark/actions/runs/8066850485/job/22036007390).

```
File "/__w/spark/spark/python/pyspark/sql/dataframe.py", line ?, in 
pyspark.sql.dataframe.DataFrame.writeStream
Failed example:
with tempfile.TemporaryDirectory() as d:
# Create a table with Rate source.
df.writeStream.toTable(
"my_table", checkpointLocation=d)
Exception raised:
Traceback (most recent call last):
  File "/usr/lib/python3.11/doctest.py", line 1353, in __run
exec(compile(example.source, filename, "single",
  File "", line 
1, in 
with tempfile.TemporaryDirectory() as d:
  File "/usr/lib/python3.11/tempfile.py", line 1043, in __exit__
self.cleanup()
  File "/usr/lib/python3.11/tempfile.py", line 1047, in cleanup
self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors)
  File "/usr/lib/python3.11/tempfile.py", line 1029, in _rmtree
_rmtree(name, onerror=onerror)
  File "/usr/lib/python3.11/shutil.py", line 738, in rmtree
onerror(os.rmdir, path, sys.exc_info())
  File "/usr/lib/python3.11/shutil.py", line 736, in rmtree
os.rmdir(path, dir_fd=dir_fd)
OSError: [Errno 39] Directory not empty: 
'/__w/spark/spark/python/target/4f062b09-213f-4ac2-a10a-2d704990141b/tmp29irqweq'
```

### Why are the changes needed?

To make the tests more robust.

### Does this PR introduce _any_ user-facing change?

No, test-only. There's a bit of user-facing documentation change, but it's
pretty trivial.

### How was this patch tested?

Manually tested. CI in this PR should test them out as well.

### Was this patch authored or co-authored using generative AI tooling?

    No.
    
Closes #45298 from HyukjinKwon/SPARK-47199.

Authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 dev/connect-check-protos.py|  2 +-
 python/pyspark/broadcast.py|  6 +-
 python/pyspark/context.py  | 26 
 python/pyspark/files.py|  2 +-
 .../connect/test_legacy_mode_classification.py |  2 +-
 .../tests/connect/test_legacy_mode_evaluation.py   |  6 +-
 .../ml/tests/connect/test_legacy_mode_feature.py   |  6 +-
 .../ml/tests/connect/test_legacy_mode_pipeline.py  |  2 +-
 .../ml/tests/connect/test_legacy_mode_tuning.py|  2 +-
 python/pyspark/ml/tests/test_als.py|  2 +-
 python/pyspark/rdd.py  | 18 +++---
 python/pyspark/sql/catalog.py  |  8 +--
 python/pyspark/sql/dataframe.py|  6 +-
 python/pyspark/sql/protobuf/functions.py   |  4 +-
 python/pyspark/sql/readwriter.py   | 73 +++---
 python/pyspark/sql/session.py  |  2 +-
 python/pyspark/sql/streaming/readwriter.py | 50 ---
 .../sql/tests/connect/client/test_artifact.py  | 16 ++---
 .../sql/tests/connect/test_connect_basic.py| 20 +++---
 .../pyspark/sql/tests/streaming/test_streaming.py  |  2 +-
 python/pyspark/sql/tests/test_catalog.py   |  2 +-
 python/pyspark/sql/tests/test_python_datasource.py |  6 +-
 python/pyspark/sql/tests/test_udf_profiler.py  |  4 +-
 python/pyspark/sql/tests/test_udtf.py  | 16 ++---
 python/pyspark/tests/test_install_spark.py |  2 +-
 python/pyspark/tests/test_memory_profiler.py   |  4 +-
 python/pyspark/tests/test_profiler.py  |  2 +-
 python/pyspark/tests/test_shuffle.py   | 10 +--
 python/pyspark/util.py |  2 +-
 29 files changed, 154 insertions(+), 149 deletions(-)

diff --git a/dev/connect-check-protos.py b/dev/connect-check-protos.py
index 513938f8d4f8..ffc74d7b1608 100755
--- a/dev/connect-check-protos.py
+++ b/dev/connect-check-protos.py
@@ -45,7 +45,7 @@ def run_cmd(

(spark) branch branch-3.5 updated (cbf25fb633f4 -> b4118e0dbb50)

2024-02-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


from cbf25fb633f4 Revert "[SPARK-45599][CORE] Use object equality in 
OpenHashSet"
 add b4118e0dbb50 [SPARK-45599][CORE][3.5] Use object equality in 
OpenHashSet

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/util/collection/OpenHashSet.scala | 16 +++--
 .../spark/util/collection/OpenHashMapSuite.scala   | 30 +
 .../spark/util/collection/OpenHashSetSuite.scala   | 39 ++
 .../sql-tests/analyzer-results/ansi/array.sql.out  | 14 
 .../analyzer-results/ansi/literals.sql.out |  7 
 .../sql-tests/analyzer-results/array.sql.out   | 14 
 .../sql-tests/analyzer-results/group-by.sql.out| 19 +++
 .../sql-tests/analyzer-results/literals.sql.out|  7 
 .../src/test/resources/sql-tests/inputs/array.sql  |  4 +++
 .../test/resources/sql-tests/inputs/group-by.sql   | 15 +
 .../test/resources/sql-tests/inputs/literals.sql   |  3 ++
 .../resources/sql-tests/results/ansi/array.sql.out | 16 +
 .../sql-tests/results/ansi/literals.sql.out|  8 +
 .../test/resources/sql-tests/results/array.sql.out | 16 +
 .../resources/sql-tests/results/group-by.sql.out   | 22 
 .../resources/sql-tests/results/literals.sql.out   |  8 +
 .../apache/spark/sql/DataFrameAggregateSuite.scala | 33 ++
 17 files changed, 268 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.4 updated: [SPARK-47196][CORE][BUILD][3.4] Fix `core` module to succeed SBT tests

2024-02-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 5ce628f803d3 [SPARK-47196][CORE][BUILD][3.4] Fix `core` module to 
succeed SBT tests
5ce628f803d3 is described below

commit 5ce628f803d3253ab5f6e97ab4572d73b79f1fd8
Author: Dongjoon Hyun 
AuthorDate: Tue Feb 27 18:22:08 2024 -0800

[SPARK-47196][CORE][BUILD][3.4] Fix `core` module to succeed SBT tests

### What changes were proposed in this pull request?

This PR aims to fix the `core` module so that SBT tests succeed, by
preserving `mockito-core`'s `byte-buddy` test dependency.

Currently, `Maven` respects `mockito-core`'s byte-buddy dependency while 
SBT doesn't.
**MAVEN**
```
$ build/mvn dependency:tree -pl core | grep byte-buddy
...
[INFO] |  +- net.bytebuddy:byte-buddy:jar:1.12.10:test
[INFO] |  +- net.bytebuddy:byte-buddy-agent:jar:1.12.10:test
```

**SBT**
```
$ build/sbt "core/test:dependencyTree" | grep byte-buddy
...
[info]   | | | | +-net.bytebuddy:byte-buddy:1.12.10 (evicted by: 1.12.18)
[info]   | | | | +-net.bytebuddy:byte-buddy:1.12.18
...
```

Note that this happens only on `branch-3.4` (Apache Spark 3.4.0~3.4.2);
branch-3.3, branch-3.5, and master are okay.
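
As a side note, a build-level alternative for this kind of eviction is to pin
the version in SBT directly. The sketch below is illustrative only and is not
what this patch does; the patch instead excludes `byte-buddy` from the
selenium test dependencies in `pom.xml`.

```scala
// build.sbt sketch (illustrative): force the byte-buddy version that
// mockito-core declares (1.12.10, per the Maven output above) so SBT does
// not evict it to a newer release.
dependencyOverrides ++= Seq(
  "net.bytebuddy" % "byte-buddy" % "1.12.10",
  "net.bytebuddy" % "byte-buddy-agent" % "1.12.10"
)
```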

### Why are the changes needed?

**BEFORE**
```
$ build/sbt "core/testOnly *.DAGSchedulerSuite"
[info] DAGSchedulerSuite:
[info] - [SPARK-3353] parent stage should have lower stage id *** FAILED 
*** (439 milliseconds)
[info]   java.lang.IllegalStateException: Could not initialize plugin: 
interface org.mockito.plugins.MockMaker (alternate: null)
...
[info] *** 1 SUITE ABORTED ***
[info] *** 118 TESTS FAILED ***
[error] Error during tests:
[error] org.apache.spark.scheduler.DAGSchedulerSuite
[error] (core / Test / testOnly) sbt.TestsFailedException: Tests 
unsuccessful
[error] Total time: 48 s, completed Feb 27, 2024, 1:26:27 PM
```

**AFTER**
```
$ build/sbt "core/testOnly *.DAGSchedulerSuite"
...
[info] All tests passed.
[success] Total time: 22 s, completed Feb 27, 2024, 1:24:34 PM
```

### Does this PR introduce _any_ user-facing change?

No, this is a test-only fix.

### How was this patch tested?

Pass the CIs and manual tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45295 from dongjoon-hyun/SPARK-47196.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 pom.xml | 12 
 1 file changed, 12 insertions(+)

diff --git a/pom.xml b/pom.xml
index 26f0b71a5114..373d17b76c09 100644
--- a/pom.xml
+++ b/pom.xml
@@ -423,6 +423,12 @@
   org.scalatestplus
   selenium-4-7_${scala.binary.version}
   test
+  
+
+  net.bytebuddy
+  byte-buddy
+
+  
 
 
   junit
@@ -725,6 +731,12 @@
 htmlunit-driver
 ${htmlunit-driver.version}
 test
+
+  
+net.bytebuddy
+byte-buddy
+  
+
   
   
   


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch branch-3.5 updated: Revert "[SPARK-45599][CORE] Use object equality in OpenHashSet"

2024-02-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new cbf25fb633f4 Revert "[SPARK-45599][CORE] Use object equality in 
OpenHashSet"
cbf25fb633f4 is described below

commit cbf25fb633f4bf2f83a6f6e39aafaa80bf47e160
Author: Dongjoon Hyun 
AuthorDate: Tue Feb 27 08:38:54 2024 -0800

Revert "[SPARK-45599][CORE] Use object equality in OpenHashSet"

This reverts commit 588a55d010fefda7a63cde3b616ac38728fe4cfe.
---
 .../apache/spark/util/collection/OpenHashSet.scala | 16 ++---
 .../spark/util/collection/OpenHashMapSuite.scala   | 30 -
 .../spark/util/collection/OpenHashSetSuite.scala   | 39 --
 .../sql-tests/analyzer-results/ansi/array.sql.out  | 14 
 .../analyzer-results/ansi/literals.sql.out |  7 
 .../sql-tests/analyzer-results/array.sql.out   | 14 
 .../sql-tests/analyzer-results/group-by.sql.out| 19 ---
 .../sql-tests/analyzer-results/literals.sql.out|  7 
 .../src/test/resources/sql-tests/inputs/array.sql  |  4 ---
 .../test/resources/sql-tests/inputs/group-by.sql   | 15 -
 .../test/resources/sql-tests/inputs/literals.sql   |  3 --
 .../resources/sql-tests/results/ansi/array.sql.out | 16 -
 .../sql-tests/results/ansi/literals.sql.out|  8 -
 .../test/resources/sql-tests/results/array.sql.out | 16 -
 .../resources/sql-tests/results/group-by.sql.out   | 22 
 .../resources/sql-tests/results/literals.sql.out   |  8 -
 .../apache/spark/sql/DataFrameAggregateSuite.scala | 33 --
 17 files changed, 3 insertions(+), 268 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala 
b/core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala
index 435cf1a03cbc..6815e47a198d 100644
--- a/core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala
+++ b/core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala
@@ -126,17 +126,6 @@ class OpenHashSet[@specialized(Long, Int, Double, Float) 
T: ClassTag](
 this
   }
 
-  /**
-   * Check if a key exists at the provided position using object equality 
rather than
-   * cooperative equality. Otherwise, hash sets will mishandle values for 
which `==`
-   * and `equals` return different results, like 0.0/-0.0 and NaN/NaN.
-   *
-   * See: https://issues.apache.org/jira/browse/SPARK-45599
-   */
-  @annotation.nowarn("cat=other-non-cooperative-equals")
-  private def keyExistsAtPos(k: T, pos: Int) =
-_data(pos) equals k
-
   /**
* Add an element to the set. This one differs from add in that it doesn't 
trigger rehashing.
* The caller is responsible for calling rehashIfNeeded.
@@ -157,7 +146,8 @@ class OpenHashSet[@specialized(Long, Int, Double, Float) T: 
ClassTag](
 _bitset.set(pos)
 _size += 1
 return pos | NONEXISTENCE_MASK
-  } else if (keyExistsAtPos(k, pos)) {
+  } else if (_data(pos) == k) {
+// Found an existing key.
 return pos
   } else {
 // quadratic probing with values increase by 1, 2, 3, ...
@@ -191,7 +181,7 @@ class OpenHashSet[@specialized(Long, Int, Double, Float) T: 
ClassTag](
 while (true) {
   if (!_bitset.get(pos)) {
 return INVALID_POS
-  } else if (keyExistsAtPos(k, pos)) {
+  } else if (k == _data(pos)) {
 return pos
   } else {
 // quadratic probing with values increase by 1, 2, 3, ...
diff --git 
a/core/src/test/scala/org/apache/spark/util/collection/OpenHashMapSuite.scala 
b/core/src/test/scala/org/apache/spark/util/collection/OpenHashMapSuite.scala
index f7b026ab565f..1af99e9017c9 100644
--- 
a/core/src/test/scala/org/apache/spark/util/collection/OpenHashMapSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/util/collection/OpenHashMapSuite.scala
@@ -249,34 +249,4 @@ class OpenHashMapSuite extends SparkFunSuite with Matchers 
{
 map(null) = null
 assert(map.get(null) === Some(null))
   }
-
-  test("SPARK-45599: 0.0 and -0.0 should count distinctly; NaNs should count 
together") {
-// Exactly these elements provided in roughly this order trigger a 
condition where lookups of
-// 0.0 and -0.0 in the bitset happen to collide, causing their counts to 
be merged incorrectly
-// and inconsistently if `==` is used to check for key equality.
-val spark45599Repro = Seq(
-  Double.NaN,
-  2.0,
-  168.0,
-  Double.NaN,
-  Double.NaN,
-  -0.0,
-  153.0,
-  0.0
-)
-
-val map1 = new OpenHashMap[Double, Int]()
-spark45599Repro.foreach(map1.changeValue(_, 1, {_ + 1}))
-assert(map1(0.0) == 1)
-assert(map1(-0.0) == 1)
-assert(map1(Double.NaN) == 3)
-
-val map2 = new OpenHashM
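
The removed test above relies on the difference between cooperative equality
(`==`) and object equality (`equals`) for boxed doubles. As a standalone
illustration (not part of this patch), the snippet below shows that
difference for the `0.0`/`-0.0` and `NaN` keys in question.

```scala
// Illustration only: why `==` and `equals` disagree for these keys.
object BoxedDoubleEquality {
  def main(args: Array[String]): Unit = {
    val posZero = java.lang.Double.valueOf(0.0)
    val negZero = java.lang.Double.valueOf(-0.0)
    val nan1 = java.lang.Double.valueOf(Double.NaN)
    val nan2 = java.lang.Double.valueOf(Double.NaN)

    println(0.0 == -0.0)              // true:  numeric (cooperative) equality
    println(posZero.equals(negZero))  // false: object equality sees the sign bit
    println(Double.NaN == Double.NaN) // false: NaN is never numerically equal to itself
    println(nan1.equals(nan2))        // true:  object equality treats NaNs as equal
  }
}
```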

(spark) branch master updated: [SPARK-47185][SS][TESTS] Increase timeout between actions in KafkaContinuousSourceSuite

2024-02-27 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ba773fe9a286 [SPARK-47185][SS][TESTS] Increase timeout between actions 
in KafkaContinuousSourceSuite
ba773fe9a286 is described below

commit ba773fe9a28640af6d2ddebd8104b05e28778f58
Author: Hyukjin Kwon 
AuthorDate: Tue Feb 27 07:36:08 2024 -0800

[SPARK-47185][SS][TESTS] Increase timeout between actions in 
KafkaContinuousSourceSuite

### What changes were proposed in this pull request?

This PR proposes to increase the timeout between actions in
`KafkaContinuousSourceSuite`.

### Why are the changes needed?

In the macOS build, those tests fail nondeterministically; see
- https://github.com/apache/spark/actions/runs/8054862135/job/22000404856
- https://github.com/apache/spark/actions/runs/8040413156/job/21958488693
- https://github.com/apache/spark/actions/runs/8032862212/job/21942732320
- https://github.com/apache/spark/actions/runs/8024427919/job/21937366481

`KafkaContinuousSourceSuite` is especially slow on macOS. Kafka producers
send the messages correctly, but the consumers can't get the messages for
some reason; the offsets are not available for a long time. This is not an
issue in micro-batch mode, but I have not been able to identify the difference.

For now, I just decided to increase the timeout between actions. This is
more of a workaround than a fix.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually tested on my Mac.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45283 from HyukjinKwon/SPARK-47185.

Authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/sql/kafka010/KafkaContinuousSourceSuite.scala | 3 +++
 1 file changed, 3 insertions(+)

diff --git 
a/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaContinuousSourceSuite.scala
 
b/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaContinuousSourceSuite.scala
index e42662c7a62b..fa1db6bfaccc 100644
--- 
a/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaContinuousSourceSuite.scala
+++ 
b/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaContinuousSourceSuite.scala
@@ -18,6 +18,7 @@
 package org.apache.spark.sql.kafka010
 
 import org.apache.kafka.clients.producer.ProducerRecord
+import org.scalatest.time.SpanSugar._
 
 import org.apache.spark.sql.Dataset
 import org.apache.spark.sql.execution.datasources.v2.ContinuousScanExec
@@ -28,6 +29,8 @@ import org.apache.spark.sql.streaming.Trigger
 class KafkaContinuousSourceSuite extends KafkaSourceSuiteBase with 
KafkaContinuousTest {
   import testImplicits._
 
+  override val streamingTimeout = 60.seconds
+
   test("read Kafka transactional messages: read_committed") {
 val table = "kafka_continuous_source_test"
 withTable(table) {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47182][BUILD] Exclude `commons-(io|lang3)` transitive dependencies from `commons-compress` and `avro*`

2024-02-26 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 209d0fcf22b1 [SPARK-47182][BUILD] Exclude `commons-(io|lang3)` 
transitive dependencies from `commons-compress` and `avro*`
209d0fcf22b1 is described below

commit 209d0fcf22b174c308d2ae239795d6193e2ca85e
Author: Dongjoon Hyun 
AuthorDate: Mon Feb 26 22:57:01 2024 -0800

[SPARK-47182][BUILD] Exclude `commons-(io|lang3)` transitive dependencies 
from `commons-compress` and `avro*`

### Why are the changes needed?

This PR aims to exclude `commons-(io|lang3)` transitive dependencies from 
`commons-compress`, `avro`, and `avro-mapred` dependencies.

### Does this PR introduce _any_ user-facing change?

Apache Spark defines and uses its own versions. The exclusion of the
transitive dependencies will clarify that.


https://github.com/apache/spark/blob/1a408033daf458f1ceebbe14a560355a1a2c0a70/pom.xml#L198


https://github.com/apache/spark/blob/1a408033daf458f1ceebbe14a560355a1a2c0a70/pom.xml#L194

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45278 from dongjoon-hyun/SPARK-47182.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 pom.xml | 28 
 1 file changed, 28 insertions(+)

diff --git a/pom.xml b/pom.xml
index 22606caaf65c..8e977395378c 100644
--- a/pom.xml
+++ b/pom.xml
@@ -619,6 +619,16 @@
 org.apache.commons
 commons-compress
 ${commons-compress.version}
+
+  
+commons-io
+commons-io
+  
+  
+org.apache.commons
+commons-lang3
+  
+
   
   
 org.apache.commons
@@ -1484,6 +1494,16 @@
 org.apache.avro
 avro
 ${avro.version}
+
+  
+commons-io
+commons-io
+  
+  
+org.apache.commons
+commons-lang3
+  
+
   
   
 org.apache.avro
@@ -1523,6 +1543,14 @@
 com.github.luben
 zstd-jni
   
+  
+commons-io
+commons-io
+  
+  
+org.apache.commons
+commons-lang3
+  
 
   
   

(spark) branch master updated (031b90b2ac0b -> 1a408033daf4)

2024-02-26 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 031b90b2ac0b [SPARK-47178][PYTHON][TESTS] Add a test case for 
createDataFrame with dataclasses
 add 1a408033daf4 [SPARK-47181][CORE][TESTS] Fix `MasterSuite` to validate 
the number of registered workers

No new revisions were added by this update.

Summary of changes:
 .../src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-45527][CORE][TESTS][FOLLOW-UP] Reduce the number of test cases in fraction resource calculation

2024-02-26 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new afa9f9679bc0 [SPARK-45527][CORE][TESTS][FOLLOW-UP] Reduce the number 
of test cases in fraction resource calculation
afa9f9679bc0 is described below

commit afa9f9679bc01e8afbf7e4a47c203bfcc1a0652a
Author: Hyukjin Kwon 
AuthorDate: Mon Feb 26 18:54:07 2024 -0800

[SPARK-45527][CORE][TESTS][FOLLOW-UP] Reduce the number of test cases in 
fraction resource calculation

### What changes were proposed in this pull request?

There are two more instances that were mistakenly missed in
https://github.com/apache/spark/pull/45268. This PR fixes both.
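
Shown standalone for clarity, the sampling idiom being applied in both places
(illustrative snippet only; the variable names are made up):

```scala
// Instead of generating one test per taskNum in 1..20, sample 5 of them per
// run: the generated test matrix stays small while coverage varies across runs.
val sampledTaskNums = scala.util.Random.shuffle((1 to 20).toList).take(5)
sampledTaskNums.foreach { taskNum =>
  // ... register one parameterized test case for this taskNum ...
}
```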

### Why are the changes needed?

See https://github.com/apache/spark/pull/45268

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45272 from HyukjinKwon/SPARK-45527-followup2.

Authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala
index 3248e64bcc58..df5031e05887 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala
@@ -2374,7 +2374,7 @@ class TaskSchedulerImplSuite extends SparkFunSuite with 
LocalSparkContext
   // 1 executor with 4 GPUS
   Seq(true, false).foreach { barrierMode =>
 val barrier = if (barrierMode) "barrier" else ""
-(1 to 20).foreach { taskNum =>
+scala.util.Random.shuffle((1 to 20).toList).take(5).foreach { taskNum =>
   val gpuTaskAmount = 
ResourceAmountUtils.toFractionalResource(ONE_ENTIRE_RESOURCE / taskNum)
   test(s"SPARK-45527 TaskResourceProfile with 
task.gpu.amount=${gpuTaskAmount} can " +
 s"restrict $taskNum $barrier tasks run in the same executor") {
@@ -2423,7 +2423,7 @@ class TaskSchedulerImplSuite extends SparkFunSuite with 
LocalSparkContext
   // 4 executors, each of which has 1 GPU
   Seq(true, false).foreach { barrierMode =>
 val barrier = if (barrierMode) "barrier" else ""
-(1 to 20).foreach { taskNum =>
+scala.util.Random.shuffle((1 to 20).toList).take(5).foreach { taskNum =>
   val gpuTaskAmount = 
ResourceAmountUtils.toFractionalResource(ONE_ENTIRE_RESOURCE / taskNum)
   test(s"SPARK-45527 TaskResourceProfile with 
task.gpu.amount=${gpuTaskAmount} can " +
 s"restrict $taskNum $barrier tasks run on the different executor") {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-45527][CORE][TESTS][FOLLOWUP] Reduce the number of threads from 1k to 100 in `TaskSchedulerImplSuite`

2024-02-26 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 76c4fd56c5a5 [SPARK-45527][CORE][TESTS][FOLLOWUP] Reduce the number of 
threads from 1k to 100 in `TaskSchedulerImplSuite`
76c4fd56c5a5 is described below

commit 76c4fd56c5a53bf9f726820a44ca0f610f7b91f6
Author: Dongjoon Hyun 
AuthorDate: Mon Feb 26 14:32:10 2024 -0800

[SPARK-45527][CORE][TESTS][FOLLOWUP] Reduce the number of threads from 1k 
to 100 in `TaskSchedulerImplSuite`

### What changes were proposed in this pull request?

This PR is a follow-up of #43494 in order to reduce the number of threads 
of SparkContext from 1k to 100 in the test environment.

### Why are the changes needed?

To reduce the test resource requirements. 1000 threads seems to be too many
for some CI systems with limited resources.
- 
https://github.com/apache/spark/actions/workflows/build_maven_java21_macos14.yml
  - https://github.com/apache/spark/actions/runs/8054862135/job/22000403549
```
Warning: [766.327s][warning][os,thread] Failed to start thread "Unknown 
thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 4096k, 
guardsize: 16k, detached.
Warning: [766.327s][warning][os,thread] Failed to start the native thread 
for java.lang.Thread "dispatcher-event-loop-840"
*** RUN ABORTED ***
An exception or error caused a run to abort: unable to create native 
thread: possibly out of memory or process/resource limits reached
  java.lang.OutOfMemoryError: unable to create native thread: possibly out 
of memory or process/resource limits reached
```

### Does this PR introduce _any_ user-facing change?

No, this is a test-case update.

### How was this patch tested?

Pass the CIs and monitor Daily Apple Silicon test.

### Was this patch authored or co-authored using generative AI tooling?

No.

    Closes #45264 from dongjoon-hyun/SPARK-45527.
    
Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala
index 3e43442583ec..f7b868c66468 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala
@@ -2489,7 +2489,7 @@ class TaskSchedulerImplSuite extends SparkFunSuite with 
LocalSparkContext
 val taskCpus = 1
 val taskGpus = 0.3
 val executorGpus = 4
-val executorCpus = 1000
+val executorCpus = 100
 
 // each tasks require 0.3 gpu
 val taskScheduler = setupScheduler(numCores = executorCpus,


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (298134fd5e98 -> a939a7d0fd9c)

2024-02-26 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 298134fd5e98 [SPARK-47009][SQL] Enable create table support for 
collation
 add a939a7d0fd9c [SPARK-47170][BUILD][CONNECT] Remove 
`jakarta.servlet-api` and `javax.servlet-api` dependency scope in 
`connect/server` module

No new revisions were added by this update.

Summary of changes:
 connector/connect/server/pom.xml | 2 --
 1 file changed, 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47163][BUILD] Fix `make-distribution.sh` to check `jackson-core-asl-1.9.13.jar` existence first

2024-02-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 242dd2e819b4 [SPARK-47163][BUILD] Fix `make-distribution.sh` to check 
`jackson-core-asl-1.9.13.jar` existence first
242dd2e819b4 is described below

commit 242dd2e819b4512fd46e6d02b1bd0f937ad5419d
Author: Dongjoon Hyun 
AuthorDate: Sun Feb 25 23:54:21 2024 -0800

[SPARK-47163][BUILD] Fix `make-distribution.sh` to check 
`jackson-core-asl-1.9.13.jar` existence first

### What changes were proposed in this pull request?

This PR aims to fix the `make-distribution.sh` script to check for the
existence of `jackson-*-asl-*.jar` files before copying them.

### Why are the changes needed?

Currently, `make-distribution.sh` script fails if it builds without 
`hive-thriftserver`.

### Does this PR introduce _any_ user-facing change?

No, this bug was introduced by an unreleased feature.

### How was this patch tested?

Pass the CIs and manually build without Hive like the following.

```
$ dev/make-distribution.sh
$ ls dist/
LICENSENOTICE README.md  RELEASEbinconf   data  
 examples   jars   kubernetes licenses   python sbin
```

```
$ dev/make-distribution.sh -Phive-thriftserver
$ ls dist
LICENSE  NOTICE   README.mdRELEASE  bin  conf   
  data examples hive-jackson jars licenses python   
sbin
```

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45253 from dongjoon-hyun/SPARK-47163.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/make-distribution.sh | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/dev/make-distribution.sh b/dev/make-distribution.sh
index 5c4c36df37a6..70684a02a8dd 100755
--- a/dev/make-distribution.sh
+++ b/dev/make-distribution.sh
@@ -190,10 +190,12 @@ echo "Build flags: $@" >> "$DISTDIR/RELEASE"
 cp "$SPARK_HOME"/assembly/target/scala*/jars/* "$DISTDIR/jars/"
 
 # Only create the hive-jackson directory if they exist.
-for f in "$DISTDIR"/jars/jackson-*-asl-*.jar; do
-  mkdir -p "$DISTDIR"/hive-jackson
-  mv $f "$DISTDIR"/hive-jackson/
-done
+if [ -f "$DISTDIR"/jars/jackson-core-asl-1.9.13.jar ]; then
+  for f in "$DISTDIR"/jars/jackson-*-asl-*.jar; do
+mkdir -p "$DISTDIR"/hive-jackson
+mv $f "$DISTDIR"/hive-jackson/
+  done
+fi
 
 # Only create the yarn directory if the yarn artifacts were built.
 if [ -f 
"$SPARK_HOME"/common/network-yarn/target/scala*/spark-*-yarn-shuffle.jar ]; then


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47160][K8S] Update K8s `Dockerfile` to include `hive-jackson` directory if exists

2024-02-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5c5b47c5ad51 [SPARK-47160][K8S] Update K8s `Dockerfile` to include 
`hive-jackson` directory if exists
5c5b47c5ad51 is described below

commit 5c5b47c5ad51131fd92a5682140481361b023d51
Author: Dongjoon Hyun 
AuthorDate: Sun Feb 25 23:21:49 2024 -0800

[SPARK-47160][K8S] Update K8s `Dockerfile` to include `hive-jackson` 
directory if exists

### What changes were proposed in this pull request?

This PR aims to update the K8s `Dockerfile` to include the `hive-jackson` jar
directory if it exists.

### Why are the changes needed?

After SPARK-47152, we can have a `hive-jackson` directory.

### Does this PR introduce _any_ user-facing change?

No, this is used by Spark internally by default.

### How was this patch tested?

Pass the CIs and manual check.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45251 from dongjoon-hyun/SPARK-47160.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../kubernetes/docker/src/main/dockerfiles/spark/Dockerfile | 2 ++
 1 file changed, 2 insertions(+)

diff --git 
a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile 
b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
index 25d7e076169b..421639cf2880 100644
--- a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
+++ b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
@@ -42,6 +42,8 @@ RUN set -ex && \
 rm -rf /var/cache/apt/* && rm -rf /var/lib/apt/lists/*
 
 COPY jars /opt/spark/jars
+# Copy hive-jackson directory if exists
+COPY hive-jackso[n] /opt/spark/hive-jackson
 # Copy RELEASE file if exists
 COPY RELEAS[E] /opt/spark/RELEASE
 COPY bin /opt/spark/bin


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47161][INFRA][R] Uses hash key properly for SparkR build on Windows

2024-02-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1ca60d353792 [SPARK-47161][INFRA][R] Uses hash key properly for SparkR 
build on Windows
1ca60d353792 is described below

commit 1ca60d353792d2f823a35354376d34797bfc60c2
Author: Hyukjin Kwon 
AuthorDate: Sun Feb 25 23:17:33 2024 -0800

[SPARK-47161][INFRA][R] Uses hash key properly for SparkR build on Windows

### What changes were proposed in this pull request?

This PR fixes the mistake in https://github.com/apache/spark/pull/45175
that set the hash key wrongly for the Maven cache.

### Why are the changes needed?

To use the cache properly. SparkR on Windows does not find its cache 
properly: 
https://github.com/apache/spark/actions/runs/8039485831/job/2195633

![Screenshot 2024-02-26 at 2 48 07 
PM](https://github.com/apache/spark/assets/6477701/1c151c04-c07c-4968-af3a-b745cc7af391)

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Will monitor the CI.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45252 from HyukjinKwon/SPARK-47161.

Authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_sparkr_window.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/build_sparkr_window.yml 
b/.github/workflows/build_sparkr_window.yml
index a7a265965662..155422d22e03 100644
--- a/.github/workflows/build_sparkr_window.yml
+++ b/.github/workflows/build_sparkr_window.yml
@@ -42,7 +42,7 @@ jobs:
   uses: actions/cache@v4
   with:
 path: ~/.m2/repository
-key: build-sparkr-maven-${{ hashFiles('**/pom.xml') }}
+key: build-sparkr-windows-maven-${{ hashFiles('**/pom.xml') }}
 restore-keys: |
   build-sparkr-windows-maven-
 - name: Install Java 17


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (679b468854ab -> 0ff18e579c2f)

2024-02-25 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 679b468854ab [MINOR][CONNECT][TESTS] Chain waitFor after 
destroyForcibly in SparkConnectServerUtils
 add 0ff18e579c2f [SPARK-46802][PYTHON][TESTS][FOLLOWUP] Remove obsolete 
comment in run-tests-with-coverage

No new revisions were added by this update.

Summary of changes:
 python/run-tests-with-coverage | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47154][SS][TESTS] Fix `kafka-0-10-sql` to use `ResetSystemProperties` if `KafkaTestUtils` is used

2024-02-24 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 18b86068ff4c [SPARK-47154][SS][TESTS] Fix `kafka-0-10-sql` to use 
`ResetSystemProperties` if `KafkaTestUtils` is used
18b86068ff4c is described below

commit 18b86068ff4c72ba686d3d9275f9284d58cd3ef4
Author: Dongjoon Hyun 
AuthorDate: Sat Feb 24 11:15:05 2024 -0800

[SPARK-47154][SS][TESTS] Fix `kafka-0-10-sql` to use 
`ResetSystemProperties` if `KafkaTestUtils` is used

### What changes were proposed in this pull request?

This PR aims to fix `kafka-0-10-sql` module to use `ResetSystemProperties` 
if `KafkaTestUtils` is used. The following test suites are fixed.

- ConsumerStrategySuite
- KafkaDataConsumerSuite
- KafkaMissingOffsetsTest
  - KafkaDontFailOnDataLossSuite
  - KafkaSourceStressForDontFailOnDataLossSuite
- KafkaTest
  - KafkaDelegationTokenSuite
  - KafkaMicroBatchSourceSuite
- KafkaMicroBatchV1SourceWithAdminSuite
- KafkaMicroBatchV2SourceWithAdminSuite
- KafkaMicroBatchV1SourceSuite
- KafkaMicroBatchV2SourceSuite
- KafkaSourceStressSuite
  - KafkaOffsetReaderSuite
  - KafkaRelationSuite
- KafkaRelationSuiteWithAdminV1
- KafkaRelationSuiteWithAdminV2
- KafkaRelationSuiteV1
- KafkaRelationSuiteV2
  - KafkaSinkSuite
- KafkaSinkMicroBatchStreamingSuite
- KafkaContinuousSinkSuite
- KafkaSinkBatchSuiteV1
- KafkaSinkBatchSuiteV2

### Why are the changes needed?

Apache Spark `master` branch has two `KafkaTestUtils` classes.

```
$ find . -name KafkaTestUtils.scala

./connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala

./connector/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala
```

`KafkaTestUtils` of `kafka-0-10-sql` uses `System.setProperty` and affects 
8 files. We need to use `ResetSystemProperties` to isolate the test cases.


https://github.com/apache/spark/blob/ee312ecb40ea5b5303fc794a3d494b6f27cda923/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala#L290

```
$ git grep KafkaTestUtils connector/kafka-0-10-sql | awk -F: '{print $1}' | 
sort | uniq
connector/kafka-0-10-sql/src/test/resources/log4j2.properties

connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/ConsumerStrategySuite.scala

connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDelegationTokenSuite.scala

connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDontFailOnDataLossSuite.scala

connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala

connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaOffsetReaderSuite.scala

connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaRelationSuite.scala

connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSinkSuite.scala

connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala

connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/consumer/KafkaDataConsumerSuite.scala
```
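
A minimal sketch of the isolation pattern being applied (the suite name below
is hypothetical): mixing `ResetSystemProperties` into a suite snapshots the
JVM system properties before each test and restores them afterwards, so
properties set through `KafkaTestUtils` cannot leak into later suites.

```scala
// Sketch only: a suite that mutates system properties but cannot leak them,
// because ResetSystemProperties restores them around each test.
import org.apache.spark.SparkFunSuite
import org.apache.spark.util.ResetSystemProperties

class KafkaIsolationExampleSuite extends SparkFunSuite with ResetSystemProperties {
  test("system properties set here are restored afterwards") {
    System.setProperty("java.security.auth.login.config", "/tmp/dummy-jaas.conf")
    // ... exercise KafkaTestUtils-based helpers here ...
  }
}
```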

### Does this PR introduce _any_ user-facing change?

No. This is a test-only PR.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45239 from dongjoon-hyun/SPARK-47154.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/sql/kafka010/ConsumerStrategySuite.scala| 3 ++-
 .../org/apache/spark/sql/kafka010/KafkaDontFailOnDataLossSuite.scala   | 3 ++-
 .../src/test/scala/org/apache/spark/sql/kafka010/KafkaTest.scala   | 3 ++-
 .../apache/spark/sql/kafka010/consumer/KafkaDataConsumerSuite.scala| 2 ++
 4 files changed, 8 insertions(+), 3 deletions(-)

diff --git 
a/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/ConsumerStrategySuite.scala
 
b/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/ConsumerStrategySuite.scala
index 44baab7f2468..cbbbcf9317cd 100644
--- 
a/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/ConsumerStrategySuite.scala
+++ 
b/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/ConsumerStrategySuite.scala
@@ -27,8 +27,9 @@ import org.apache.kafka.common.TopicPartition
 import org.mockito.Mockito.mock
 
 import org.apache.spark.{SparkConf, SparkEnv, SparkFunSuite}
+import org.apache.spark.util.ResetSystemProperties
 
-class ConsumerStrategySui

(spark) branch master updated: [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new optional directory

2024-02-24 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new dd3f81c3d610 [SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` 
dependencies via a new optional directory
dd3f81c3d610 is described below

commit dd3f81c3d6102fe1427702e97f7f42aa64b0bf5e
Author: Dongjoon Hyun 
AuthorDate: Sat Feb 24 11:05:41 2024 -0800

[SPARK-47152][SQL][BUILD] Provide `CodeHaus Jackson` dependencies via a new 
optional directory

### What changes were proposed in this pull request?

This PR aims to provide `Apache Hive`'s `CodeHaus Jackson` dependencies via 
a new optional directory, `hive-jackson`, instead of the standard `jars` 
directory of Apache Spark binary distribution. Additionally, two internal 
configurations are added whose default values are `hive-jackson/*`.

  - `spark.driver.defaultExtraClassPath`
  - `spark.executor.defaultExtraClassPath`

For example, Apache Spark distributions have been providing 
`spark-*-yarn-shuffle.jar` file under `yarn` directory instead of `jars`.

**YARN SHUFFLE EXAMPLE**
```
$ ls -al yarn/*jar
-rw-r--r--  1 dongjoon  staff  77352048 Sep  8 19:08 
yarn/spark-3.5.0-yarn-shuffle.jar
```

This PR changes `Apache Hive`'s `CodeHaus Jackson` dependencies in a 
similar way.

**BEFORE**
```
$ ls -al jars/*asl*
-rw-r--r--  1 dongjoon  staff  232248 Sep  8 19:08 
jars/jackson-core-asl-1.9.13.jar
-rw-r--r--  1 dongjoon  staff  780664 Sep  8 19:08 
jars/jackson-mapper-asl-1.9.13.jar
```

**AFTER**
```
$ ls -al jars/*asl*
zsh: no matches found: jars/*asl*

$ ls -al hive-jackson
total 1984
drwxr-xr-x   4 dongjoon  staff 128 Feb 23 15:37 .
drwxr-xr-x  16 dongjoon  staff 512 Feb 23 16:34 ..
-rw-r--r--   1 dongjoon  staff  232248 Feb 23 15:37 
jackson-core-asl-1.9.13.jar
-rw-r--r--   1 dongjoon  staff  780664 Feb 23 15:37 
jackson-mapper-asl-1.9.13.jar
```

### Why are the changes needed?

Since Apache Hadoop 3.3.5, only Apache Hive requires old CodeHaus Jackson 
dependencies.

Apache Spark 3.5.0 tried to eliminate them completely, but the change was
reverted due to Hive UDF support.

  - https://github.com/apache/spark/pull/40893
  - https://github.com/apache/spark/pull/42446

SPARK-47119 added a way to exclude Apache Hive Jackson dependencies at the 
distribution building stage for Apache Spark 4.0.0.

  - #45201

This PR provides a way to exclude Apache Hive Jackson dependencies at 
runtime for Apache Spark 4.0.0.

- Spark Shell without Apache Hive Jackson dependencies.
```
$ bin/spark-shell --driver-default-class-path ""
```

- Spark SQL Shell without Apache Hive Jackson dependencies.
```
$ bin/spark-sql --driver-default-class-path ""
```

- Spark Thrift Server without Apache Hive Jackson dependencies.
```
$ sbin/start-thriftserver.sh --driver-default-class-path ""
```

In addition, last but not least, this PR eliminates `CodeHaus Jackson`
dependencies from the following Apache Spark daemons (started via
`spark-daemon.sh start`) because they don't require the Hive `CodeHaus
Jackson` dependencies.

- Spark Master
- Spark Worker
- Spark History Server

```
$ grep 'spark-daemon.sh start' *
start-history-server.sh:exec "${SPARK_HOME}/sbin"/spark-daemon.sh start 
$CLASS 1 "$"
start-master.sh:"${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 1 \
start-worker.sh:  "${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 
$WORKER_NUM \
```

### Does this PR introduce _any_ user-facing change?

No. There is no user-facing change by default.

- For the distributions with `hive-jackson-provided` profile, the `scope` 
of Apache Hive Jackson dependencies is `provided` and `hive-jackson` directory 
is not created at all.
- For the distributions with default setting, the `scope` of Apache Hive 
Jackson dependencies is still `compile`. In addition, they are in the Apache 
Spark's built-in class path like the following.

![Screenshot 2024-02-23 at 16 48 
08](https://github.com/apache/spark/assets/9700541/99ed0f02-2792-4666-ae19-ce4f4b7b8ff9)

- The following Spark Deamon don't use `CodeHaus Jackson` dependencies.
  - Spark Master
  - Spark Worker
  - Spark History Server

### How was this patch tested?

Pass the CIs and manually build a distribution and check the class paths in 
the `Environment` Tab.

```
$ dev/make-distribution.sh -Phive,hive-thriftserver
```

### Was this patch authored or co-authored using generative 

(spark) branch master updated (c2dbb6d04bc9 -> ee312ecb40ea)

2024-02-23 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from c2dbb6d04bc9 [SPARK-47099][SQL][FOLLOWUP] Regenerate 
`try_arithmetic.sql.out.java21`
 add ee312ecb40ea [SPARK-47151][PYTHON][PS][BUILD] Upgrade to `pandas` 2.2.1

No new revisions were added by this update.

Summary of changes:
 dev/infra/Dockerfile   | 4 ++--
 python/pyspark/pandas/supported_api_gen.py | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (772a445e412b -> c2dbb6d04bc9)

2024-02-23 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 772a445e412b [SPARK-47035][SS][CONNECT] Protocol for Client-Side 
Listener
 add c2dbb6d04bc9 [SPARK-47099][SQL][FOLLOWUP] Regenerate 
`try_arithmetic.sql.out.java21`

No new revisions were added by this update.

Summary of changes:
 .../src/test/resources/sql-tests/results/try_arithmetic.sql.out.java21  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (d20650bc8cf2 -> 28951ed6681f)

2024-02-23 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from d20650bc8cf2 [SPARK-46975][PS] Support dedicated fallback methods
 add 28951ed6681f [SPARK-47118][BUILD][CORE][SQL][UI] Migrate from Jetty 10 
to Jetty 11

No new revisions were added by this update.

Summary of changes:
 connector/connect/server/pom.xml   |  10 ++
 .../sql/connect/ui/SparkConnectServerPage.scala|   3 +-
 .../connect/ui/SparkConnectServerSessionPage.scala |   4 +-
 .../connect/ui/SparkConnectServerPageSuite.scala   |   2 +-
 core/pom.xml   |   5 +-
 .../main/scala/org/apache/spark/deploy/Utils.scala |   3 +-
 .../spark/deploy/history/ApplicationCache.scala|   4 +-
 .../apache/spark/deploy/history/HistoryPage.scala  |   4 +-
 .../spark/deploy/history/HistoryServer.scala   |   2 +-
 .../org/apache/spark/deploy/history/LogPage.scala  |   4 +-
 .../spark/deploy/master/ui/ApplicationPage.scala   |   4 +-
 .../apache/spark/deploy/master/ui/LogPage.scala|   4 +-
 .../apache/spark/deploy/master/ui/MasterPage.scala |   3 +-
 .../spark/deploy/master/ui/MasterWebUI.scala   |   3 +-
 .../spark/deploy/rest/RestSubmissionClient.scala   |   2 +-
 .../spark/deploy/rest/RestSubmissionServer.scala   |   3 +-
 .../spark/deploy/rest/StandaloneRestServer.scala   |   3 +-
 .../apache/spark/deploy/worker/ui/LogPage.scala|   3 +-
 .../apache/spark/deploy/worker/ui/WorkerPage.scala |   3 +-
 .../spark/deploy/worker/ui/WorkerWebUI.scala   |   3 +-
 .../apache/spark/metrics/sink/MetricsServlet.scala |   2 +-
 .../spark/metrics/sink/PrometheusServlet.scala |   2 +-
 .../spark/status/api/v1/ApiRootResource.scala  |   8 +-
 .../status/api/v1/ApplicationListResource.scala|   5 +-
 .../spark/status/api/v1/JacksonMessageWriter.scala |   6 +-
 .../status/api/v1/OneApplicationResource.scala |   5 +-
 .../spark/status/api/v1/PrometheusResource.scala   |   5 +-
 .../spark/status/api/v1/SimpleDateParam.scala  |   7 +-
 .../spark/status/api/v1/StagesResource.scala   |   5 +-
 .../scala/org/apache/spark/ui/DriverLogPage.scala  |   4 +-
 .../scala/org/apache/spark/ui/GraphUIData.scala|   3 +-
 .../org/apache/spark/ui/HttpSecurityFilter.scala   |   4 +-
 .../scala/org/apache/spark/ui/JettyUtils.scala |   4 +-
 .../scala/org/apache/spark/ui/PagedTable.scala |   2 +-
 .../main/scala/org/apache/spark/ui/SparkUI.scala   |   2 +-
 .../main/scala/org/apache/spark/ui/UIUtils.scala   |   4 +-
 .../src/main/scala/org/apache/spark/ui/WebUI.scala |   4 +-
 .../org/apache/spark/ui/env/EnvironmentPage.scala  |   4 +-
 .../spark/ui/exec/ExecutorHeapHistogramPage.scala  |   4 +-
 .../spark/ui/exec/ExecutorThreadDumpPage.scala |   4 +-
 .../org/apache/spark/ui/exec/ExecutorsTab.scala|   4 +-
 .../org/apache/spark/ui/jobs/AllJobsPage.scala |   2 +-
 .../org/apache/spark/ui/jobs/AllStagesPage.scala   |   4 +-
 .../scala/org/apache/spark/ui/jobs/JobPage.scala   |   2 +-
 .../scala/org/apache/spark/ui/jobs/JobsTab.scala   |   2 +-
 .../scala/org/apache/spark/ui/jobs/PoolPage.scala  |   4 +-
 .../scala/org/apache/spark/ui/jobs/PoolTable.scala |   3 +-
 .../scala/org/apache/spark/ui/jobs/StagePage.scala |   3 +-
 .../org/apache/spark/ui/jobs/StageTable.scala  |   3 +-
 .../scala/org/apache/spark/ui/jobs/StagesTab.scala |   2 +-
 .../apache/spark/ui/jobs/TaskThreadDumpPage.scala  |   4 +-
 .../org/apache/spark/ui/storage/RDDPage.scala  |   3 +-
 .../org/apache/spark/ui/storage/StoragePage.scala  |   4 +-
 .../main/scala/org/apache/spark/util/Utils.scala   |   2 +-
 .../deploy/history/ApplicationCacheSuite.scala |   2 +-
 .../deploy/history/HistoryServerPageSuite.scala|   2 +-
 .../spark/deploy/history/HistoryServerSuite.scala  |   4 +-
 .../history/RealBrowserUIHistoryServerSuite.scala  |   3 +-
 .../deploy/master/ui/ApplicationPageSuite.scala|   2 +-
 .../master/ui/ReadOnlyMasterWebUISuite.scala   |   3 +-
 .../deploy/rest/StandaloneRestSubmitSuite.scala|   2 +-
 .../spark/status/api/v1/SimpleDateParamSuite.scala |   3 +-
 .../org/apache/spark/ui/DriverLogPageSuite.scala   |   2 +-
 .../apache/spark/ui/HttpSecurityFilterSuite.scala  |   4 +-
 .../scala/org/apache/spark/ui/StagePageSuite.scala |   3 +-
 .../org/apache/spark/ui/UISeleniumSuite.scala  |   2 +-
 .../test/scala/org/apache/spark/ui/UISuite.scala   |   4 +-
 .../apache/spark/ui/env/EnvironmentPageSuite.scala |   3 +-
 .../apache/spark/ui/storage/StoragePageSuite.scala |   3 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3  |  36 +++---
 docs/core-migration-guide.md   |   2 +
 mllib/pom.xml  |   4 +
 pom.xml|  26 ++--
 project/MimaExcludes.scala |   5 +-
 project/SparkBuild.scala   |   4 +-
 .../deploy

(spark) branch master updated: [SPARK-47129][CONNECT][SQL] Make `ResolveRelations` cache connect plan properly

2024-02-23 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 06c741a0061b [SPARK-47129][CONNECT][SQL] Make `ResolveRelations` cache 
connect plan properly
06c741a0061b is described below

commit 06c741a0061bcf2c6e2c08212cab9f4e774cb70a
Author: Ruifeng Zheng 
AuthorDate: Fri Feb 23 09:26:13 2024 -0800

[SPARK-47129][CONNECT][SQL] Make `ResolveRelations` cache connect plan 
properly

### What changes were proposed in this pull request?
Make `ResolveRelations` handle plan id properly

### Why are the changes needed?
Bug fix for Spark Connect; it won't affect classic Spark SQL.

before this PR:
```
from pyspark.sql import functions as sf

spark.range(10).withColumn("value_1", sf.lit(1)).write.saveAsTable("test_table_1")
spark.range(10).withColumnRenamed("id", "index").withColumn("value_2", sf.lit(2)).write.saveAsTable("test_table_2")

df1 = spark.read.table("test_table_1")
df2 = spark.read.table("test_table_2")
df3 = spark.read.table("test_table_1")

join1 = df1.join(df2, on=df1.id==df2.index).select(df2.index, df2.value_2)
join2 = df3.join(join1, how="left", on=join1.index==df3.id)

join2.schema
```

fails with
```
AnalysisException: [CANNOT_RESOLVE_DATAFRAME_COLUMN] Cannot resolve 
dataframe column "id". It's probably because of illegal references like 
`df1.select(df2.col("a"))`. SQLSTATE: 42704
```

That is because the existing plan caching in `ResolveRelations` doesn't work 
with Spark Connect.

```
=== Applying Rule 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations ===
 '[#12]Join LeftOuter, '`==`('index, 'id) '[#12]Join 
LeftOuter, '`==`('index, 'id)
!:- '[#9]UnresolvedRelation [test_table_1], [], false :- 
'[#9]SubqueryAlias spark_catalog.default.test_table_1
!+- '[#11]Project ['index, 'value_2]  :  +- 
'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
!   +- '[#10]Join Inner, '`==`('id, 'index)   +- 
'[#11]Project ['index, 'value_2]
!  :- '[#7]UnresolvedRelation [test_table_1], [], false  +- 
'[#10]Join Inner, '`==`('id, 'index)
!  +- '[#8]UnresolvedRelation [test_table_2], [], false :- 
'[#9]SubqueryAlias spark_catalog.default.test_table_1
!   :  +- 
'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
!   +- 
'[#8]SubqueryAlias spark_catalog.default.test_table_2
!  +- 
'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_2`, [], false

Can not resolve 'id with plan 7
```

`[#7]UnresolvedRelation [test_table_1], [], false` was wrongly resolved to 
the cached one
```
:- '[#9]SubqueryAlias spark_catalog.default.test_table_1
   +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, 
[], false
```
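
For reference, a minimal sketch of the expected outcome once `ResolveRelations` 
handles the plan id properly (illustrative only, not part of the patch; it 
assumes the tables and dataframes from the reproduction above):

```python
# With the fix, the reproduction above resolves instead of raising
# CANNOT_RESOLVE_DATAFRAME_COLUMN; the column set below is illustrative.
assert set(join2.columns) == {"id", "value_1", "index", "value_2"}
join2.schema  # now returns a StructType instead of failing analysis
```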

### Does this PR introduce _any_ user-facing change?
yes, bug fix

### How was this patch tested?
Added a unit test.

### Was this patch authored or co-authored using generative AI tooling?
ci

Closes #45214 from zhengruifeng/connect_fix_read_join.

Authored-by: Ruifeng Zheng 
Signed-off-by: Dongjoon Hyun 
---
 python/pyspark/sql/tests/test_readwriter.py| 23 +-
 .../spark/sql/catalyst/analysis/Analyzer.scala | 27 --
 2 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/python/pyspark/sql/tests/test_readwriter.py 
b/python/pyspark/sql/tests/test_readwriter.py
index 70a320fc53b6..85057f37a181 100644
--- a/python/pyspark/sql/tests/test_readwriter.py
+++ b/python/pyspark/sql/tests/test_readwriter.py
@@ -20,7 +20,7 @@ import shutil
 import tempfile
 
 from pyspark.errors import AnalysisException
-from pyspark.sql.functions import col
+from pyspark.sql.functions import col, lit
 from pyspark.sql.readwriter import DataFrameWriterV2
 from pyspark.sql.types import StructType, StructField, StringType
 from pyspark.testing.sqlutils import ReusedSQLTestCase
@@ -181,6 +181,27 @@ class ReadwriterTestsMixin:
 df.write.mode("overwrite").insertInto("test_table", False)
self.assertEqual(6, self.spark.sql("

(spark) branch master updated: [SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2

2024-02-23 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3baa60afe25c [SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2
3baa60afe25c is described below

commit 3baa60afe25c821ced1e956502f7c77b719f73dd
Author: Dongjoon Hyun 
AuthorDate: Fri Feb 23 08:36:32 2024 -0800

[SPARK-44914][BUILD] Upgrade Apache Ivy to 2.5.2

### What changes were proposed in this pull request?

This PR aims to upgrade Apache Ivy to 2.5.2 and protect old Ivy-based 
systems like old Spark from Apache Ivy 2.5.2's incompatibility by introducing a 
new `.ivy2.5.2` directory.

- Apache Spark 4.0.0 will create this directory once and reuse it, while all 
the other systems, like older Spark versions, keep using the old one, `.ivy2`. 
So, the behavior is the same as when Apache Spark 4.0.0 is installed and used 
on a new machine.

- For environments with user-provided Ivy paths, the user might still hit the 
incompatibility. However, users can mitigate it because they already have full 
control over those Ivy paths.
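
For such environments, one way to retain that control is to pin the Ivy 
directory explicitly via the existing `spark.jars.ivy` option. A minimal 
PySpark sketch (the path and package coordinate are placeholders, not taken 
from this patch):

```python
from pyspark.sql import SparkSession

# Pin the Ivy user directory explicitly so the new `.ivy2.5.2` default
# does not apply; any writable local path works.
spark = (
    SparkSession.builder
    .config("spark.jars.ivy", "/tmp/my-ivy-dir")
    .config("spark.jars.packages", "org.apache.commons:commons-text:1.11.0")
    .getOrCreate()
)
```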

### Why are the changes needed?

This was tried once and reverted logically due to Java 11 and Java 17 
failures in Daily CIs.
- #42613
- #42668

Currently, the PR Builder also fails. If this PR passes the CIs, we can 
achieve the following.

- [Release 
notes](https://lists.apache.org/thread/9gcz4xrsn8c7o9gb377xfzvkb8jltffr)
- FIX: CVE-2022-46751: Apache Ivy Is Vulnerable to XML External Entity 
Injections

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs including `HiveExternalCatalogVersionsSuite`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45075 from dongjoon-hyun/SPARK-44914.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../main/scala/org/apache/spark/util/MavenUtils.scala   | 17 ++---
 .../test/scala/org/apache/spark/util/IvyTestUtils.scala |  3 ++-
 .../org/apache/spark/internal/config/package.scala  |  4 ++--
 dev/deps/spark-deps-hadoop-3-hive-2.3   |  2 +-
 dev/run-tests.py|  2 ++
 docs/core-migration-guide.md|  2 ++
 pom.xml |  6 +-
 7 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala 
b/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala
index 65530b7fa473..08291859a32c 100644
--- a/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala
+++ b/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala
@@ -324,6 +324,14 @@ private[spark] object MavenUtils extends Logging {
 val ivySettings: IvySettings = new IvySettings
 try {
   ivySettings.load(file)
+  if (ivySettings.getDefaultIvyUserDir == null && 
ivySettings.getDefaultCache == null) {
+// To protect old Ivy-based systems like old Spark from Apache Ivy 
2.5.2's incompatibility.
+// `processIvyPathArg` can overwrite these later.
+val alternateIvyDir = System.getProperty("ivy.home",
+  System.getProperty("user.home") + File.separator + ".ivy2.5.2")
+ivySettings.setDefaultIvyUserDir(new File(alternateIvyDir))
+ivySettings.setDefaultCache(new File(alternateIvyDir, "cache"))
+  }
 } catch {
   case e @ (_: IOException | _: ParseException) =>
 throw new SparkException(s"Failed when loading Ivy settings from 
$settingsFile", e)
@@ -335,10 +343,13 @@ private[spark] object MavenUtils extends Logging {
 
   /* Set ivy settings for location of cache, if option is supplied */
   private def processIvyPathArg(ivySettings: IvySettings, ivyPath: 
Option[String]): Unit = {
-ivyPath.filterNot(_.trim.isEmpty).foreach { alternateIvyDir =>
-  ivySettings.setDefaultIvyUserDir(new File(alternateIvyDir))
-  ivySettings.setDefaultCache(new File(alternateIvyDir, "cache"))
+val alternateIvyDir = ivyPath.filterNot(_.trim.isEmpty).getOrElse {
+  // To protect old Ivy-based systems like old Spark from Apache Ivy 
2.5.2's incompatibility.
+  System.getProperty("ivy.home",
+System.getProperty("user.home") + File.separator + ".ivy2.5.2")
 }
+ivySettings.setDefaultIvyUserDir(new File(alternateIvyDir))
+ivySettings.setDefaultCache(new File(alternateIvyDir, "cache"))
   }
 
   /* Add any optional additional remote repositories */
diff --git 
a/common/utils/src/test/scala/org/apache/spark/util/IvyTestUtils.scala 
b/common/

(spark) branch master updated (d466c0beabcf -> 09739294ba1d)

2024-02-22 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from d466c0beabcf [SPARK-47142][K8S][TESTS] Use `spark.jars.ivy` instead 
`spark.driver.extraJavaOptions` in `DepsTestsSuite`
 add 09739294ba1d [SPARK-47143][CONNECT][TESTS] Improve `ArtifactSuite` to 
use unique `MavenCoordinate`s

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/connect/client/ArtifactSuite.scala  | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated (a053b40ac0e9 -> d466c0beabcf)

2024-02-22 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from a053b40ac0e9 [SPARK-47099][SQL][FOLLOW-UP] Uses ordinalNumber in 
UNEXPECTED_INPUT_TYPE
 add d466c0beabcf [SPARK-47142][K8S][TESTS] Use `spark.jars.ivy` instead 
`spark.driver.extraJavaOptions` in `DepsTestsSuite`

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47137][PYTHON][CONNECT] Add getAll to spark.conf for feature parity with Scala

2024-02-22 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 511839b6eac9 [SPARK-47137][PYTHON][CONNECT] Add getAll to spark.conf 
for feature parity with Scala
511839b6eac9 is described below

commit 511839b6eac974351410a1713f5a90329e49abe9
Author: Takuya UESHIN 
AuthorDate: Thu Feb 22 20:22:43 2024 -0800

[SPARK-47137][PYTHON][CONNECT] Add getAll to spark.conf for feature parity 
with Scala

### What changes were proposed in this pull request?

Adds `getAll` to `spark.conf` for feature parity with Scala.

```py
>>> spark.conf.getAll
{'spark.sql.warehouse.dir': ...}
```

### Why are the changes needed?

The Scala API provides `spark.conf.getAll`, whereas Python doesn't.

```scala
scala> spark.conf.getAll
val res0: Map[String,String] = HashMap(spark.sql.warehouse.dir -> ...
```
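
Because the Python `getAll` is a property returning a plain `dict`, ordinary 
dict operations apply. A small usage sketch (assuming a running PySpark 4.0 
session named `spark`):

```python
# Keys and values are strings; filter for the spark.sql.* entries currently set.
sql_confs = {k: v for k, v in spark.conf.getAll.items() if k.startswith("spark.sql.")}
print(f"{len(sql_confs)} spark.sql.* entries set")
```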

### Does this PR introduce _any_ user-facing change?

Yes, `spark.conf.getAll` will be available in PySpark.

### How was this patch tested?

Added the related tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45222 from ueshin/issues/SPARK-47137/getAll.

Authored-by: Takuya UESHIN 
Signed-off-by: Dongjoon Hyun 
---
 python/pyspark/sql/conf.py | 16 +-
 python/pyspark/sql/connect/conf.py | 15 +-
 python/pyspark/sql/tests/test_conf.py  | 63 ++
 .../scala/org/apache/spark/sql/RuntimeConfig.scala |  6 +++
 4 files changed, 75 insertions(+), 25 deletions(-)

diff --git a/python/pyspark/sql/conf.py b/python/pyspark/sql/conf.py
index e77039565dd1..dd43991b0706 100644
--- a/python/pyspark/sql/conf.py
+++ b/python/pyspark/sql/conf.py
@@ -16,7 +16,7 @@
 #
 
 import sys
-from typing import Any, Optional, Union
+from typing import Any, Dict, Optional, Union
 
 from py4j.java_gateway import JavaObject
 
@@ -93,6 +93,20 @@ class RuntimeConfig:
 self._check_type(default, "default")
 return self._jconf.get(key, default)
 
+@property
+def getAll(self) -> Dict[str, str]:
+"""
+Returns all properties set in this conf.
+
+.. versionadded:: 4.0.0
+
+Returns
+---
+dict
+A dictionary containing all properties set in this conf.
+"""
+return dict(self._jconf.getAllAsJava())
+
 def unset(self, key: str) -> None:
 """
 Resets the configuration property for the given key.
diff --git a/python/pyspark/sql/connect/conf.py 
b/python/pyspark/sql/connect/conf.py
index 3548a31fef03..57a669aca889 100644
--- a/python/pyspark/sql/connect/conf.py
+++ b/python/pyspark/sql/connect/conf.py
@@ -19,7 +19,7 @@ from pyspark.sql.connect.utils import check_dependencies
 
 check_dependencies(__name__)
 
-from typing import Any, Optional, Union, cast
+from typing import Any, Dict, Optional, Union, cast
 import warnings
 
 from pyspark import _NoValue
@@ -68,6 +68,19 @@ class RuntimeConf:
 
 get.__doc__ = PySparkRuntimeConfig.get.__doc__
 
+@property
+def getAll(self) -> Dict[str, str]:
+op_get_all = proto.ConfigRequest.GetAll()
+operation = proto.ConfigRequest.Operation(get_all=op_get_all)
+result = self._client.config(operation)
+confs: Dict[str, str] = dict()
+for key, value in result.pairs:
+assert value is not None
+confs[key] = value
+return confs
+
+getAll.__doc__ = PySparkRuntimeConfig.getAll.__doc__
+
 def unset(self, key: str) -> None:
 op_unset = proto.ConfigRequest.Unset(keys=[key])
 operation = proto.ConfigRequest.Operation(unset=op_unset)
diff --git a/python/pyspark/sql/tests/test_conf.py 
b/python/pyspark/sql/tests/test_conf.py
index 9b939205b1d1..68b147f09746 100644
--- a/python/pyspark/sql/tests/test_conf.py
+++ b/python/pyspark/sql/tests/test_conf.py
@@ -50,32 +50,49 @@ class ConfTestsMixin:
 def test_conf_with_python_objects(self):
 spark = self.spark
 
-for value, expected in [(True, "true"), (False, "false")]:
-spark.conf.set("foo", value)
-self.assertEqual(spark.conf.get("foo"), expected)
-
-spark.conf.set("foo", 1)
-self.assertEqual(spark.conf.get("foo"), "1")
-
-with self.assertRaises(IllegalArgumentException):
-spark.conf.set("foo", None)
-
-with self.assertRaises(Exception):
-spark.conf.set("foo", Decimal(1))
+try:
+for value, expected in 

(spark) branch master updated: [SPARK-43259][SQL][FOLLOWUP] Regenerate `sql-error-conditions.md` to recover `SparkThrowableSuite`

2024-02-22 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6ae0abb64289 [SPARK-43259][SQL][FOLLOWUP] Regenerate 
`sql-error-conditions.md` to recover `SparkThrowableSuite`
6ae0abb64289 is described below

commit 6ae0abb64289c2124b2a2dd4043d010a06a14465
Author: Dongjoon Hyun 
AuthorDate: Thu Feb 22 17:26:32 2024 -0800

[SPARK-43259][SQL][FOLLOWUP] Regenerate `sql-error-conditions.md` to 
recover `SparkThrowableSuite`

### What changes were proposed in this pull request?

This is a follow-up of #45095

### Why are the changes needed?

To recover the broken `master` branch.
- https://github.com/apache/spark/actions/runs/8008631301/job/21875499011

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

I manually verified it as follows.
```
[info] SparkThrowableSuite:
[info] - No duplicate error classes (23 milliseconds)
[info] - Error classes are correctly formatted (37 milliseconds)
[info] - SQLSTATE is mandatory (1 millisecond)
[info] - Error category and error state / SQLSTATE invariants (21 
milliseconds)
[info] - Message invariants (6 milliseconds)
[info] - Message format invariants (9 milliseconds)
[info] - Error classes match with document (54 milliseconds)
[info] - Round trip (23 milliseconds)
[info] - Error class names should contain only capital letters, numbers and 
underscores (5 milliseconds)
[info] - Check if error class is missing (14 milliseconds)
[info] - Check if message parameters match message format (2 milliseconds)
[info] - Error message is formatted (0 milliseconds)
[info] - Error message does not do substitution on values (0 milliseconds)
[info] - Try catching legacy SparkError (1 millisecond)
[info] - Try catching SparkError with error class (1 millisecond)
[info] - Try catching internal SparkError (1 millisecond)
[info] - Get message in the specified format (3 milliseconds)
[info] - overwrite error classes (47 milliseconds)
[info] - prohibit dots in error class names (15 milliseconds)
[info] Run completed in 1 second, 90 milliseconds.
[info] Total number of tests run: 19
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 19, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 7 s, completed Feb 22, 2024, 5:22:24 PM
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45226 from dongjoon-hyun/SPARK-43259.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 docs/sql-error-conditions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md
index 0745de995799..bb982a77fca0 100644
--- a/docs/sql-error-conditions.md
+++ b/docs/sql-error-conditions.md
@@ -1148,7 +1148,7 @@ Please increase executor memory using the 
--executor-memory option or "`
 
 [SQLSTATE: 
42001](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
 
-Found an invalid expression encoder. Expects an instance of 
`ExpressionEncoder` but got ``. For more information consult 
'``/api/java/index.html?org/apache/spark/sql/Encoder.html'.
+Found an invalid expression encoder. Expects an instance of ExpressionEncoder 
but got ``. For more information consult 
'``/api/java/index.html?org/apache/spark/sql/Encoder.html'.
 
 ### INVALID_EXTRACT_BASE_FIELD_TYPE
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



(spark) branch master updated: [SPARK-47136][CORE][TESTS] Fix `MavenUtilsSuite` to use `MavenUtils.resolveMavenCoordinates` properly

2024-02-22 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 9bc273ee0dad [SPARK-47136][CORE][TESTS] Fix `MavenUtilsSuite` to use 
`MavenUtils.resolveMavenCoordinates` properly
9bc273ee0dad is described below

commit 9bc273ee0daddef3a0d453ba6311e996bc56830d
Author: Dongjoon Hyun 
AuthorDate: Thu Feb 22 15:26:01 2024 -0800

[SPARK-47136][CORE][TESTS] Fix `MavenUtilsSuite` to use 
`MavenUtils.resolveMavenCoordinates` properly

### What changes were proposed in this pull request?

This PR aims to do the following.
1. Fix `MavenUtilsSuite` to use `MavenUtils.resolveMavenCoordinates` 
properly by using the `ivyPath` parameter of the `MavenUtils.loadIvySettings` 
method consistently.
2. Make all test cases isolated by adding `beforeEach` and `afterEach` 
instead of a single `beforeAll`.

### Why are the changes needed?

1. `MavenUtils` sets the following two settings together internally if it 
receives `ivyPath`.


https://github.com/apache/spark/blob/9debaeaa5a079a73605cddb90b1a77274c5284d3/common/utils/src/main/scala/org/apache/spark/util/MavenUtils.scala#L337-L342

2. `MavenUtilsSuite` uses `tempIvyPath` for all 
`MavenUtils.resolveMavenCoordinates` except one test case.


https://github.com/apache/spark/blob/9debaeaa5a079a73605cddb90b1a77274c5284d3/common/utils/src/test/scala/org/apache/spark/util/MavenUtilsSuite.scala#L175-L175

3. The following is the missed case that this PR fixes.


https://github.com/apache/spark/blob/9debaeaa5a079a73605cddb90b1a77274c5284d3/common/utils/src/test/scala/org/apache/spark/util/MavenUtilsSuite.scala#L253

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

```
$ build/sbt "common-utils/testOnly *MavenUtilsSuite"
...
[info] MavenUtilsSuite:
[info] - incorrect maven coordinate throws error (9 milliseconds)
[info] - create repo resolvers (19 milliseconds)
[info] - create additional resolvers (7 milliseconds)
:: loading settings :: url = 
jar:file:/Users/dongjoon/Library/Caches/Coursier/v1/https/repo1.maven.org/maven2/org/apache/ivy/ivy/2.5.1/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
[info] - add dependencies works correctly (29 milliseconds)
[info] - excludes works correctly (2 milliseconds)
[info] - ivy path works correctly (661 milliseconds)
[info] - search for artifact at local repositories (405 milliseconds)
[info] - dependency not found throws RuntimeException (198 milliseconds)
:: loading settings :: url = 
jar:file:/Users/dongjoon/Library/Caches/Coursier/v1/https/repo1.maven.org/maven2/org/apache/ivy/ivy/2.5.1/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
[info] - neglects Spark and Spark's dependencies (388 milliseconds)
[info] - exclude dependencies end to end (385 milliseconds)
:: loading settings :: file = 
/Users/dongjoon/APACHE/spark-merge/target/tmp/ivy-9aa3863e-9dba-4002-996b-5e86b2f1281f/ivysettings.xml
[info] - load ivy settings file (103 milliseconds)
[info] - SPARK-10878: test resolution files cleaned after resolving 
artifact (70 milliseconds)
Spark was unable to load org/apache/spark/log4j2-defaults.properties
[info] - SPARK-34624: should ignore non-jar dependencies (247 milliseconds)
[info] Run completed in 3 seconds, 16 milliseconds.
[info] Total number of tests run: 13
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 13, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 3 s, completed Feb 22, 2024, 2:21:18 PM
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45220 from dongjoon-hyun/SPARK-47136.

    Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/util/MavenUtilsSuite.scala| 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git 
a/common/utils/src/test/scala/org/apache/spark/util/MavenUtilsSuite.scala 
b/common/utils/src/test/scala/org/apache/spark/util/MavenUtilsSuite.scala
index 642eca3cf933..d30422ca8dd5 100644
--- a/common/utils/src/test/scala/org/apache/spark/util/MavenUtilsSuite.scala
+++ b/common/utils/src/test/scala/org/apache/spark/util/MavenUtilsSuite.scala
@@ -28,14 +28,14 @@ import scala.jdk.CollectionConverters._
 import org.apache.ivy.core.module.descriptor.MDArtifact
 import org.apache.ivy.core.settings.IvySettings
 import org.apache.ivy.plugins.resolver.{AbstractResolver, ChainResolver, 
FileSystemResolver, IBiblioResolver}
-import org.scalatest.BeforeAndAfterAll
+import org.scalatest.BeforeAndAfterEach
 import org.scalatest.funsu
