(spark) branch master updated: [SPARK-47112][INFRA] Write logs into a file in SparkR Windows build

2024-02-20 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0d938de112df [SPARK-47112][INFRA] Write logs into a file in SparkR 
Windows build
0d938de112df is described below

commit 0d938de112dfe142f40d4c6f86cfa1e6e32210ec
Author: Hyukjin Kwon 
AuthorDate: Wed Feb 21 16:39:37 2024 +0900

[SPARK-47112][INFRA] Write logs into a file in SparkR Windows build

### What changes were proposed in this pull request?

We used to write Log4j logs into `target/unit-tests.log` instead of the console. This is broken in the SparkR Windows job; this PR fixes it by switching the driver option to `-Dlog4j.configurationFile`, the system property that Log4j 2 actually reads.

### Why are the changes needed?


https://github.com/apache/spark/actions/runs/7977185456/job/21779508822#step:10:89
This writes too many logs, making it difficult to see the real test cases.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

In my fork

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45192 from HyukjinKwon/reduce-logs.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 .github/workflows/build_sparkr_window.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/build_sparkr_window.yml 
b/.github/workflows/build_sparkr_window.yml
index 07f4ebe91ad2..fbaca36f9f87 100644
--- a/.github/workflows/build_sparkr_window.yml
+++ b/.github/workflows/build_sparkr_window.yml
@@ -71,7 +71,7 @@ jobs:
   run: |
 set HADOOP_HOME=%USERPROFILE%\hadoop-3.3.5
 set PATH=%HADOOP_HOME%\bin;%PATH%
-.\bin\spark-submit2.cmd --driver-java-options 
"-Dlog4j.configuration=file:///%CD:\=/%/R/log4j2.properties" --conf 
spark.hadoop.fs.defaultFS="file:///" R\pkg\tests\run-all.R
+.\bin\spark-submit2.cmd --driver-java-options 
"-Dlog4j.configurationFile=file:///%CD:\=/%/R/log4j2.properties" --conf 
spark.hadoop.fs.defaultFS="file:///" R\pkg\tests\run-all.R
   shell: cmd
   env:
 NOT_CRAN: true





(spark) branch master updated: [SPARK-47113][CORE] Revert S3A endpoint fixup logic of SPARK-35878

2024-02-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b3e34629080b [SPARK-47113][CORE] Revert S3A endpoint fixup logic of 
SPARK-35878
b3e34629080b is described below

commit b3e34629080bfbbc0615bb16a961b9298c5d4756
Author: Steve Loughran 
AuthorDate: Tue Feb 20 23:12:11 2024 -0800

[SPARK-47113][CORE] Revert S3A endpoint fixup logic of SPARK-35878

### What changes were proposed in this pull request?

Revert [SPARK-35878][CORE] Add fs.s3a.endpoint if unset and 
fs.s3a.endpoint.region is null

Removing the region/endpoint patching code of SPARK-35878 avoids authentication problems with versions of the S3A connector built with the AWS v2 SDK, as is the case in Hadoop 3.4.0.

That is: if fs.s3a.endpoint is unset it will stay unset.

The v2 SDK does its binding to AWS services differently, in what can be described as "region first" binding. Spark setting the endpoint blocks S3 Express support and is incompatible with HADOOP-18975 ("S3A: Add option fs.s3a.endpoint.fips to use AWS FIPS endpoints"):

- https://github.com/apache/hadoop/pull/6277

The change is compatible with all releases of the s3a connector other than 
hadoop 3.3.1 binaries deployed outside EC2 and without the endpoint explicitly 
set.

### Why are the changes needed?

The AWS v2 SDK has a different/complex binding mechanism; it doesn't need the endpoint to be set if the region (fs.s3a.endpoint.region) value is set. This means the Spark code to fix an endpoint is not only unneeded, it causes problems when trying to use specific storage options (S3 Express) or security options (FIPS).
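
As a hedged illustration only (not part of this patch): with the endpoint left unset, a user of the v2-SDK-based connector points the client at a region instead. The bucket name and region below are placeholders.

```
import org.apache.spark.sql.SparkSession

// Sketch only: bucket name and region are placeholders, not values from this patch.
object S3ARegionBindingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3a-region-binding-sketch")
      // Let the SDK resolve the service endpoint from the region
      // ("region first" binding); fs.s3a.endpoint is intentionally left unset.
      .config("spark.hadoop.fs.s3a.endpoint.region", "eu-west-1")
      .getOrCreate()

    spark.read.text("s3a://some-bucket/some-prefix/").show(5)
    spark.stop()
  }
}
```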

### Does this PR introduce _any_ user-facing change?

Only visible with the Hadoop 3.3.1 S3A connector when deployed outside of EC2, the situation the original patch was added to work around. All other 3.3.x releases are good.

### How was this patch tested?

Removed some obsolete tests. Relying on GitHub and Jenkins to do the testing, so marking this PR as WIP until they are happy.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45193 from dongjoon-hyun/SPARK-47113.

Authored-by: Steve Loughran 
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/deploy/SparkHadoopUtil.scala  | 10 ---
 .../apache/spark/deploy/SparkHadoopUtilSuite.scala | 33 --
 2 files changed, 43 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala 
b/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
index 628b688dedba..2edd80db2637 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
@@ -529,16 +529,6 @@ private[spark] object SparkHadoopUtil extends Logging {
 if 
(conf.getOption("spark.hadoop.fs.s3a.downgrade.syncable.exceptions").isEmpty) {
   hadoopConf.set("fs.s3a.downgrade.syncable.exceptions", "true", 
setBySpark)
 }
-// In Hadoop 3.3.1, AWS region handling with the default "" endpoint only 
works
-// in EC2 deployments or when the AWS CLI is installed.
-// The workaround is to set the name of the S3 endpoint explicitly,
-// if not already set. See HADOOP-17771.
-if (hadoopConf.get("fs.s3a.endpoint", "").isEmpty &&
-  hadoopConf.get("fs.s3a.endpoint.region") == null) {
-  // set to US central endpoint which can also connect to buckets
-  // in other regions at the expense of a HEAD request during fs creation
-  hadoopConf.set("fs.s3a.endpoint", "s3.amazonaws.com", setBySpark)
-}
   }
 
   private def appendSparkHiveConfigs(conf: SparkConf, hadoopConf: 
Configuration): Unit = {
diff --git 
a/core/src/test/scala/org/apache/spark/deploy/SparkHadoopUtilSuite.scala 
b/core/src/test/scala/org/apache/spark/deploy/SparkHadoopUtilSuite.scala
index 2326d10d4164..9a81cb947257 100644
--- a/core/src/test/scala/org/apache/spark/deploy/SparkHadoopUtilSuite.scala
+++ b/core/src/test/scala/org/apache/spark/deploy/SparkHadoopUtilSuite.scala
@@ -39,19 +39,6 @@ class SparkHadoopUtilSuite extends SparkFunSuite {
 assertConfigMatches(hadoopConf, "orc.filterPushdown", "true", 
SOURCE_SPARK_HADOOP)
 assertConfigMatches(hadoopConf, "fs.s3a.downgrade.syncable.exceptions", 
"true",
   SET_TO_DEFAULT_VALUES)
-assertConfigMatches(hadoopConf, "fs.s3a.endpoint", "s3.amazonaws.com", 
SET_TO_DEFAULT_VALUES)
-  }
-
-  /**
-   * An empty S3A endpoint will be overridden just as a null value
-   * would.
-   */
-  test("appendSparkHadoopConfigs with S3A endpoint set to empty string") {
-val sc = new SparkConf()
-val hadoopConf = new Configuration(false)
-sc.set("spark.hadoop.fs.s3a.endpoint", "")
-   

(spark) branch master updated: [SPARK-47115][INFRA] Use larger memory for Maven builds

2024-02-20 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f4bb6528596b [SPARK-47115][INFRA] Use larger memory for Maven builds
f4bb6528596b is described below

commit f4bb6528596b6809391664e594be8c7a154529d8
Author: Hyukjin Kwon 
AuthorDate: Wed Feb 21 16:06:26 2024 +0900

[SPARK-47115][INFRA] Use larger memory for Maven builds

### What changes were proposed in this pull request?

This PR proposes to use more memory during Maven builds. GitHub Actions runners now have more memory than before (https://docs.github.com/en/actions/using-github-hosted-runners/about-larger-runners/about-larger-runners), so we can increase the heap size.

https://github.com/HyukjinKwon/spark/actions/runs/7984135094/job/21800463337

### Why are the changes needed?

For stable Maven builds.
Some tests consistently fail:

```
*** RUN ABORTED ***
An exception or error caused a run to abort: unable to create native 
thread: possibly out of memory or process/resource limits reached
  java.lang.OutOfMemoryError: unable to create native thread: possibly out 
of memory or process/resource limits reached
  at java.base/java.lang.Thread.start0(Native Method)
  at java.base/java.lang.Thread.start(Thread.java:1553)
  at java.base/java.lang.System$2.start(System.java:2577)
  at 
java.base/jdk.internal.vm.SharedThreadContainer.start(SharedThreadContainer.java:152)
  at 
java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:953)
  at 
java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364)
  at 
org.apache.spark.rpc.netty.SharedMessageLoop.$anonfun$threadpool$1(MessageLoop.scala:128)
  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190)
  at 
org.apache.spark.rpc.netty.SharedMessageLoop.(MessageLoop.scala:127)
  at 
org.apache.spark.rpc.netty.Dispatcher.sharedLoop$lzycompute(Dispatcher.scala:46)
  ...
Warning:  The requested profile "volcano" could not be activated because it 
does not exist.
Warning:  The requested profile "hive" could not be activated because it 
does not exist.
Error:  Failed to execute goal 
org.scalatest:scalatest-maven-plugin:2.2.0:test (test) on project 
spark-core_2.13: There are test failures -> [Help 1]
Error:
Error:  To see the full stack trace of the errors, re-run Maven with the -e 
switch.
Error:  Re-run Maven using the -X switch to enable full debug logging.
Error:
Error:  For more information about the errors and possible solutions, 
please read the following articles:
Error:  [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
Error:
Error:  After correcting the problems, you can resume the build with the 
command
Error:mvn  -rf :spark-core_2.13
Error: Process completed with exit code 1.
```

### Does this PR introduce _any_ user-facing change?

No, dev-only

### How was this patch tested?

Will monitor the scheduled jobs. It's a simple memory configuration change.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45195 from HyukjinKwon/bigger-macos.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 .github/workflows/maven_test.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/maven_test.yml b/.github/workflows/maven_test.yml
index d63066a521f9..b2f0118d2e7b 100644
--- a/.github/workflows/maven_test.yml
+++ b/.github/workflows/maven_test.yml
@@ -185,7 +185,7 @@ jobs:
   - name: Run tests
 env: ${{ fromJSON(inputs.envs) }}
 run: |
-  export MAVEN_OPTS="-Xss64m -Xmx4g -Xms4g 
-XX:ReservedCodeCacheSize=128m -Dorg.slf4j.simpleLogger.defaultLogLevel=WARN"
+  export MAVEN_OPTS="-Xss64m -Xmx6g -Xms6g 
-XX:ReservedCodeCacheSize=128m -Dorg.slf4j.simpleLogger.defaultLogLevel=WARN"
   export MAVEN_CLI_OPTS="--no-transfer-progress"
   export JAVA_VERSION=${{ matrix.java }}
   # Replace with the real module name, for example, 
connector#kafka-0-10 -> connector/kafka-0-10





(spark) branch master updated: Revert "[SPARK-35878][CORE] Revert S3A endpoint fixup logic of SPARK-35878"

2024-02-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 011b843687d8 Revert "[SPARK-35878][CORE] Revert S3A endpoint fixup 
logic of SPARK-35878"
011b843687d8 is described below

commit 011b843687d8ae36b03e8d3d177b0bf43e7d29b6
Author: Dongjoon Hyun 
AuthorDate: Tue Feb 20 22:26:56 2024 -0800

Revert "[SPARK-35878][CORE] Revert S3A endpoint fixup logic of SPARK-35878"

This reverts commit 36f199d1e41276c78036355eac1dac092e65aabe.
---
 .../org/apache/spark/deploy/SparkHadoopUtil.scala  | 10 +++
 .../apache/spark/deploy/SparkHadoopUtilSuite.scala | 33 ++
 2 files changed, 43 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala 
b/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
index 2edd80db2637..628b688dedba 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
@@ -529,6 +529,16 @@ private[spark] object SparkHadoopUtil extends Logging {
 if 
(conf.getOption("spark.hadoop.fs.s3a.downgrade.syncable.exceptions").isEmpty) {
   hadoopConf.set("fs.s3a.downgrade.syncable.exceptions", "true", 
setBySpark)
 }
+// In Hadoop 3.3.1, AWS region handling with the default "" endpoint only 
works
+// in EC2 deployments or when the AWS CLI is installed.
+// The workaround is to set the name of the S3 endpoint explicitly,
+// if not already set. See HADOOP-17771.
+if (hadoopConf.get("fs.s3a.endpoint", "").isEmpty &&
+  hadoopConf.get("fs.s3a.endpoint.region") == null) {
+  // set to US central endpoint which can also connect to buckets
+  // in other regions at the expense of a HEAD request during fs creation
+  hadoopConf.set("fs.s3a.endpoint", "s3.amazonaws.com", setBySpark)
+}
   }
 
   private def appendSparkHiveConfigs(conf: SparkConf, hadoopConf: 
Configuration): Unit = {
diff --git 
a/core/src/test/scala/org/apache/spark/deploy/SparkHadoopUtilSuite.scala 
b/core/src/test/scala/org/apache/spark/deploy/SparkHadoopUtilSuite.scala
index 9a81cb947257..2326d10d4164 100644
--- a/core/src/test/scala/org/apache/spark/deploy/SparkHadoopUtilSuite.scala
+++ b/core/src/test/scala/org/apache/spark/deploy/SparkHadoopUtilSuite.scala
@@ -39,6 +39,19 @@ class SparkHadoopUtilSuite extends SparkFunSuite {
 assertConfigMatches(hadoopConf, "orc.filterPushdown", "true", 
SOURCE_SPARK_HADOOP)
 assertConfigMatches(hadoopConf, "fs.s3a.downgrade.syncable.exceptions", 
"true",
   SET_TO_DEFAULT_VALUES)
+assertConfigMatches(hadoopConf, "fs.s3a.endpoint", "s3.amazonaws.com", 
SET_TO_DEFAULT_VALUES)
+  }
+
+  /**
+   * An empty S3A endpoint will be overridden just as a null value
+   * would.
+   */
+  test("appendSparkHadoopConfigs with S3A endpoint set to empty string") {
+val sc = new SparkConf()
+val hadoopConf = new Configuration(false)
+sc.set("spark.hadoop.fs.s3a.endpoint", "")
+new SparkHadoopUtil().appendSparkHadoopConfigs(sc, hadoopConf)
+assertConfigMatches(hadoopConf, "fs.s3a.endpoint", "s3.amazonaws.com", 
SET_TO_DEFAULT_VALUES)
   }
 
   /**
@@ -48,8 +61,28 @@ class SparkHadoopUtilSuite extends SparkFunSuite {
 val sc = new SparkConf()
 val hadoopConf = new Configuration(false)
 sc.set("spark.hadoop.fs.s3a.downgrade.syncable.exceptions", "false")
+sc.set("spark.hadoop.fs.s3a.endpoint", "s3-eu-west-1.amazonaws.com")
 new SparkHadoopUtil().appendSparkHadoopConfigs(sc, hadoopConf)
 assertConfigValue(hadoopConf, "fs.s3a.downgrade.syncable.exceptions", 
"false")
+assertConfigValue(hadoopConf, "fs.s3a.endpoint",
+  "s3-eu-west-1.amazonaws.com")
+  }
+
+  /**
+   * If the endpoint region is set (even to a blank string) in
+   * "spark.hadoop.fs.s3a.endpoint.region" then the endpoint is not set,
+   * even when the s3a endpoint is "".
+   * This supports a feature in hadoop 3.3.1 where this configuration
+   * pair triggers a revert to the "SDK to work out the region" algorithm,
+   * which works on EC2 deployments.
+   */
+  test("appendSparkHadoopConfigs with S3A endpoint region set to an empty 
string") {
+val sc = new SparkConf()
+val hadoopConf = new Configuration(false)
+sc.set("spark.hadoop.fs.s3a.endpoint.region", "")
+new SparkHadoopUtil().appendSparkHadoopConfigs(sc, hadoopConf)
+// the endpoint value will not have been set
+assertConfigValue(hadoopConf, "fs.s3a.endpoint", null)
   }
 
   /**





(spark) branch master updated (0b907ed11e6e -> 36f199d1e412)

2024-02-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 0b907ed11e6e [SPARK-46928][SS] Add support for ListState in Arbitrary 
State API v2
 add 36f199d1e412 [SPARK-35878][CORE] Revert S3A endpoint fixup logic of 
SPARK-35878

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/deploy/SparkHadoopUtil.scala  | 10 ---
 .../apache/spark/deploy/SparkHadoopUtilSuite.scala | 33 --
 2 files changed, 43 deletions(-)





(spark) branch master updated: [SPARK-46928][SS] Add support for ListState in Arbitrary State API v2

2024-02-20 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0b907ed11e6e [SPARK-46928][SS] Add support for ListState in Arbitrary 
State API v2
0b907ed11e6e is described below

commit 0b907ed11e6ec6bc4c7d07926ed352806636d58a
Author: Bhuwan Sahni 
AuthorDate: Wed Feb 21 14:43:27 2024 +0900

[SPARK-46928][SS] Add support for ListState in Arbitrary State API v2

### What changes were proposed in this pull request?

This PR adds changes for the ListState implementation in State API v2. As a list contains multiple values for a single key, we utilize the RocksDB merge operator to persist multiple values.

Changes include

1. A new encoder/decoder to encode multiple values inside a single byte[] array (stored in RocksDB). The encoding scheme is compatible with the RocksDB StringAppendOperator merge operator (see the sketch after this list).
2. Support merge operations in ChangelogCheckpointing v2.
3. Extend StateStore to support the merge operation and reading multiple values for a single key (via an iterator). Note that these changes are currently only supported for RocksDB.
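
The sketch below illustrates the idea behind item 1 only; it is not the encoder added by this patch, and it assumes, for simplicity, that serialized values never contain the delimiter byte. A string-append style merge operator concatenates merge operands with a delimiter, so a blob built this way stays decodable after merges.

```
// Illustrative sketch, not the encoder from this patch: pack several values
// into one byte[] so that delimiter-based appends keep the blob decodable.
object MultiValueEncodingSketch {
  private val Delimiter: Byte = ','.toByte

  def encode(values: Seq[Array[Byte]]): Array[Byte] = {
    val out = new java.io.ByteArrayOutputStream()
    values.zipWithIndex.foreach { case (v, i) =>
      if (i > 0) out.write(Delimiter.toInt)
      out.write(v)
    }
    out.toByteArray
  }

  def decode(blob: Array[Byte]): Seq[Array[Byte]] = {
    if (blob.isEmpty) return Seq.empty
    val values = scala.collection.mutable.ArrayBuffer.empty[Array[Byte]]
    var start = 0
    var i = 0
    while (i < blob.length) {
      if (blob(i) == Delimiter) {
        values += blob.slice(start, i)
        start = i + 1
      }
      i += 1
    }
    values += blob.slice(start, blob.length)
    values.toSeq
  }

  def main(args: Array[String]): Unit = {
    val packed = encode(Seq("a".getBytes, "bc".getBytes, "def".getBytes))
    // Prints the three values in insertion order: a, bc, def.
    decode(packed).foreach(v => println(new String(v)))
  }
}
```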

### Why are the changes needed?

These changes are needed to support list values in the State Store. They are part of the work on adding a new stateful streaming operator for arbitrary state management that provides a bunch of new features listed in the SPIP JIRA: https://issues.apache.org/jira/browse/SPARK-45939

### Does this PR introduce _any_ user-facing change?

Yes
This PR introduces a new state type (ListState) that users can use in their 
Spark streaming queries.

### How was this patch tested?

1. Added a new test suite for ListState to ensure the state produces 
correct results.
2. Added additional testcases for input validation.
3. Added tests for merge operator with RocksDB.
4. Added tests for changelog checkpointing merge operator.
5. Added tests for reading merged values in RocksDBStateStore.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #44961 from sahnib/state-api-v2-list-state.

Authored-by: Bhuwan Sahni 
Signed-off-by: Jungtaek Lim 
---
 .../src/main/resources/error/error-classes.json|  18 ++
 ...itions-illegal-state-store-value-error-class.md |  41 +++
 docs/sql-error-conditions.md   |   8 +
 .../{ValueState.scala => ListState.scala}  |  30 +-
 .../sql/streaming/StatefulProcessorHandle.scala|  10 +
 .../apache/spark/sql/streaming/ValueState.scala|   2 +-
 .../v2/state/StatePartitionReader.scala|   2 +-
 .../sql/execution/streaming/ListStateImpl.scala| 121 
 .../streaming/StateTypesEncoderUtils.scala |  88 ++
 .../streaming/StatefulProcessorHandleImpl.scala|   8 +-
 .../streaming/TransformWithStateExec.scala |   6 +-
 .../sql/execution/streaming/ValueStateImpl.scala   |  61 +---
 .../state/HDFSBackedStateStoreProvider.scala   |  27 +-
 .../sql/execution/streaming/state/RocksDB.scala|  37 +++
 .../streaming/state/RocksDBStateEncoder.scala  |  96 +-
 .../state/RocksDBStateStoreProvider.scala  |  53 +++-
 .../sql/execution/streaming/state/StateStore.scala |  53 +++-
 .../streaming/state/StateStoreChangelog.scala  |  48 ++-
 .../streaming/state/StateStoreErrors.scala |  12 +
 .../execution/streaming/state/StateStoreRDD.scala  |   5 +-
 .../state/SymmetricHashJoinStateManager.scala  |   3 +-
 .../sql/execution/streaming/state/package.scala|   6 +-
 .../streaming/state/MemoryStateStore.scala |  11 +-
 .../streaming/state/RocksDBStateStoreSuite.scala   |  56 +++-
 .../execution/streaming/state/RocksDBSuite.scala   |  84 ++
 .../streaming/state/StateStoreSuite.scala  |   7 +-
 .../streaming/state/ValueStateSuite.scala  |  12 +-
 .../apache/spark/sql/streaming/StreamSuite.scala   |   3 +-
 .../streaming/TransformWithListStateSuite.scala| 328 +
 .../sql/streaming/TransformWithStateSuite.scala|   2 +-
 30 files changed, 1120 insertions(+), 118 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json 
b/common/utils/src/main/resources/error/error-classes.json
index c1b1171b5dc8..b30b1d60bb4a 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -1380,6 +1380,24 @@
 ],
 "sqlState" : "42601"
   },
+  "ILLEGAL_STATE_STORE_VALUE" : {
+"message" : [
+  "Illegal value provided to the State Store"
+],
+"subClass" : {
+  "EMPTY_LIST_VALUE" : {
+"message" : [
+  "Cannot write empty list values to State Store for StateName 
."
+]
+  },
+  "NULL_VALUE" : {
+

(spark) branch master updated: [SPARK-47095][INFRA][FOLLOW-UP] Remove TTY specific workaround in Maven build

2024-02-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b96914b637ee [SPARK-47095][INFRA][FOLLOW-UP] Remove TTY specific 
workaround in Maven build
b96914b637ee is described below

commit b96914b637ee0692f3c836c2637863704c6b73fa
Author: Hyukjin Kwon 
AuthorDate: Tue Feb 20 21:42:12 2024 -0800

[SPARK-47095][INFRA][FOLLOW-UP] Remove TTY specific workaround in Maven 
build

### What changes were proposed in this pull request?

This PR is a followup of https://github.com/apache/spark/pull/45171, which broke the scheduled build of macos-14.
Here I remove the TTY-specific workaround in the Maven build and skip `AmmoniteTest`, which needs the workaround.
We should re-enable the tests once the bug is fixed (see https://github.com/apache/spark/pull/40675#issuecomment-1513102087).

### Why are the changes needed?

To fix up the build, which currently fails: https://github.com/apache/spark/actions/runs/7979285164

See also https://github.com/apache/spark/pull/45186#discussion_r1496839930

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

In my fork.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45186 from HyukjinKwon/SPARK-47095-followup.

Lead-authored-by: Hyukjin Kwon 
Co-authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/maven_test.yml | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/.github/workflows/maven_test.yml b/.github/workflows/maven_test.yml
index 80898b3a507a..d63066a521f9 100644
--- a/.github/workflows/maven_test.yml
+++ b/.github/workflows/maven_test.yml
@@ -73,13 +73,19 @@ jobs:
 
connector#kafka-0-10,connector#kafka-0-10-sql,connector#kafka-0-10-token-provider,connector#spark-ganglia-lgpl,connector#protobuf,connector#avro
   - >-
 
sql#api,sql#catalyst,resource-managers#yarn,resource-managers#kubernetes#core
-  - >-
-connect
 # Here, we split Hive and SQL tests into some of slow ones and the 
rest of them.
 included-tags: [ "" ]
 excluded-tags: [ "" ]
 comment: [ "" ]
 include:
+  # Connect tests
+  - modules: connect
+java: ${{ inputs.java }}
+hadoop: ${{ inputs.hadoop }}
+hive: hive2.3
+# TODO(SPARK-47110): Reenble AmmoniteTest tests in Maven builds
+excluded-tags: org.apache.spark.tags.AmmoniteTest
+comment: ""
   # Hive tests
   - modules: sql#hive
 java: ${{ inputs.java }}
@@ -178,13 +184,7 @@ jobs:
   # Run the tests.
   - name: Run tests
 env: ${{ fromJSON(inputs.envs) }}
-# The command script takes different options ubuntu vs macos-14, see 
also SPARK-47095.
-shell: '[[ "${{ inputs.os }}" == *"ubuntu"* ]] && script -q -e -c 
"bash {0}" || script -q -e "bash {0}"'
 run: |
-  # Fix for TTY related issues when launching the Ammonite REPL in 
tests.
-  export TERM=vt100
-  # `set -e` to make the exit status as expected due to use `script -q 
-e -c` to run the commands
-  set -e
   export MAVEN_OPTS="-Xss64m -Xmx4g -Xms4g 
-XX:ReservedCodeCacheSize=128m -Dorg.slf4j.simpleLogger.defaultLogLevel=WARN"
   export MAVEN_CLI_OPTS="--no-transfer-progress"
   export JAVA_VERSION=${{ matrix.java }}
@@ -193,10 +193,10 @@ jobs:
   ./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pkubernetes 
-Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl 
-Djava.version=${JAVA_VERSION/-ea} clean install
   if [[ "$INCLUDED_TAGS" != "" ]]; then
 ./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn 
-Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud 
-Pspark-ganglia-lgpl -Djava.version=${JAVA_VERSION/-ea} 
-Dtest.include.tags="$INCLUDED_TAGS" test -fae
+  elif [[ "$MODULES_TO_TEST" == "connect" ]]; then
+./build/mvn $MAVEN_CLI_OPTS -Dtest.exclude.tags="$EXCLUDED_TAGS" 
-Djava.version=${JAVA_VERSION/-ea} -pl 
connector/connect/client/jvm,connector/connect/common,connector/connect/server 
test -fae
   elif [[ "$EXCLUDED_TAGS" != "" ]]; then
 ./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn 
-Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud 
-Pspark-ganglia-lgpl -Djava.version=${JAVA_VERSION/-ea} 
-Dtest.exclude.tags="$EXCLUDED_TAGS" test -fae
-  elif [[ "$MODULES_TO_TEST" == "connect" ]]; then
-./build/mvn $MAVEN_CLI_OPTS -Djava.version=${JAVA_VERSION/-ea} -pl 
connector/connect/client/jvm,connector/connect/common,connector/co

(spark) branch master updated: [SPARK-47052][SS] Separate state tracking variables from MicroBatchExecution/StreamExecution

2024-02-20 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new bffa92c838d6 [SPARK-47052][SS] Separate state tracking variables from 
MicroBatchExecution/StreamExecution
bffa92c838d6 is described below

commit bffa92c838d6650249a6e71bb0ef8189cf970383
Author: Jerry Peng 
AuthorDate: Wed Feb 21 12:58:48 2024 +0900

[SPARK-47052][SS] Separate state tracking variables from 
MicroBatchExecution/StreamExecution

### What changes were proposed in this pull request?

To improve code clarity and maintainability, I propose that we move all the 
variables that track mutable state and metrics for a streaming query into a 
separate class.  With this refactor, it would be easy to track and find all the 
mutable state a microbatch can have.

### Why are the changes needed?

To improve code clarity and maintainability. All the state and metrics needed for the execution lifecycle of a microbatch are consolidated into one class. If we decide to modify or add state to a streaming query, it will be easier to determine 1) where to add it and 2) what existing state is already there.
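
As a rough, hypothetical sketch of the pattern (class and member names here are invented for illustration and do not mirror the patch), per-batch mutable state can be grouped into a context object that is created at the start of a microbatch and passed to each lifecycle step:

```
// Hypothetical names for illustration only.
final class BatchExecutionContext(val batchId: Long) {
  var startTimestampMs: Long = -1L
  var endTimestampMs: Long = -1L
  var numInputRows: Long = 0L
}

class MicroBatchRunnerSketch {
  // Instead of scattering `var currentBatchId`, timing fields and metrics
  // across the execution class, each batch carries its own context.
  private var latestContext: Option[BatchExecutionContext] = None

  def runBatch(batchId: Long): Unit = {
    val ctx = new BatchExecutionContext(batchId)
    ctx.startTimestampMs = System.currentTimeMillis()
    processBatch(ctx)
    ctx.endTimestampMs = System.currentTimeMillis()
    latestContext = Some(ctx)
  }

  private def processBatch(ctx: BatchExecutionContext): Unit = {
    // Placeholder for the real work; metrics are recorded on the context.
    ctx.numInputRows += 1
  }
}
```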

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests should suffice

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #45109 from jerrypeng/SPARK-47052.

Authored-by: Jerry Peng 
Signed-off-by: Jungtaek Lim 
---
 .../sql/execution/streaming/AsyncLogPurge.scala|  11 +-
 .../AsyncProgressTrackingMicroBatchExecution.scala |  30 +-
 .../execution/streaming/MicroBatchExecution.scala  | 422 ++---
 .../sql/execution/streaming/ProgressReporter.scala | 521 +
 .../sql/execution/streaming/StreamExecution.scala  | 112 +++--
 .../streaming/StreamExecutionContext.scala | 233 +
 .../sql/execution/streaming/TriggerExecutor.scala  |  24 +-
 .../streaming/continuous/ContinuousExecution.scala |  56 ++-
 .../streaming/ProcessingTimeExecutorSuite.scala|   6 +-
 9 files changed, 945 insertions(+), 470 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/AsyncLogPurge.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/AsyncLogPurge.scala
index b3729dbc7b45..aa393211a1c1 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/AsyncLogPurge.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/AsyncLogPurge.scala
@@ -29,11 +29,8 @@ import org.apache.spark.util.ThreadUtils
  */
 trait AsyncLogPurge extends Logging {
 
-  protected var currentBatchId: Long
-
   protected val minLogEntriesToMaintain: Int
 
-
   protected[sql] val errorNotifier: ErrorNotifier
 
   protected val sparkSession: SparkSession
@@ -47,15 +44,11 @@ trait AsyncLogPurge extends Logging {
 
   protected lazy val useAsyncPurge: Boolean = 
sparkSession.conf.get(SQLConf.ASYNC_LOG_PURGE)
 
-  protected def purgeAsync(): Unit = {
+  protected def purgeAsync(batchId: Long): Unit = {
 if (purgeRunning.compareAndSet(false, true)) {
-  // save local copy because currentBatchId may get updated.  There are 
not really
-  // any concurrency issues here in regards to calculating the purge 
threshold
-  // but for the sake of defensive coding lets make a copy
-  val currentBatchIdCopy: Long = currentBatchId
   asyncPurgeExecutorService.execute(() => {
 try {
-  purge(currentBatchIdCopy - minLogEntriesToMaintain)
+  purge(batchId - minLogEntriesToMaintain)
 } catch {
   case throwable: Throwable =>
 logError("Encountered error while performing async log purge", 
throwable)
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/AsyncProgressTrackingMicroBatchExecution.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/AsyncProgressTrackingMicroBatchExecution.scala
index 206efb9a5450..ec24ec0fd335 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/AsyncProgressTrackingMicroBatchExecution.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/AsyncProgressTrackingMicroBatchExecution.scala
@@ -110,12 +110,12 @@ class AsyncProgressTrackingMicroBatchExecution(
 }
   }
 
-  override def markMicroBatchExecutionStart(): Unit = {
+  override def markMicroBatchExecutionStart(execCtx: 
MicroBatchExecutionContext): Unit = {
 // check if streaming query is stateful
 checkNotStatefulStreamingQuery
   }
 
-  override def cleanUpLastExecutedMicroBatch(): Unit = {
+  override def cleanUpLastExecutedMicroBatch(execCtx: 
MicroBatchExecutionContext): Unit = {
 // this is a no op for async progress tracking si

(spark) branch master updated: [SPARK-47111][SQL][TESTS] Upgrade `PostgreSQL` JDBC driver to 42.7.2 and docker image to 16.2

2024-02-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6e6197073a57 [SPARK-47111][SQL][TESTS] Upgrade `PostgreSQL` JDBC 
driver to 42.7.2 and docker image to 16.2
6e6197073a57 is described below

commit 6e6197073a57a40502f12e9fc0cfd8f2d5f84585
Author: Dongjoon Hyun 
AuthorDate: Tue Feb 20 19:49:44 2024 -0800

[SPARK-47111][SQL][TESTS] Upgrade `PostgreSQL` JDBC driver to 42.7.2 and 
docker image to 16.2

### What changes were proposed in this pull request?

This PR aims to upgrade `PostgreSQL` JDBC driver and docker images.
- JDBC Driver: `org.postgresql:postgresql` from 42.7.0 to 42.7.2
- Docker Image: `postgres` from `15.1-alpine` to `16.2-alpine`

### Why are the changes needed?

To use the latest PostgreSQL combination in the following integration tests.

- PostgresIntegrationSuite
- PostgresKrbIntegrationSuite
- GeneratedSubquerySuite
- PostgreSQLQueryTestSuite
- v2/PostgresIntegrationSuite
- v2/PostgresNamespaceSuite

### Does this PR introduce _any_ user-facing change?

No. This is a pure test-environment update.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45191 from dongjoon-hyun/SPARK-47111.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala  | 6 +++---
 .../org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala | 6 +++---
 .../apache/spark/sql/jdbc/querytest/GeneratedSubquerySuite.scala| 6 +++---
 .../apache/spark/sql/jdbc/querytest/PostgreSQLQueryTestSuite.scala  | 6 +++---
 .../org/apache/spark/sql/jdbc/v2/PostgresIntegrationSuite.scala | 6 +++---
 .../scala/org/apache/spark/sql/jdbc/v2/PostgresNamespaceSuite.scala | 6 +++---
 pom.xml | 2 +-
 7 files changed, 19 insertions(+), 19 deletions(-)

diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
index 92d1c3761ba8..968ca09cb3d5 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
@@ -30,9 +30,9 @@ import org.apache.spark.sql.types.{ArrayType, DecimalType, 
FloatType, ShortType}
 import org.apache.spark.tags.DockerTest
 
 /**
- * To run this test suite for a specific version (e.g., postgres:15.1):
+ * To run this test suite for a specific version (e.g., postgres:16.2):
  * {{{
- *   ENABLE_DOCKER_INTEGRATION_TESTS=1 POSTGRES_DOCKER_IMAGE_NAME=postgres:15.1
+ *   ENABLE_DOCKER_INTEGRATION_TESTS=1 POSTGRES_DOCKER_IMAGE_NAME=postgres:16.2
  * ./build/sbt -Pdocker-integration-tests
  * "docker-integration-tests/testOnly 
org.apache.spark.sql.jdbc.PostgresIntegrationSuite"
  * }}}
@@ -40,7 +40,7 @@ import org.apache.spark.tags.DockerTest
 @DockerTest
 class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite {
   override val db = new DatabaseOnDocker {
-override val imageName = sys.env.getOrElse("POSTGRES_DOCKER_IMAGE_NAME", 
"postgres:15.1-alpine")
+override val imageName = sys.env.getOrElse("POSTGRES_DOCKER_IMAGE_NAME", 
"postgres:16.2-alpine")
 override val env = Map(
   "POSTGRES_PASSWORD" -> "rootpass"
 )
diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala
index 92c3378b4065..d08be3b5f40e 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala
@@ -25,9 +25,9 @@ import 
org.apache.spark.sql.execution.datasources.jdbc.connection.SecureConnecti
 import org.apache.spark.tags.DockerTest
 
 /**
- * To run this test suite for a specific version (e.g., postgres:15.1):
+ * To run this test suite for a specific version (e.g., postgres:16.2):
  * {{{
- *   ENABLE_DOCKER_INTEGRATION_TESTS=1 POSTGRES_DOCKER_IMAGE_NAME=postgres:15.1
+ *   ENABLE_DOCKER_INTEGRATION_TESTS=1 POSTGRES_DOCKER_IMAGE_NAME=postgres:16.2
  * ./build/sbt -Pdocker-integration-tests
  * "docker-integration-tests/testOnly *PostgresKrbIntegrationSuite"
  * }}}
@@ -38,7 +38,7 @@ class PostgresKrbIntegrationSuite extends 
DockerKrbJDBCIntegrationSuite {
   o

(spark) branch master updated: [SPARK-47109][BUILD] Upgrade `commons-compress` to 1.26.0

2024-02-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 434acaf48849 [SPARK-47109][BUILD] Upgrade `commons-compress` to 1.26.0
434acaf48849 is described below

commit 434acaf48849263986d83480002d8d969eb33f12
Author: Dongjoon Hyun 
AuthorDate: Tue Feb 20 18:56:42 2024 -0800

[SPARK-47109][BUILD] Upgrade `commons-compress` to 1.26.0

### What changes were proposed in this pull request?

This PR aims to upgrade `commons-compress` to 1.26.0.

### Why are the changes needed?

To bring the latest bug fixes.
- 
https://commons.apache.org/proper/commons-compress/changes-report.html#a1.26.0

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45189 from dongjoon-hyun/SPARK-47109.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index cc0145e004a0..97205011e265 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -40,7 +40,7 @@ commons-codec/1.16.1//commons-codec-1.16.1.jar
 commons-collections/3.2.2//commons-collections-3.2.2.jar
 commons-collections4/4.4//commons-collections4-4.4.jar
 commons-compiler/3.1.9//commons-compiler-3.1.9.jar
-commons-compress/1.25.0//commons-compress-1.25.0.jar
+commons-compress/1.26.0//commons-compress-1.26.0.jar
 commons-crypto/1.1.0//commons-crypto-1.1.0.jar
 commons-dbcp/1.4//commons-dbcp-1.4.jar
 commons-io/2.15.1//commons-io-2.15.1.jar
diff --git a/pom.xml b/pom.xml
index f7acb65b991e..b56fb857ee46 100644
--- a/pom.xml
+++ b/pom.xml
@@ -192,7 +192,7 @@
 1.1.10.5
 3.0.3
 1.16.1
-1.25.0
+1.26.0
 2.15.1
 
 2.6





(spark) branch master updated (8ede494ad6c7 -> 76575ee7481c)

2024-02-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 8ede494ad6c7 [SPARK-46906][SS] Add a check for stateful operator 
change for streaming
 add 76575ee7481c [MINOR][SQL] Remove `toLowerCase(Locale.ROOT)` for 
`CATALOG_IMPLEMENTATION`

No new revisions were added by this update.

Summary of changes:
 repl/src/main/scala/org/apache/spark/repl/Main.scala  | 5 +
 sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala | 5 ++---
 2 files changed, 3 insertions(+), 7 deletions(-)





(spark) branch dependabot/maven/org.postgresql-postgresql-42.7.2 deleted (was 3ee67ff7af76)

2024-02-20 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch 
dependabot/maven/org.postgresql-postgresql-42.7.2
in repository https://gitbox.apache.org/repos/asf/spark.git


 was 3ee67ff7af76 Bump org.postgresql:postgresql from 42.7.0 to 42.7.2

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.





(spark) branch dependabot/maven/org.apache.commons-commons-compress-1.26.0 deleted (was 3488d39e5396)

2024-02-20 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch 
dependabot/maven/org.apache.commons-commons-compress-1.26.0
in repository https://gitbox.apache.org/repos/asf/spark.git


 was 3488d39e5396 Bump org.apache.commons:commons-compress from 1.25.0 to 
1.26.0

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.





(spark) branch dependabot/maven/org.apache.commons-commons-compress-1.26.0 created (now 3488d39e5396)

2024-02-20 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch 
dependabot/maven/org.apache.commons-commons-compress-1.26.0
in repository https://gitbox.apache.org/repos/asf/spark.git


  at 3488d39e5396 Bump org.apache.commons:commons-compress from 1.25.0 to 
1.26.0

No new revisions were added by this update.





(spark) branch dependabot/maven/org.postgresql-postgresql-42.7.2 created (now 3ee67ff7af76)

2024-02-20 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch 
dependabot/maven/org.postgresql-postgresql-42.7.2
in repository https://gitbox.apache.org/repos/asf/spark.git


  at 3ee67ff7af76 Bump org.postgresql:postgresql from 42.7.0 to 42.7.2

No new revisions were added by this update.





(spark) branch master updated (eb71268a6a38 -> 8ede494ad6c7)

2024-02-20 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from eb71268a6a38 [SPARK-47108][CORE] Set 
`derby.connection.requireAuthentication` to `false` explicitly in CLIs
 add 8ede494ad6c7 [SPARK-46906][SS] Add a check for stateful operator 
change for streaming

No new revisions were added by this update.

Summary of changes:
 .../src/main/resources/error/error-classes.json|  7 ++
 docs/sql-error-conditions.md   |  7 ++
 .../spark/sql/errors/QueryExecutionErrors.scala| 13 
 .../v2/state/metadata/StateMetadataSource.scala|  2 +-
 .../execution/streaming/IncrementalExecution.scala | 61 +++-
 .../execution/streaming/statefulOperators.scala|  2 +-
 .../state/OperatorStateMetadataSuite.scala | 81 +-
 7 files changed, 166 insertions(+), 7 deletions(-)





(spark) branch master updated (9d9675922543 -> eb71268a6a38)

2024-02-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 9d9675922543 [SPARK-45615][BUILD] Remove undated "Auto-application to 
`()` is deprecated" compile suppression rules
 add eb71268a6a38 [SPARK-47108][CORE] Set 
`derby.connection.requireAuthentication` to `false` explicitly in CLIs

No new revisions were added by this update.

Summary of changes:
 .../main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java   | 1 +
 1 file changed, 1 insertion(+)





(spark) branch branch-3.5 updated: [SPARK-47085][SQL][3.5] reduce the complexity of toTRowSet from n^2 to n

2024-02-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 92a333ada7c5 [SPARK-47085][SQL][3.5] reduce the complexity of 
toTRowSet from n^2 to n
92a333ada7c5 is described below

commit 92a333ada7c56b6f3dacffc18010880e37e66ee2
Author: Izek Greenfield 
AuthorDate: Tue Feb 20 12:39:24 2024 -0800

[SPARK-47085][SQL][3.5] reduce the complexity of toTRowSet from n^2 to n

### What changes were proposed in this pull request?
reduce the complexity of RowSetUtils.toTRowSet from n^2 to n

### Why are the changes needed?
Positional access on the row `Seq` is quadratic, which causes performance issues when fetching large result sets.
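
For illustration only (not code from the patch): the fetched rows are materialized as a `List`, and `rows(i)` on a `List` walks from the head on every call, so the old index-based loop was O(n^2) overall while a single traversal is O(n).

```
// Not from the patch: why positional indexing on a List is quadratic
// while a single traversal is linear.
object SeqIndexingCostSketch {
  def main(args: Array[String]): Unit = {
    val rows: Seq[Int] = List.tabulate(50000)(identity)
    val rowSize = rows.length

    // O(n^2): rows(i) on a List walks from the head on every call,
    // mirroring the old while loop in toTRowSet.
    var i = 0
    var sumIndexed = 0L
    while (i < rowSize) {
      sumIndexed += rows(i)
      i += 1
    }

    // O(n): one pass over the collection, as in the fixed version.
    var sumTraversed = 0L
    rows.foreach(v => sumTraversed += v)

    println(s"$sumIndexed == $sumTraversed")
  }
}
```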

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tests + test manually on AWS EMR

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45165 from igreenfield/branch-3.5.

Authored-by: Izek Greenfield 
Signed-off-by: Dongjoon Hyun 
---
 .../apache/spark/sql/hive/thriftserver/RowSetUtils.scala   | 14 --
 .../hive/thriftserver/SparkExecuteStatementOperation.scala |  2 +-
 2 files changed, 5 insertions(+), 11 deletions(-)

diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/RowSetUtils.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/RowSetUtils.scala
index 9625021f392c..047f0612898d 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/RowSetUtils.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/RowSetUtils.scala
@@ -52,11 +52,7 @@ object RowSetUtils {
   rows: Seq[Row],
   schema: Array[DataType],
   timeFormatters: TimeFormatters): TRowSet = {
-var i = 0
-val rowSize = rows.length
-val tRows = new java.util.ArrayList[TRow](rowSize)
-while (i < rowSize) {
-  val row = rows(i)
+val tRows = rows.map { row =>
   val tRow = new TRow()
   var j = 0
   val columnSize = row.length
@@ -65,9 +61,8 @@ object RowSetUtils {
 tRow.addToColVals(columnValue)
 j += 1
   }
-  i += 1
-  tRows.add(tRow)
-}
+  tRow
+}.asJava
 new TRowSet(startRowOffSet, tRows)
   }
 
@@ -159,8 +154,7 @@ object RowSetUtils {
 val size = rows.length
 val ret = new java.util.ArrayList[T](size)
 var idx = 0
-while (idx < size) {
-  val row = rows(idx)
+rows.foreach { row =>
   if (row.isNullAt(ordinal)) {
 nulls.set(idx, true)
 ret.add(idx, defaultVal)
diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
index a9b46739fa66..e6b4c70bb395 100644
--- 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
@@ -114,7 +114,7 @@ private[hive] class SparkExecuteStatementOperation(
 val offset = iter.getPosition
 val rows = iter.take(maxRows).toList
 log.debug(s"Returning result set with ${rows.length} rows from offsets " +
-  s"[${iter.getFetchStart}, ${offset}) with $statementId")
+  s"[${iter.getFetchStart}, ${iter.getPosition}) with $statementId")
 RowSetUtils.toTRowSet(offset, rows, dataTypes, getProtocolVersion, 
getTimeFormatters)
   }
 





(spark) branch master updated (6a5649410d83 -> 9d9675922543)

2024-02-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 6a5649410d83 [SPARK-47098][INFRA] Migrate from AppVeyor to GitHub 
Actions for SparkR tests on Windows
 add 9d9675922543 [SPARK-45615][BUILD] Remove undated "Auto-application to 
`()` is deprecated" compile suppression rules

No new revisions were added by this update.

Summary of changes:
 pom.xml  | 8 
 project/SparkBuild.scala | 6 --
 2 files changed, 14 deletions(-)





(spark) branch master updated: [SPARK-47098][INFRA] Migrate from AppVeyor to GitHub Actions for SparkR tests on Windows

2024-02-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6a5649410d83 [SPARK-47098][INFRA] Migrate from AppVeyor to GitHub 
Actions for SparkR tests on Windows
6a5649410d83 is described below

commit 6a5649410d83610777bd3d67c7a6f567215118ae
Author: Hyukjin Kwon 
AuthorDate: Tue Feb 20 08:02:30 2024 -0800

[SPARK-47098][INFRA] Migrate from AppVeyor to GitHub Actions for SparkR 
tests on Windows

### What changes were proposed in this pull request?

This PR proposes to migrate from AppVeyor to GitHub Actions for SparkR 
tests on Windows.

### Why are the changes needed?

Reduce the tools we use for better maintenance.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

- [x] Tested in my fork

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45175 from HyukjinKwon/SPARK-47098.

Authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 .github/labeler.yml   |   1 -
 .github/workflows/build_sparkr_window.yml |  81 +
 README.md |   1 -
 appveyor.yml  |  75 
 dev/appveyor-guide.md | 186 --
 dev/appveyor-install-dependencies.ps1 | 153 
 dev/sparktestsupport/utils.py |   7 +-
 project/build.properties  |   1 -
 8 files changed, 83 insertions(+), 422 deletions(-)

diff --git a/.github/labeler.yml b/.github/labeler.yml
index 20b5c936941c..7d24390f2968 100644
--- a/.github/labeler.yml
+++ b/.github/labeler.yml
@@ -21,7 +21,6 @@ INFRA:
   - changed-files:
 - any-glob-to-any-file: [
  '.github/**/*',
- 'appveyor.yml',
  'tools/**/*',
  'dev/create-release/**/*',
  '.asf.yaml',
diff --git a/.github/workflows/build_sparkr_window.yml 
b/.github/workflows/build_sparkr_window.yml
new file mode 100644
index ..07f4ebe91ad2
--- /dev/null
+++ b/.github/workflows/build_sparkr_window.yml
@@ -0,0 +1,81 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+name: "Build / SparkR-only (master, 4.3.2, windows-2019)"
+
+on:
+  schedule:
+- cron: '0 17 * * *'
+
+jobs:
+  build:
+name: "Build module: sparkr"
+runs-on: windows-2019
+timeout-minutes: 300
+steps:
+- name: Download winutils Hadoop binary
+  uses: actions/checkout@v4
+  with:
+repository: cdarlint/winutils
+- name: Move Hadoop winutil into home directory
+  run: |
+Move-Item -Path hadoop-3.3.5 -Destination ~\
+- name: Checkout Spark repository
+  uses: actions/checkout@v4
+- name: Cache Maven local repository
+  uses: actions/cache@v4
+  with:
+path: ~/.m2/repository
+key: build-sparkr-maven-${{ hashFiles('**/pom.xml') }}
+restore-keys: |
+  build-sparkr-windows-maven-
+- name: Install Java 17
+  uses: actions/setup-java@v4
+  with:
+distribution: zulu
+java-version: 17
+- name: Install R 4.3.2
+  uses: r-lib/actions/setup-r@v2
+  with:
+r-version: 4.3.2
+- name: Install R dependencies
+  run: |
+Rscript -e "install.packages(c('knitr', 'rmarkdown', 'testthat', 
'e1071', 'survival', 'arrow', 'xml2'), repos='https://cloud.r-project.org/')"
+Rscript -e "pkg_list <- as.data.frame(installed.packages()[,c(1, 
3:4)]); pkg_list[is.na(pkg_list$Priority), 1:2, drop = FALSE]"
+  shell: cmd
+- name: Build Spark
+  run: |
+rem 1. '-Djna.nosys=true' is required to avoid kernel32.dll load 
failure.
+rem   See SPARK-28759.
+rem 2. Ideally we should check the tests related to Hive in SparkR as 
well (SPARK-31745).
+rem 3. setup-java installs Maven 3.8.7 but does not allow changing its 
version, so overwrite
+rem   Maven version as a workaround.
+mvn -DskipTests -Psparkr -Djna.nosys=true packag

(spark) branch master updated: [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0

2024-02-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8e82887ed25e [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0
8e82887ed25e is described below

commit 8e82887ed25e521e1400edf56109f9d8f5ee3303
Author: Haejoon Lee 
AuthorDate: Tue Feb 20 07:46:52 2024 -0800

[SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0

### What changes were proposed in this pull request?

This PR proposes to upgrade Pandas to 2.2.0.

See [What's new in 2.2.0 (January 19, 
2024)](https://pandas.pydata.org/docs/whatsnew/v2.2.0.html)

### Why are the changes needed?

Pandas 2.2.0 is released, and we should support the latest Pandas.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The existing CI should pass

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44881 from itholic/pandas_2.2.0.

Authored-by: Haejoon Lee 
Signed-off-by: Dongjoon Hyun 
---
 dev/infra/Dockerfile   |  4 +-
 .../source/migration_guide/pyspark_upgrade.rst |  1 +
 python/pyspark/pandas/frame.py |  6 +-
 python/pyspark/pandas/namespace.py |  5 +-
 python/pyspark/pandas/plot/matplotlib.py   | 99 ++
 python/pyspark/pandas/resample.py  | 24 --
 python/pyspark/pandas/series.py|  8 +-
 python/pyspark/pandas/supported_api_gen.py |  2 +-
 .../pyspark/pandas/tests/computation/test_melt.py  | 13 +--
 .../pandas/tests/data_type_ops/test_boolean_ops.py |  8 +-
 .../pandas/tests/data_type_ops/test_complex_ops.py | 12 ++-
 .../tests/data_type_ops/test_num_arithmetic.py | 24 +++---
 .../pandas/tests/data_type_ops/test_num_mod.py |  6 +-
 .../pandas/tests/data_type_ops/test_num_mul_div.py | 10 +--
 .../pandas/tests/data_type_ops/test_num_ops.py |  6 +-
 .../pandas/tests/data_type_ops/test_num_reverse.py | 24 +++---
 .../pyspark/pandas/tests/frame/test_reshaping.py   |  6 +-
 .../pandas/tests/indexes/test_conversion.py|  4 +-
 python/pyspark/pandas/tests/resample/test_error.py |  9 ++
 python/pyspark/pandas/tests/resample/test_frame.py |  9 +-
 .../pyspark/pandas/tests/resample/test_missing.py  |  4 +-
 .../pyspark/pandas/tests/resample/test_series.py   |  4 +-
 python/pyspark/pandas/tests/test_namespace.py  |  6 +-
 .../sql/tests/connect/test_connect_basic.py|  1 +
 .../sql/tests/connect/test_connect_function.py | 63 ++
 25 files changed, 253 insertions(+), 105 deletions(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index fa663bc6e419..eaeed51f90cd 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -91,10 +91,10 @@ RUN mkdir -p /usr/local/pypy/pypy3.8 && \
 ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3.8 && \
 ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
-RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.1.4' scipy coverage 
matplotlib lxml
+RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.2.0' scipy coverage 
matplotlib lxml
 
 
-ARG BASIC_PIP_PKGS="numpy pyarrow>=15.0.0 six==1.16.0 pandas<=2.1.4 scipy 
plotly>=4.8 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 
scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=15.0.0 six==1.16.0 pandas<=2.2.0 scipy 
plotly>=4.8 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 
scikit-learn>=1.3.2"
 # Python deps for Spark Connect
 ARG CONNECT_PIP_PKGS="grpcio==1.59.3 grpcio-status==1.59.3 protobuf==4.25.1 
googleapis-common-protos==1.56.4"
 
diff --git a/python/docs/source/migration_guide/pyspark_upgrade.rst 
b/python/docs/source/migration_guide/pyspark_upgrade.rst
index 9ef04814ef82..1ca5d7aad5d1 100644
--- a/python/docs/source/migration_guide/pyspark_upgrade.rst
+++ b/python/docs/source/migration_guide/pyspark_upgrade.rst
@@ -69,6 +69,7 @@ Upgrading from PySpark 3.5 to 4.0
 * In Spark 4.0, ``Series.dt.week`` and ``Series.dt.weekofyear`` have been 
removed from Pandas API on Spark, use ``Series.dt.isocalendar().week`` instead.
 * In Spark 4.0, when applying ``astype`` to a decimal type object, the 
existing missing value is changed to ``True`` instead of ``False`` from Pandas 
API on Spark.
 * In Spark 4.0, ``pyspark.testing.assertPandasOnSparkEqual`` has been removed 
from Pandas API on Spark, use ``pyspark.pandas.testing.assert_frame_equal`` 
instead.
+* In Spark 4.0, the aliases ``Y``, ``M``, ``H``, ``T``, ``S`` have been 
deprecated from Pandas API on Spark, use ``YE``, ``ME``, ``h``, ``min``, ``s`` 
instead respectively.
 
 
 
diff --git a/python/pyspark/pandas/frame.py 
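
To make the migration-guide entry above concrete, here is a minimal, hedged
sketch of the frequency-alias change from the Pandas API on Spark side. It
assumes a Spark build with this upgrade and pandas >= 2.2 installed; the data
and column name are made up for illustration.

```
# Minimal sketch (assumed data): resampling with the new-style frequency
# aliases that pandas 2.2 prefers. The single-letter aliases "Y", "M", "H",
# "T", "S" are deprecated; "YE", "ME", "h", "min", "s" replace them.
import pandas as pd
import pyspark.pandas as ps

pdf = pd.DataFrame(
    {"value": range(6)},
    index=pd.date_range("2024-01-01", periods=6, freq="h"),  # was freq="H"
)
psdf = ps.from_pandas(pdf)

# Old (deprecated): psdf.resample("H").sum()
hourly = psdf.resample("h").sum()  # new lower-case alias
print(hourly.to_pandas())
```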

(spark) branch master updated: [SPARK-47044][SQL] Add executed query for JDBC external datasources to explain output

2024-02-20 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e6a3385e27fa [SPARK-47044][SQL] Add executed query for JDBC external 
datasources to explain output
e6a3385e27fa is described below

commit e6a3385e27fa95391433ea02fa053540fe101d40
Author: Uros Stankovic 
AuthorDate: Tue Feb 20 22:03:28 2024 +0800

[SPARK-47044][SQL] Add executed query for JDBC external datasources to 
explain output

### What changes were proposed in this pull request?
Add the generated JDBC query to the EXPLAIN FORMATTED output when the physical 
Scan node accesses a JDBC source to create its RDD.

Output of EXPLAIN FORMATTED with this change, taken from the newly added test:
```
== Physical Plan ==
* Project (2)
+- * Scan 
org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCScan$$anon$14349389d  (1)

(1) Scan 
org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCScan$$anon$14349389d  
[codegen id : 1]
Output [1]: [MAX(ID)#x]
Arguments: [MAX(ID)#x], [StructField(MAX(ID),IntegerType,true)], 
PushedDownOperators(Some(org.apache.spark.sql.connector.expressions.aggregate.Aggregation647d3279),None,None,None,List(),ArraySeq(ID
 IS NOT NULL, ID > 1)), JDBCRDD[0] at $anonfun$executePhase$2 at 
LexicalThreadLocal.scala:63, 
org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCScan$$anon$14349389d, 
Statistics(sizeInBytes=8.0 EiB, ColumnStat: N/A)
External engine query: SELECT MAX("ID") FROM "test"."people"  WHERE ("ID" 
IS NOT NULL) AND ("ID" > 1)

(2) Project [codegen id : 1]
Output [1]: [MAX(ID)#x AS max(id)#x]
Input [1]: [MAX(ID)#x]
```

### Why are the changes needed?
This lets customers see exactly which query text is sent to 
external JDBC sources.

### Does this PR introduce _any_ user-facing change?
Yes
Customers will see an additional field in the EXPLAIN FORMATTED output for the 
RowDataSourceScanExec node.

### How was this patch tested?
Tested with a new unit test in the JDBC V2 suite.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45102 from urosstan-db/add-sql-query-for-external-datasources.

Authored-by: Uros Stankovic 
Signed-off-by: Wenchen Fan 
---
 .../apache/spark/sql/catalyst/trees/TreeNode.scala |  8 ++--
 .../spark/sql/execution/DataSourceScanExec.scala   | 10 
 .../datasources/ExternalEngineDatasourceRDD.scala  | 26 ++
 .../sql/execution/datasources/jdbc/JDBCRDD.scala   | 56 --
 .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala|  7 +++
 5 files changed, 78 insertions(+), 29 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
index dbacb833ef59..10e2718da833 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
@@ -1000,12 +1000,10 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]]
 
 val str = if (verbose) {
   if (addSuffix) verboseStringWithSuffix(maxFields) else 
verboseString(maxFields)
+} else if (printNodeId) {
+  simpleStringWithNodeId()
 } else {
-  if (printNodeId) {
-simpleStringWithNodeId()
-  } else {
-simpleString(maxFields)
-  }
+  simpleString(maxFields)
 }
 append(prefix)
 append(str)
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
index ec265f4eaea4..474d65a251ba 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
@@ -127,6 +127,16 @@ case class RowDataSourceScanExec(
 }
   }
 
+  override def verboseStringWithOperatorId(): String = {
+super.verboseStringWithOperatorId() + (rdd match {
+  case externalEngineDatasourceRdd: ExternalEngineDatasourceRDD =>
+"External engine query: " +
+  externalEngineDatasourceRdd.getExternalEngineQuery +
+  System.lineSeparator()
+  case _ => ""
+})
+  }
+
   protected override def doExecute(): RDD[InternalRow] = {
 val numOutputRows = longMetric("numOutputRows")
 
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ExternalEngineDatasourceRDD.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ExternalEngineDatasourceRDD.scala
new file mode 100644
index ..14ca824596f9
--- /dev/null
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ExternalEngineDatasourceRDD.sca
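
A rough usage sketch of how the new plan field can be surfaced from PySpark.
The JDBC URL, table name, and credentials below are placeholders, and the
exact plan text depends on which filters and aggregates get pushed down.

```
# Hypothetical connection details; replace with a real JDBC endpoint.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-explain-demo").getOrCreate()

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/testdb")  # placeholder URL
    .option("dbtable", "people")                              # placeholder table
    .option("user", "spark")
    .option("password", "secret")
    .load()
)

# With SPARK-47044, scans backed by a JDBC source should include an
# "External engine query: SELECT ..." line in the formatted plan.
df.where("id > 1").groupBy().max("id").explain(mode="formatted")
```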

(spark) branch master updated: [SPARK-47100][BUILD] Upgrade `netty` to 4.1.107.Final and `netty-tcnative` to 2.0.62.Final

2024-02-20 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new fb1e7872a3e6 [SPARK-47100][BUILD] Upgrade `netty` to 4.1.107.Final and 
`netty-tcnative` to 2.0.62.Final
fb1e7872a3e6 is described below

commit fb1e7872a3e64eab6127f9c2b3ffa42b63162f6c
Author: Dongjoon Hyun 
AuthorDate: Tue Feb 20 17:04:41 2024 +0800

[SPARK-47100][BUILD] Upgrade `netty` to 4.1.107.Final and `netty-tcnative` 
to 2.0.62.Final

### What changes were proposed in this pull request?

This PR aims to upgrade `netty` to 4.1.107.Final and `netty-tcnative` to 
2.0.62.Final.

### Why are the changes needed?

To bring the latest bug fixes.
- https://netty.io/news/2024/02/13/4-1-107-Final.html

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45178 from dongjoon-hyun/SPARK-47100.

Authored-by: Dongjoon Hyun 
Signed-off-by: yangjie01 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 50 +--
 pom.xml   |  4 +--
 2 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index dbbddbc54c11..cc0145e004a0 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -192,32 +192,32 @@ metrics-jmx/4.2.25//metrics-jmx-4.2.25.jar
 metrics-json/4.2.25//metrics-json-4.2.25.jar
 metrics-jvm/4.2.25//metrics-jvm-4.2.25.jar
 minlog/1.3.0//minlog-1.3.0.jar
-netty-all/4.1.106.Final//netty-all-4.1.106.Final.jar
-netty-buffer/4.1.106.Final//netty-buffer-4.1.106.Final.jar
-netty-codec-http/4.1.106.Final//netty-codec-http-4.1.106.Final.jar
-netty-codec-http2/4.1.106.Final//netty-codec-http2-4.1.106.Final.jar
-netty-codec-socks/4.1.106.Final//netty-codec-socks-4.1.106.Final.jar
-netty-codec/4.1.106.Final//netty-codec-4.1.106.Final.jar
-netty-common/4.1.106.Final//netty-common-4.1.106.Final.jar
-netty-handler-proxy/4.1.106.Final//netty-handler-proxy-4.1.106.Final.jar
-netty-handler/4.1.106.Final//netty-handler-4.1.106.Final.jar
-netty-resolver/4.1.106.Final//netty-resolver-4.1.106.Final.jar
+netty-all/4.1.107.Final//netty-all-4.1.107.Final.jar
+netty-buffer/4.1.107.Final//netty-buffer-4.1.107.Final.jar
+netty-codec-http/4.1.107.Final//netty-codec-http-4.1.107.Final.jar
+netty-codec-http2/4.1.107.Final//netty-codec-http2-4.1.107.Final.jar
+netty-codec-socks/4.1.107.Final//netty-codec-socks-4.1.107.Final.jar
+netty-codec/4.1.107.Final//netty-codec-4.1.107.Final.jar
+netty-common/4.1.107.Final//netty-common-4.1.107.Final.jar
+netty-handler-proxy/4.1.107.Final//netty-handler-proxy-4.1.107.Final.jar
+netty-handler/4.1.107.Final//netty-handler-4.1.107.Final.jar
+netty-resolver/4.1.107.Final//netty-resolver-4.1.107.Final.jar
 
netty-tcnative-boringssl-static/2.0.61.Final//netty-tcnative-boringssl-static-2.0.61.Final.jar
-netty-tcnative-boringssl-static/2.0.61.Final/linux-aarch_64/netty-tcnative-boringssl-static-2.0.61.Final-linux-aarch_64.jar
-netty-tcnative-boringssl-static/2.0.61.Final/linux-x86_64/netty-tcnative-boringssl-static-2.0.61.Final-linux-x86_64.jar
-netty-tcnative-boringssl-static/2.0.61.Final/osx-aarch_64/netty-tcnative-boringssl-static-2.0.61.Final-osx-aarch_64.jar
-netty-tcnative-boringssl-static/2.0.61.Final/osx-x86_64/netty-tcnative-boringssl-static-2.0.61.Final-osx-x86_64.jar
-netty-tcnative-boringssl-static/2.0.61.Final/windows-x86_64/netty-tcnative-boringssl-static-2.0.61.Final-windows-x86_64.jar
-netty-tcnative-classes/2.0.61.Final//netty-tcnative-classes-2.0.61.Final.jar
-netty-transport-classes-epoll/4.1.106.Final//netty-transport-classes-epoll-4.1.106.Final.jar
-netty-transport-classes-kqueue/4.1.106.Final//netty-transport-classes-kqueue-4.1.106.Final.jar
-netty-transport-native-epoll/4.1.106.Final/linux-aarch_64/netty-transport-native-epoll-4.1.106.Final-linux-aarch_64.jar
-netty-transport-native-epoll/4.1.106.Final/linux-riscv64/netty-transport-native-epoll-4.1.106.Final-linux-riscv64.jar
-netty-transport-native-epoll/4.1.106.Final/linux-x86_64/netty-transport-native-epoll-4.1.106.Final-linux-x86_64.jar
-netty-transport-native-kqueue/4.1.106.Final/osx-aarch_64/netty-transport-native-kqueue-4.1.106.Final-osx-aarch_64.jar
-netty-transport-native-kqueue/4.1.106.Final/osx-x86_64/netty-transport-native-kqueue-4.1.106.Final-osx-x86_64.jar
-netty-transport-native-unix-common/4.1.106.Final//netty-transport-native-unix-common-4.1.106.Final.jar
-netty-transport/4.1.106.Final//netty-transport-4.1.106.Final.jar
+netty-tcnative-boringssl-static/2.0.62.Final/linux-aarch_64/netty-tcnative-boringssl-static-2.0.62.Final-linux-aarch_64.jar
+netty-tcnati
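
As a quick sanity check after a bump like this, a small sketch for confirming
which Netty build actually ends up on the driver classpath; it assumes an
active PySpark session and queries Netty's own `io.netty.util.Version` API
through the (internal) py4j JVM gateway.

```
# Print the Netty artifacts and versions visible to the Spark driver JVM.
# Note: spark._jvm is an internal attribute; this is a debugging sketch only.
# After this upgrade the output should show 4.1.107.Final.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
versions = spark._jvm.io.netty.util.Version.identify()  # java.util.Map
for artifact in versions.keySet().toArray():
    print(artifact, "->", versions.get(artifact).artifactVersion())
```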