[spark] branch branch-3.5 updated: [SPARK-44555][SQL] Use checkError() to check Exception in command Suite & assign some error class names

2023-08-01 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new c47d9b1bcf6 [SPARK-44555][SQL] Use checkError() to check Exception in 
command Suite & assign some error class names
c47d9b1bcf6 is described below

commit c47d9b1bcf61f65a7078d43361b438fd56d0af81
Author: panbingkun 
AuthorDate: Wed Aug 2 10:51:16 2023 +0500

[SPARK-44555][SQL] Use checkError() to check Exception in command Suite & 
assign some error class names

### What changes were proposed in this pull request?
The PR aims to:
1. Use `checkError()` to check exceptions in the `command` test suites (a sketch of the pattern follows this list).
2. Assign error class names, including `UNSUPPORTED_FEATURE.PURGE_PARTITION` and `UNSUPPORTED_FEATURE.PURGE_TABLE`.
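
For illustration, a minimal sketch of the `checkError()` pattern, assuming the `checkError(exception, errorClass, parameters)` helper from `SparkFunSuite`; the SQL statement and table name below are placeholders, not taken from the actual suites:

```scala
import org.apache.spark.SparkUnsupportedOperationException

// Inside a command test suite: sql(), intercept and checkError come from the
// suite's QueryTest/SparkFunSuite mixins. The statement is illustrative only.
val e = intercept[SparkUnsupportedOperationException] {
  sql("ALTER TABLE ns.tbl DROP PARTITION (id = 1) PURGE")
}
checkError(
  exception = e,
  errorClass = "UNSUPPORTED_FEATURE.PURGE_PARTITION",
  parameters = Map.empty)
```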

### Why are the changes needed?
The changes improve the error framework.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manual tests.
- Passed GA.

Closes #42169 from panbingkun/checkError_for_command.

Authored-by: panbingkun 
Signed-off-by: Max Gekk 
(cherry picked from commit 4ec27c3801aaa0cbba3e086c278a0ff96260b84a)
Signed-off-by: Max Gekk 
---
 .../src/main/resources/error/error-classes.json| 10 
 ...r-conditions-unsupported-feature-error-class.md |  8 ++
 .../catalog/SupportsAtomicPartitionManagement.java |  3 ++-
 .../catalog/SupportsPartitionManagement.java   |  3 ++-
 .../spark/sql/connector/catalog/TableCatalog.java  |  3 ++-
 .../spark/sql/errors/QueryExecutionErrors.scala| 12 +
 .../SupportsAtomicPartitionManagementSuite.scala   | 13 ++
 .../catalog/SupportsPartitionManagementSuite.scala | 13 ++
 .../command/v1/AlterTableAddPartitionSuite.scala   | 14 ++
 .../command/v1/AlterTableDropPartitionSuite.scala  | 12 +
 .../command/v1/AlterTableRenameSuite.scala | 11 +---
 .../command/v1/AlterTableSetLocationSuite.scala| 11 +---
 .../command/v1/ShowCreateTableSuite.scala  | 12 +
 .../sql/execution/command/v1/ShowTablesSuite.scala | 22 ++--
 .../execution/command/v1/TruncateTableSuite.scala  | 11 +---
 .../command/v2/AlterTableDropPartitionSuite.scala  | 12 ++---
 .../v2/AlterTableRecoverPartitionsSuite.scala  | 11 +---
 .../command/v2/AlterTableSetLocationSuite.scala| 12 +
 .../sql/execution/command/v2/DropTableSuite.scala  | 12 ++---
 .../command/v2/MsckRepairTableSuite.scala  | 11 +---
 .../sql/execution/command/v2/ShowTablesSuite.scala | 11 +---
 .../execution/command/ShowCreateTableSuite.scala   | 30 +-
 22 files changed, 172 insertions(+), 85 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json 
b/common/utils/src/main/resources/error/error-classes.json
index 385435c740e..480ec636283 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -2956,6 +2956,16 @@
   "Pivoting by the value '' of the column data type ."
 ]
   },
+  "PURGE_PARTITION" : {
+"message" : [
+  "Partition purge."
+]
+  },
+  "PURGE_TABLE" : {
+"message" : [
+  "Purge table."
+]
+  },
   "PYTHON_UDF_IN_ON_CLAUSE" : {
 "message" : [
   "Python UDF in the ON clause of a  JOIN. In case of an 
INNNER JOIN consider rewriting to a CROSS JOIN with a WHERE clause."
diff --git a/docs/sql-error-conditions-unsupported-feature-error-class.md 
b/docs/sql-error-conditions-unsupported-feature-error-class.md
index aa1c622c458..7a60dc76fa6 100644
--- a/docs/sql-error-conditions-unsupported-feature-error-class.md
+++ b/docs/sql-error-conditions-unsupported-feature-error-class.md
@@ -141,6 +141,14 @@ PIVOT clause following a GROUP BY clause. Consider pushing 
the GROUP BY into a s
 
 Pivoting by the value '``' of the column data type ``.
 
+## PURGE_PARTITION
+
+Partition purge.
+
+## PURGE_TABLE
+
+Purge table.
+
 ## PYTHON_UDF_IN_ON_CLAUSE
 
 Python UDF in the ON clause of a `` JOIN. In case of an INNNER JOIN 
consider rewriting to a CROSS JOIN with a WHERE clause.
diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
index 3eb9bf9f913..48c6392d2b8 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
@@ -23,6 +23,7 @@ import org.apache.spark.annotation.Experimental;
 import org.apache.spark.sql.catalyst.InternalRow;
 

[spark] branch master updated: [SPARK-44555][SQL] Use checkError() to check Exception in command Suite & assign some error class names

2023-08-01 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4ec27c3801a [SPARK-44555][SQL] Use checkError() to check Exception in 
command Suite & assign some error class names
4ec27c3801a is described below

commit 4ec27c3801aaa0cbba3e086c278a0ff96260b84a
Author: panbingkun 
AuthorDate: Wed Aug 2 10:51:16 2023 +0500

[SPARK-44555][SQL] Use checkError() to check Exception in command Suite & 
assign some error class names

### What changes were proposed in this pull request?
The PR aims to:
1. Use `checkError()` to check exceptions in the `command` test suites.
2. Assign error class names, including `UNSUPPORTED_FEATURE.PURGE_PARTITION` and `UNSUPPORTED_FEATURE.PURGE_TABLE` (a sketch of how such an error is raised follows this list).
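
For illustration, a hedged sketch of how such an `UNSUPPORTED_FEATURE` sub-class is typically raised from a `QueryExecutionErrors`-style helper; the method name and the constructor usage below are assumptions, only the error class name comes from this commit's `error-classes.json` entry:

```scala
import org.apache.spark.SparkUnsupportedOperationException

// Hypothetical helper in the style of QueryExecutionErrors; the name is
// illustrative and the (errorClass, messageParameters) constructor is assumed.
def unsupportedPurgeTableError(): SparkUnsupportedOperationException = {
  new SparkUnsupportedOperationException(
    errorClass = "UNSUPPORTED_FEATURE.PURGE_TABLE",
    messageParameters = Map.empty[String, String])
}
```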

### Why are the changes needed?
The changes improve the error framework.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manual tests.
- Passed GA.

Closes #42169 from panbingkun/checkError_for_command.

Authored-by: panbingkun 
Signed-off-by: Max Gekk 
---
 .../src/main/resources/error/error-classes.json| 10 
 ...r-conditions-unsupported-feature-error-class.md |  8 ++
 .../catalog/SupportsAtomicPartitionManagement.java |  3 ++-
 .../catalog/SupportsPartitionManagement.java   |  3 ++-
 .../spark/sql/connector/catalog/TableCatalog.java  |  3 ++-
 .../spark/sql/errors/QueryExecutionErrors.scala| 12 +
 .../SupportsAtomicPartitionManagementSuite.scala   | 13 ++
 .../catalog/SupportsPartitionManagementSuite.scala | 13 ++
 .../command/v1/AlterTableAddPartitionSuite.scala   | 14 ++
 .../command/v1/AlterTableDropPartitionSuite.scala  | 12 +
 .../command/v1/AlterTableRenameSuite.scala | 11 +---
 .../command/v1/AlterTableSetLocationSuite.scala| 11 +---
 .../command/v1/ShowCreateTableSuite.scala  | 12 +
 .../sql/execution/command/v1/ShowTablesSuite.scala | 22 ++--
 .../execution/command/v1/TruncateTableSuite.scala  | 11 +---
 .../command/v2/AlterTableDropPartitionSuite.scala  | 12 ++---
 .../v2/AlterTableRecoverPartitionsSuite.scala  | 11 +---
 .../command/v2/AlterTableSetLocationSuite.scala| 12 +
 .../sql/execution/command/v2/DropTableSuite.scala  | 12 ++---
 .../command/v2/MsckRepairTableSuite.scala  | 11 +---
 .../sql/execution/command/v2/ShowTablesSuite.scala | 11 +---
 .../execution/command/ShowCreateTableSuite.scala   | 30 +-
 22 files changed, 172 insertions(+), 85 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json 
b/common/utils/src/main/resources/error/error-classes.json
index 7012c66c895..06350522834 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -3020,6 +3020,16 @@
   "Pivoting by the value '' of the column data type ."
 ]
   },
+  "PURGE_PARTITION" : {
+"message" : [
+  "Partition purge."
+]
+  },
+  "PURGE_TABLE" : {
+"message" : [
+  "Purge table."
+]
+  },
   "PYTHON_UDF_IN_ON_CLAUSE" : {
 "message" : [
   "Python UDF in the ON clause of a  JOIN. In case of an 
INNNER JOIN consider rewriting to a CROSS JOIN with a WHERE clause."
diff --git a/docs/sql-error-conditions-unsupported-feature-error-class.md 
b/docs/sql-error-conditions-unsupported-feature-error-class.md
index aa1c622c458..7a60dc76fa6 100644
--- a/docs/sql-error-conditions-unsupported-feature-error-class.md
+++ b/docs/sql-error-conditions-unsupported-feature-error-class.md
@@ -141,6 +141,14 @@ PIVOT clause following a GROUP BY clause. Consider pushing 
the GROUP BY into a s
 
 Pivoting by the value '``' of the column data type ``.
 
+## PURGE_PARTITION
+
+Partition purge.
+
+## PURGE_TABLE
+
+Purge table.
+
 ## PYTHON_UDF_IN_ON_CLAUSE
 
 Python UDF in the ON clause of a `` JOIN. In case of an INNNER JOIN 
consider rewriting to a CROSS JOIN with a WHERE clause.
diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
index 3eb9bf9f913..48c6392d2b8 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
@@ -23,6 +23,7 @@ import org.apache.spark.annotation.Experimental;
 import org.apache.spark.sql.catalyst.InternalRow;
 import org.apache.spark.sql.catalyst.analysis.NoSuchPartitionException;
 import 

[spark] branch branch-3.5 updated: [SPARK-41532][CONNECT][FOLLOWUP] Make the scala client using the same error class as python client

2023-08-01 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 9ed13a1032e [SPARK-41532][CONNECT][FOLLOWUP] Make the scala client 
using the same error class as python client
9ed13a1032e is described below

commit 9ed13a1032e38d79d5b2f952a7537f675254a042
Author: Jiaan Geng 
AuthorDate: Wed Aug 2 12:48:48 2023 +0800

[SPARK-41532][CONNECT][FOLLOWUP] Make the scala client using the same error 
class as python client

### What changes were proposed in this pull request?
The Python Connect client defines its error classes in `error_classes.py`.
`SESSION_NOT_SAME` is the error class used to check that the `SparkSession` of
one Dataset is the same as that of the other Dataset. Please refer to [`error_classes.py`](https://github.com/apache/spark/blob/546e39c5dabc243ab81b6238dc893d9993e0/python/pyspark/errors/error_classes.py#L678C1-L678C1).
But the Scala Connect client does not use the same error class.

### Why are the changes needed?
This PR makes the Scala client use the same error class as the Python client.

### Does this PR introduce _any_ user-facing change?
No.
It only updates the internal implementation.

### How was this patch tested?
Existing test cases.

Closes #42256 from beliefer/SPARK-41532_followup.

Authored-by: Jiaan Geng 
Signed-off-by: Ruifeng Zheng 
(cherry picked from commit 47d9fbd407d1b7148ae16c781facc2d9d663fc16)
Signed-off-by: Ruifeng Zheng 
---
 common/utils/src/main/resources/error/error-classes.json | 5 +
 .../client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala | 5 -
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json 
b/common/utils/src/main/resources/error/error-classes.json
index 31844f6f1f4..385435c740e 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -350,6 +350,11 @@
 "message" : [
   "Error instantiating Spark Connect plugin: "
 ]
+  },
+  "SESSION_NOT_SAME" : {
+"message" : [
+  "Both Datasets must belong to the same SparkSession."
+]
   }
 }
   },
diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
index b0dd91293a0..0f7b376955c 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1750,7 +1750,10 @@ class Dataset[T] private[sql] (
 
   private def checkSameSparkSession(other: Dataset[_]): Unit = {
 if (this.sparkSession.sessionId != other.sparkSession.sessionId) {
-  throw new SparkException("Both Datasets must belong to the same 
SparkSession")
+  throw new SparkException(
+errorClass = "CONNECT.SESSION_NOT_SAME",
+messageParameters = Map.empty,
+cause = null)
 }
   }
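
For orientation, a hedged usage sketch of when the check above fires; `s1` and `s2` are assumed to be two distinct Spark Connect sessions created elsewhere, and the error class is read through the `SparkThrowable` accessor:

```scala
import org.apache.spark.SparkException
import org.apache.spark.sql.{Dataset, SparkSession}

// Combining Datasets that belong to different Spark Connect sessions now
// fails with a proper error class instead of a plain-text-only message.
def unionAcrossSessions(s1: SparkSession, s2: SparkSession): Unit = {
  val left: Dataset[java.lang.Long] = s1.range(3)
  val right: Dataset[java.lang.Long] = s2.range(3)
  try {
    left.union(right).collect()
  } catch {
    case e: SparkException =>
      println(e.getErrorClass) // expected: CONNECT.SESSION_NOT_SAME
  }
}
```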
 





[spark] branch master updated: [SPARK-41532][CONNECT][FOLLOWUP] Make the scala client using the same error class as python client

2023-08-01 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 47d9fbd407d [SPARK-41532][CONNECT][FOLLOWUP] Make the scala client 
using the same error class as python client
47d9fbd407d is described below

commit 47d9fbd407d1b7148ae16c781facc2d9d663fc16
Author: Jiaan Geng 
AuthorDate: Wed Aug 2 12:48:48 2023 +0800

[SPARK-41532][CONNECT][FOLLOWUP] Make the scala client using the same error 
class as python client

### What changes were proposed in this pull request?
The Python Connect client defines its error classes in `error_classes.py`.
`SESSION_NOT_SAME` is the error class used to check that the `SparkSession` of
one Dataset is the same as that of the other Dataset. Please refer to [`error_classes.py`](https://github.com/apache/spark/blob/546e39c5dabc243ab81b6238dc893d9993e0/python/pyspark/errors/error_classes.py#L678C1-L678C1).
But the Scala Connect client does not use the same error class.

### Why are the changes needed?
This PR makes the Scala client use the same error class as the Python client.

### Does this PR introduce _any_ user-facing change?
No.
It only updates the internal implementation.

### How was this patch tested?
Existing test cases.

Closes #42256 from beliefer/SPARK-41532_followup.

Authored-by: Jiaan Geng 
Signed-off-by: Ruifeng Zheng 
---
 common/utils/src/main/resources/error/error-classes.json | 5 +
 .../client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala | 5 -
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json 
b/common/utils/src/main/resources/error/error-classes.json
index 06c47419fcb..7012c66c895 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -409,6 +409,11 @@
 "message" : [
   "Error instantiating Spark Connect plugin: "
 ]
+  },
+  "SESSION_NOT_SAME" : {
+"message" : [
+  "Both Datasets must belong to the same SparkSession."
+]
   }
 }
   },
diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
index b0dd91293a0..0f7b376955c 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1750,7 +1750,10 @@ class Dataset[T] private[sql] (
 
   private def checkSameSparkSession(other: Dataset[_]): Unit = {
 if (this.sparkSession.sessionId != other.sparkSession.sessionId) {
-  throw new SparkException("Both Datasets must belong to the same 
SparkSession")
+  throw new SparkException(
+errorClass = "CONNECT.SESSION_NOT_SAME",
+messageParameters = Map.empty,
+cause = null)
 }
   }
 





[spark] branch branch-3.3 updated: [SPARK-44588][CORE][3.3] Fix double encryption issue for migrated shuffle blocks

2023-08-01 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 9e86aed7b2a [SPARK-44588][CORE][3.3] Fix double encryption issue for 
migrated shuffle blocks
9e86aed7b2a is described below

commit 9e86aed7b2ac3f9c18346a290b9672b0d9465805
Author: Henry Mai 
AuthorDate: Tue Aug 1 21:07:51 2023 -0700

[SPARK-44588][CORE][3.3] Fix double encryption issue for migrated shuffle 
blocks

### What changes were proposed in this pull request?

Fix double encryption issue for migrated shuffle blocks

Shuffle blocks upon migration are sent without decryption when 
io.encryption is enabled. The code on the receiving side ends up using 
serializer.wrapStream on the OutputStream to the file which results in the 
already encrypted bytes being encrypted again when the bytes are written out.

This patch removes the usage of serializerManager.wrapStream on the 
receiving side and also adds tests that validate that this works as expected. I 
have also validated that the added tests will fail if the fix is not in place.

Jira ticket with more details: 
https://issues.apache.org/jira/browse/SPARK-44588

### Why are the changes needed?

Migrated shuffle blocks will be double encrypted when `spark.io.encryption 
= true` without this fix.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests were added to test shuffle block migration with spark.io.encryption
enabled; the patch also fixes a test helper method to properly construct the
SerializerManager with the encryption key.
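
A minimal sketch of that test-helper idea, assuming Spark-internal, test-scope APIs (`CryptoStreamUtils.createKey` and the three-argument `SerializerManager` constructor used in the diff below); not a public API:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.internal.config.IO_ENCRYPTION_ENABLED
import org.apache.spark.security.CryptoStreamUtils
import org.apache.spark.serializer.{KryoSerializer, SerializerManager}

// Build a SerializerManager that actually knows the I/O encryption key, so
// wrapStream() really encrypts/decrypts in tests instead of passing through.
val conf = new SparkConf().set(IO_ENCRYPTION_ENABLED, true)
val encryptionKey = Some(CryptoStreamUtils.createKey(conf))
val serializerManager =
  new SerializerManager(new KryoSerializer(conf), conf, encryptionKey)
```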

Closes #42277 from henrymai/branch-3.3_backport_double_encryption.

Authored-by: Henry Mai 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/shuffle/IndexShuffleBlockResolver.scala  |  8 ---
 .../apache/spark/storage/BlockManagerSuite.scala   | 28 ++
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala 
b/core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
index ba54555311e..d41321b4597 100644
--- 
a/core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
+++ 
b/core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
@@ -240,9 +240,11 @@ private[spark] class IndexShuffleBlockResolver(
   s"${blockId.getClass().getSimpleName()}")
 }
 val fileTmp = createTempFile(file)
-val channel = Channels.newChannel(
-  serializerManager.wrapStream(blockId,
-new FileOutputStream(fileTmp)))
+
+// Shuffle blocks' file bytes are being sent directly over the wire, so 
there is no need to
+// serializerManager.wrapStream() on it. Meaning if it was originally 
encrypted, then
+// it will stay encrypted when being written out to the file here.
+val channel = Channels.newChannel(new FileOutputStream(fileTmp))
 
 new StreamCallbackWithID {
 
diff --git 
a/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala 
b/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
index 6bfffc8ab3d..986ac79953d 100644
--- a/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
@@ -133,7 +133,7 @@ class BlockManagerSuite extends SparkFunSuite with Matchers 
with BeforeAndAfterE
 val transfer = transferService
   .getOrElse(new NettyBlockTransferService(conf, securityMgr, "localhost", 
"localhost", 0, 1))
 val memManager = UnifiedMemoryManager(bmConf, numCores = 1)
-val serializerManager = new SerializerManager(serializer, bmConf)
+val serializerManager = new SerializerManager(serializer, bmConf, 
encryptionKey)
 val externalShuffleClient = if (conf.get(config.SHUFFLE_SERVICE_ENABLED)) {
   val transConf = SparkTransportConf.fromSparkConf(conf, "shuffle", 0)
   Some(new ExternalBlockStoreClient(transConf, bmSecurityMgr,
@@ -2027,10 +2027,13 @@ class BlockManagerSuite extends SparkFunSuite with 
Matchers with BeforeAndAfterE
 assert(master.getLocations(blockIdLarge) === Seq(store1.blockManagerId))
   }
 
-  private def testShuffleBlockDecommissioning(maxShuffleSize: Option[Int], 
willReject: Boolean) = {
+  private def testShuffleBlockDecommissioning(
+  maxShuffleSize: Option[Int], willReject: Boolean, enableIoEncryption: 
Boolean) = {
 maxShuffleSize.foreach{ size =>
   conf.set(STORAGE_DECOMMISSION_SHUFFLE_MAX_DISK_SIZE.key, s"${size}b")
 }
+conf.set(IO_ENCRYPTION_ENABLED, enableIoEncryption)
+
 val shuffleManager1 = makeSortShuffleManager(Some(conf))
 val bm1 = makeBlockManager(3500, "exec1", shuffleManager = shuffleManager1)
 

[spark] branch branch-3.4 updated: [SPARK-44588][CORE][3.4] Fix double encryption issue for migrated shuffle blocks

2023-08-01 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new fcb6e2ac3ca [SPARK-44588][CORE][3.4] Fix double encryption issue for 
migrated shuffle blocks
fcb6e2ac3ca is described below

commit fcb6e2ac3ca51c86450b20f74520c25ab552d42b
Author: Henry Mai 
AuthorDate: Tue Aug 1 21:06:42 2023 -0700

[SPARK-44588][CORE][3.4] Fix double encryption issue for migrated shuffle 
blocks

### What changes were proposed in this pull request?

Fix double encryption issue for migrated shuffle blocks

Shuffle blocks upon migration are sent without decryption when 
io.encryption is enabled. The code on the receiving side ends up using 
serializer.wrapStream on the OutputStream to the file which results in the 
already encrypted bytes being encrypted again when the bytes are written out.

This patch removes the usage of serializerManager.wrapStream on the 
receiving side and also adds tests that validate that this works as expected. I 
have also validated that the added tests will fail if the fix is not in place.

Jira ticket with more details: 
https://issues.apache.org/jira/browse/SPARK-44588

### Why are the changes needed?

Migrated shuffle blocks will be double encrypted when `spark.io.encryption 
= true` without this fix.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests were added to test shuffle block migration with spark.io.encryption
enabled; the patch also fixes a test helper method to properly construct the
SerializerManager with the encryption key.

Closes #42279 from henrymai/branch-3.4_backport_double_encryption.

Authored-by: Henry Mai 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/shuffle/IndexShuffleBlockResolver.scala  |  8 ---
 .../apache/spark/storage/BlockManagerSuite.scala   | 28 ++
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala 
b/core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
index ba54555311e..d41321b4597 100644
--- 
a/core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
+++ 
b/core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
@@ -240,9 +240,11 @@ private[spark] class IndexShuffleBlockResolver(
   s"${blockId.getClass().getSimpleName()}")
 }
 val fileTmp = createTempFile(file)
-val channel = Channels.newChannel(
-  serializerManager.wrapStream(blockId,
-new FileOutputStream(fileTmp)))
+
+// Shuffle blocks' file bytes are being sent directly over the wire, so 
there is no need to
+// serializerManager.wrapStream() on it. Meaning if it was originally 
encrypted, then
+// it will stay encrypted when being written out to the file here.
+val channel = Channels.newChannel(new FileOutputStream(fileTmp))
 
 new StreamCallbackWithID {
 
diff --git 
a/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala 
b/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
index 842b66193f2..2a8f9f60241 100644
--- a/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
@@ -132,7 +132,7 @@ class BlockManagerSuite extends SparkFunSuite with Matchers 
with PrivateMethodTe
 val transfer = transferService
   .getOrElse(new NettyBlockTransferService(conf, securityMgr, "localhost", 
"localhost", 0, 1))
 val memManager = UnifiedMemoryManager(bmConf, numCores = 1)
-val serializerManager = new SerializerManager(serializer, bmConf)
+val serializerManager = new SerializerManager(serializer, bmConf, 
encryptionKey)
 val externalShuffleClient = if (conf.get(config.SHUFFLE_SERVICE_ENABLED)) {
   val transConf = SparkTransportConf.fromSparkConf(conf, "shuffle", 0)
   Some(new ExternalBlockStoreClient(transConf, bmSecurityMgr,
@@ -2026,10 +2026,13 @@ class BlockManagerSuite extends SparkFunSuite with 
Matchers with PrivateMethodTe
 assert(master.getLocations(blockIdLarge) === Seq(store1.blockManagerId))
   }
 
-  private def testShuffleBlockDecommissioning(maxShuffleSize: Option[Int], 
willReject: Boolean) = {
+  private def testShuffleBlockDecommissioning(
+  maxShuffleSize: Option[Int], willReject: Boolean, enableIoEncryption: 
Boolean) = {
 maxShuffleSize.foreach{ size =>
   conf.set(STORAGE_DECOMMISSION_SHUFFLE_MAX_DISK_SIZE.key, s"${size}b")
 }
+conf.set(IO_ENCRYPTION_ENABLED, enableIoEncryption)
+
 val shuffleManager1 = makeSortShuffleManager(Some(conf))
 val bm1 = makeBlockManager(3500, "exec1", shuffleManager = shuffleManager1)
 

[spark] branch branch-3.5 updated: [SPARK-44421][CONNECT][FOLLOWUP] Minor comment improvements

2023-08-01 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 6cd45ab801c [SPARK-44421][CONNECT][FOLLOWUP] Minor comment improvements
6cd45ab801c is described below

commit 6cd45ab801c5e39a05dd9ff760c67148bb067fa0
Author: Juliusz Sompolski 
AuthorDate: Wed Aug 2 12:49:11 2023 +0900

[SPARK-44421][CONNECT][FOLLOWUP] Minor comment improvements

### What changes were proposed in this pull request?

Improve some comments about iterator retries.

### Why are the changes needed?

Improve comments based on followup questions.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No code changes, only comment changes.

Closes #42281 from juliuszsompolski/SPARK-44624-comment-only.

Lead-authored-by: Juliusz Sompolski 
Co-authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit c1c4a79ff1728dca0c1536b944c10d282eb13f9f)
Signed-off-by: Hyukjin Kwon 
---
 .../client/ExecutePlanResponseReattachableIterator.scala  | 11 +--
 .../apache/spark/sql/connect/client/GrpcRetryHandler.scala|  3 +++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala
index c6f75928a3a..00787b8f94d 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala
@@ -40,8 +40,13 @@ import org.apache.spark.internal.Logging
  * ExecutePlanResponse on the iterator to return a new iterator from server 
that continues after
  * that.
  *
- * Since in reattachable execute the server does buffer some responses in case 
the client needs to
- * backtrack
+ * In reattachable execute the server does buffer some responses in case the 
client needs to
+ * backtrack. To let server release this buffer sooner, this iterator 
asynchronously sends
+ * ReleaseExecute RPCs that instruct the server to release responses that it 
already processed.
+ *
+ * Note: If the initial ExecutePlan did not even reach the server and 
execution didn't start, the
+ * ReattachExecute can still fail with INVALID_HANDLE.OPERATION_NOT_FOUND, 
failing the whole
+ * operation.
  */
 class ExecutePlanResponseReattachableIterator(
 request: proto.ExecutePlanRequest,
@@ -86,6 +91,8 @@ class ExecutePlanResponseReattachableIterator(
   private var responseComplete: Boolean = false
 
   // Initial iterator comes from ExecutePlan request.
+  // Note: This is not retried, because no error would ever be thrown here, 
and GRPC will only
+  // throw error on first iterator.hasNext() or iterator.next()
   private var iterator: java.util.Iterator[proto.ExecutePlanResponse] =
 rawBlockingStub.executePlan(initialRequest)
 
diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/GrpcRetryHandler.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/GrpcRetryHandler.scala
index 16352bb90b5..ef446399f16 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/GrpcRetryHandler.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/GrpcRetryHandler.scala
@@ -99,6 +99,9 @@ private[client] class GrpcRetryHandler(private val 
retryPolicy: GrpcRetryHandler
   extends StreamObserver[U] {
 
 private var opened = false // only retries on first call
+
+// Note: This is not retried, because no error would ever be thrown here, 
and GRPC will only
+// throw error on first iterator.hasNext() or iterator.next()
 private var streamObserver = call(request)
 
 override def onNext(v: U): Unit = {
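
To make the new note concrete, a generic, hedged Scala sketch of the behaviour it describes: creating a gRPC blocking-stub iterator does not fail, the RPC error surfaces on the first hasNext()/next(), so that is what a retry has to guard. This is illustrative code, not the Spark Connect client implementation:

```scala
import scala.util.control.NonFatal

// Generic illustration, not GrpcRetryHandler itself.
def openWithRetry[T](retriesLeft: Int)(open: () => Iterator[T]): Iterator[T] = {
  val it = open()   // creating the iterator never throws for a lazy gRPC stream
  try {
    it.hasNext      // the failure, if any, shows up on the first pull
    it
  } catch {
    case NonFatal(_) if retriesLeft > 0 => openWithRetry(retriesLeft - 1)(open)
  }
}
```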





[spark] branch master updated: [SPARK-44421][CONNECT][FOLLOWUP] Minor comment improvements

2023-08-01 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c1c4a79ff17 [SPARK-44421][CONNECT][FOLLOWUP] Minor comment improvements
c1c4a79ff17 is described below

commit c1c4a79ff1728dca0c1536b944c10d282eb13f9f
Author: Juliusz Sompolski 
AuthorDate: Wed Aug 2 12:49:11 2023 +0900

[SPARK-44421][CONNECT][FOLLOWUP] Minor comment improvements

### What changes were proposed in this pull request?

Improve some comments about iterator retries.

### Why are the changes needed?

Improve comments based on followup questions.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No code changes, only comment changes.

Closes #42281 from juliuszsompolski/SPARK-44624-comment-only.

Lead-authored-by: Juliusz Sompolski 
Co-authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 .../client/ExecutePlanResponseReattachableIterator.scala  | 11 +--
 .../apache/spark/sql/connect/client/GrpcRetryHandler.scala|  3 +++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala
index c6f75928a3a..00787b8f94d 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala
@@ -40,8 +40,13 @@ import org.apache.spark.internal.Logging
  * ExecutePlanResponse on the iterator to return a new iterator from server 
that continues after
  * that.
  *
- * Since in reattachable execute the server does buffer some responses in case 
the client needs to
- * backtrack
+ * In reattachable execute the server does buffer some responses in case the 
client needs to
+ * backtrack. To let server release this buffer sooner, this iterator 
asynchronously sends
+ * ReleaseExecute RPCs that instruct the server to release responses that it 
already processed.
+ *
+ * Note: If the initial ExecutePlan did not even reach the server and 
execution didn't start, the
+ * ReattachExecute can still fail with INVALID_HANDLE.OPERATION_NOT_FOUND, 
failing the whole
+ * operation.
  */
 class ExecutePlanResponseReattachableIterator(
 request: proto.ExecutePlanRequest,
@@ -86,6 +91,8 @@ class ExecutePlanResponseReattachableIterator(
   private var responseComplete: Boolean = false
 
   // Initial iterator comes from ExecutePlan request.
+  // Note: This is not retried, because no error would ever be thrown here, 
and GRPC will only
+  // throw error on first iterator.hasNext() or iterator.next()
   private var iterator: java.util.Iterator[proto.ExecutePlanResponse] =
 rawBlockingStub.executePlan(initialRequest)
 
diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/GrpcRetryHandler.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/GrpcRetryHandler.scala
index 16352bb90b5..ef446399f16 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/GrpcRetryHandler.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/GrpcRetryHandler.scala
@@ -99,6 +99,9 @@ private[client] class GrpcRetryHandler(private val 
retryPolicy: GrpcRetryHandler
   extends StreamObserver[U] {
 
 private var opened = false // only retries on first call
+
+// Note: This is not retried, because no error would ever be thrown here, 
and GRPC will only
+// throw error on first iterator.hasNext() or iterator.next()
 private var streamObserver = call(request)
 
 override def onNext(v: U): Unit = {





[spark] branch master updated: [SPARK-44607][SQL] Remove unused function `containsNestedColumn` from `Filter`

2023-08-01 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ca274fe15cf [SPARK-44607][SQL] Remove unused function 
`containsNestedColumn` from `Filter`
ca274fe15cf is described below

commit ca274fe15cf9f621394759339b6c8d826692d1a1
Author: yangjie01 
AuthorDate: Wed Aug 2 11:17:59 2023 +0800

[SPARK-44607][SQL] Remove unused function `containsNestedColumn` from 
`Filter`

### What changes were proposed in this pull request?
This PR aims to remove the `private[sql]` function `containsNestedColumn` from
`org.apache.spark.sql.sources.Filter`.
This function was introduced by https://github.com/apache/spark/pull/27728
to avoid nested predicate pushdown for ORC.
After https://github.com/apache/spark/pull/28761, ORC also supports nested
column predicate pushdown, so this function has become unused.
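
For context, a tiny hedged illustration of what the removed helper computed; the path splitting below is a simplified stand-in for Spark's internal `parseColumnPath`:

```scala
// Each filter reference becomes a column path; a path with more than one
// segment means the reference points into a nested column.
val v2references: Array[Array[String]] =
  Array("a", "person.name").map(_.split('.'))
val containsNestedColumn: Boolean = v2references.exists(_.length > 1) // true
```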

### Why are the changes needed?
Remove the unused `private[sql]` function `containsNestedColumn`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

Closes #42239 from LuciferYang/SPARK-44607.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
---
 .../src/main/scala/org/apache/spark/sql/sources/filters.scala  | 7 ---
 1 file changed, 7 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/sources/filters.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/sources/filters.scala
index af5e4f5ef5a..a52bca10660 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/sources/filters.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/sources/filters.scala
@@ -64,13 +64,6 @@ sealed abstract class Filter {
 this.references.map(parseColumnPath(_).toArray)
   }
 
-  /**
-   * If any of the references of this filter contains nested column
-   */
-  private[sql] def containsNestedColumn: Boolean = {
-this.v2references.exists(_.length > 1)
-  }
-
   /**
* Converts V1 filter to V2 filter
*/





svn commit: r63306 - /dev/spark/v3.5.0-rc1-bin/

2023-08-01 Thread liyuanjian
Author: liyuanjian
Date: Wed Aug  2 02:57:39 2023
New Revision: 63306

Log:
Apache Spark v3.5.0-rc1

Added:
dev/spark/v3.5.0-rc1-bin/
dev/spark/v3.5.0-rc1-bin/SparkR_3.5.0.tar.gz   (with props)
dev/spark/v3.5.0-rc1-bin/SparkR_3.5.0.tar.gz.asc
dev/spark/v3.5.0-rc1-bin/SparkR_3.5.0.tar.gz.sha512
dev/spark/v3.5.0-rc1-bin/pyspark-3.5.0.tar.gz   (with props)
dev/spark/v3.5.0-rc1-bin/pyspark-3.5.0.tar.gz.asc
dev/spark/v3.5.0-rc1-bin/pyspark-3.5.0.tar.gz.sha512
dev/spark/v3.5.0-rc1-bin/spark-3.5.0-bin-hadoop3-scala2.13.tgz   (with 
props)
dev/spark/v3.5.0-rc1-bin/spark-3.5.0-bin-hadoop3-scala2.13.tgz.asc
dev/spark/v3.5.0-rc1-bin/spark-3.5.0-bin-hadoop3-scala2.13.tgz.sha512
dev/spark/v3.5.0-rc1-bin/spark-3.5.0-bin-hadoop3.tgz   (with props)
dev/spark/v3.5.0-rc1-bin/spark-3.5.0-bin-hadoop3.tgz.asc
dev/spark/v3.5.0-rc1-bin/spark-3.5.0-bin-hadoop3.tgz.sha512
dev/spark/v3.5.0-rc1-bin/spark-3.5.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.5.0-rc1-bin/spark-3.5.0-bin-without-hadoop.tgz.asc
dev/spark/v3.5.0-rc1-bin/spark-3.5.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.5.0-rc1-bin/spark-3.5.0.tgz   (with props)
dev/spark/v3.5.0-rc1-bin/spark-3.5.0.tgz.asc
dev/spark/v3.5.0-rc1-bin/spark-3.5.0.tgz.sha512

Added: dev/spark/v3.5.0-rc1-bin/SparkR_3.5.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.5.0-rc1-bin/SparkR_3.5.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.5.0-rc1-bin/SparkR_3.5.0.tar.gz.asc
==
--- dev/spark/v3.5.0-rc1-bin/SparkR_3.5.0.tar.gz.asc (added)
+++ dev/spark/v3.5.0-rc1-bin/SparkR_3.5.0.tar.gz.asc Wed Aug  2 02:57:39 2023
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCgAdFiEE/Drjp+qhusmHcIQOfhq8xTqqIhYFAmTJxbgACgkQfhq8xTqq
+IhZ8rRAArVC0miH0g7k5hSoW4LHaraZylR90vghAYOQAl/7AenBE11MZM8gKwuil
+oRgbH70B6YimdNIV4/D6LMkk9+ZgWpqsfxYrM10jjK09/9Q7eqI0aKwb69sI8WO4
+YKOwMaLs511qZkNa3rinK1UzJZP3n28+m6UjmxVs4VJM1DFfGX0YCeuZX0hcZbIM
+KpUxXpenWiRv1XLeHQAvINxLtb6Dps9tooeKSagTwljlkEE6nO5PnZaOZytbp6OS
+lR7aozlbwSJuXbc32Qorbnuu196T2ytL96Wcy15dLEaKKyfwCVYgXCPWonRoFH0b
+ayM6rj5EjYmkud0blrPdS+mP6HZnRUiBB1pqZU+Fz+f7KUFBuV3gEkqKs9vqpPDa
+b7HzEe6LK+DhNZw3xail99UKPc1Rxh+vZZ6FlmQlxU4Ru/ITN5garhhGcoQqvXgT
+//CLae2idX1EYrRXlKLVv0GkutzRqhILaaf/TrVm4YV+XkGQJNDy2pS1SxLdq72y
+IGeH1Rqxz1CDHbDWNCdIrYMv82hEI2eg6M7xoZL0h+i2u/rAARyaIc3BtZ/yUf6g
+6FVjwcPRGtQ/BWmouAW0+4IsiRYzJ5+G2AgMU0nyPp4ZyM46ta+aEQwa59SDHBzI
+cEVHIs1q+2P0tpF/E9z4GSbJrC6M87x87vQf58RizvJ+agHsr/I=
+=Gld6
+-END PGP SIGNATURE-

Added: dev/spark/v3.5.0-rc1-bin/SparkR_3.5.0.tar.gz.sha512
==
--- dev/spark/v3.5.0-rc1-bin/SparkR_3.5.0.tar.gz.sha512 (added)
+++ dev/spark/v3.5.0-rc1-bin/SparkR_3.5.0.tar.gz.sha512 Wed Aug  2 02:57:39 2023
@@ -0,0 +1 @@
+3ec2a549720354a5a49f06159dabf848796cc8fc61e544dffcce5d92bf6b2bb2939d6a4d50d2fab67b1f76fc56ea8667cc45c371717db3b05a4379a49b88bf55
  SparkR_3.5.0.tar.gz

Added: dev/spark/v3.5.0-rc1-bin/pyspark-3.5.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.5.0-rc1-bin/pyspark-3.5.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.5.0-rc1-bin/pyspark-3.5.0.tar.gz.asc
==
--- dev/spark/v3.5.0-rc1-bin/pyspark-3.5.0.tar.gz.asc (added)
+++ dev/spark/v3.5.0-rc1-bin/pyspark-3.5.0.tar.gz.asc Wed Aug  2 02:57:39 2023
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCgAdFiEE/Drjp+qhusmHcIQOfhq8xTqqIhYFAmTJxboACgkQfhq8xTqq
+IhYBlg//d0kfe8i2f5qTsxymcXy2oyB8jmRV6raCyqX7IrvQhSN67a2eKdwS39ji
+2VE7ikCrpyQDg11KW3PgvL2XMchcWi2T3N75pX5YNeMevLmjyepRHb91M+A7WfKt
+sNg6krtRU8arPKw48rRiiAsY7f9SJnVYt8A/8qrB7MmQEVlUUByzm5ENMQrr1AaV
+0lwu1Xpu8YP+YqtJFc1Yo5QJFl/ERtOf8M6BCHJ44B4c2Dtu+14Apf/cTKVtKZPP
+itcvHWEtKnFjJhz1zYUOkQ2d5vhpsVVKWJXOrPZDWNp09xjyflz01JMZlszd8sM7
+Mmfn7pCPwDViG5rBdz34x78Oz+BO7vdxO/Cr42o1iOIaC0vTfhpMISiTvXeanzgd
+wfgKoTrFvWVxY5LVu1rDvna5FudxCnlhB0PI6qlf6go66tjSxGaH7qLqwCV34vd1
+GKAKhoDQ8FaQExepJv793PVPLrTQEVArMdKUecvgbTWU5zZ46q8AlKyVVY6o5+KM
+xHXNCoLh3kK7CljAQEXMd8SIARCPNb7jQGQMp9ttK2q/dSqntUS5TqYYkhTDlDkO
+PFhD7RWiXw7zSxuMPolJrdKA6Ov4LE6WqMTdGnu6ErN5v7XqrWfd527Z/qHp9qap
+ss3D5DQyLAjGKGN6AHiixGuQnvgZ19Wu0D88g/SM9FGRZyFgO2k=
+=FInA
+-END PGP SIGNATURE-

Added: dev/spark/v3.5.0-rc1-bin/pyspark-3.5.0.tar.gz.sha512
==
--- dev/spark/v3.5.0-rc1-bin/pyspark-3.5.0.tar.gz.sha512 (added)
+++ dev/spark/v3.5.0-rc1-bin/pyspark-3.5.0.tar.gz.sha512 Wed Aug  2 

[spark] branch branch-3.5 updated: [MINOR][CONNECT] Fix some typos in connect server module

2023-08-01 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new e8e425e5683 [MINOR][CONNECT] Fix some typos in connect server module
e8e425e5683 is described below

commit e8e425e5683738f5262edb44f2e4c0eca1ad0e0e
Author: yangjie01 
AuthorDate: Wed Aug 2 10:52:32 2023 +0800

[MINOR][CONNECT] Fix some typos in connect server module

### What changes were proposed in this pull request?
This PR just fixes some typos in the Connect server module.

### Why are the changes needed?
Fix typos

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

Closes #42259 from LuciferYang/connect-server-typo.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
(cherry picked from commit 0a9bc446163e0d3b06c8bf774d728d3770ac405b)
Signed-off-by: yangjie01 
---
 .../src/main/scala/org/apache/spark/sql/connect/dsl/package.scala | 2 +-
 .../org/apache/spark/sql/connect/execution/ExecuteThreadRunner.scala  | 4 ++--
 .../spark/sql/connect/planner/StreamingForeachBatchHelper.scala   | 4 ++--
 .../scala/org/apache/spark/sql/connect/service/ExecuteHolder.scala| 2 +-
 .../spark/sql/connect/service/SparkConnectInterceptorRegistry.scala   | 2 +-
 .../org/apache/spark/sql/connect/service/SparkConnectService.scala| 4 ++--
 .../spark/sql/connect/service/SparkConnectStreamingQueryCache.scala   | 2 +-
 .../org/apache/spark/sql/connect/ui/SparkConnectServerListener.scala  | 2 +-
 8 files changed, 11 insertions(+), 11 deletions(-)

diff --git 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
index 25d722cf58d..86c38277c1b 100644
--- 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
+++ 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
@@ -1056,7 +1056,7 @@ package object dsl {
   def randomSplit(weights: Array[Double], seed: Long): Array[Relation] = {
 require(
   weights.forall(_ >= 0),
-  s"Weights must be nonnegative, but got ${weights.mkString("[", ",", 
"]")}")
+  s"Weights must be non-negative, but got ${weights.mkString("[", ",", 
"]")}")
 require(
   weights.sum > 0,
   s"Sum of weights must be positive, but got ${weights.mkString("[", 
",", "]")}")
diff --git 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteThreadRunner.scala
 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteThreadRunner.scala
index ec078216c21..662288177dc 100644
--- 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteThreadRunner.scala
+++ 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteThreadRunner.scala
@@ -38,7 +38,7 @@ import org.apache.spark.util.Utils
 private[connect] class ExecuteThreadRunner(executeHolder: ExecuteHolder) 
extends Logging {
 
   // The newly created thread will inherit all InheritableThreadLocals used by 
Spark,
-  // e.g. SparkContext.localProperties. If considering implementing a 
threadpool,
+  // e.g. SparkContext.localProperties. If considering implementing a 
thread-pool,
   // forwarding of thread locals needs to be taken into account.
   private var executionThread: Thread = new ExecutionThread()
 
@@ -166,7 +166,7 @@ private[connect] class ExecuteThreadRunner(executeHolder: 
ExecuteHolder) extends
 executeHolder.responseObserver.onNext(createResultComplete())
   }
   synchronized {
-// Prevent interrupt after onCompleted, and throwing error to an 
alredy closed stream.
+// Prevent interrupt after onCompleted, and throwing error to an 
already closed stream.
 completed = true
 executeHolder.responseObserver.onCompleted()
   }
diff --git 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/StreamingForeachBatchHelper.scala
 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/StreamingForeachBatchHelper.scala
index 3b9ae483cf1..998faf327d0 100644
--- 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/StreamingForeachBatchHelper.scala
+++ 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/StreamingForeachBatchHelper.scala
@@ -113,8 +113,8 @@ object StreamingForeachBatchHelper extends Logging {
   }
 
   // TODO(SPARK-44433): Improve termination of Processes
-  //   The goal is that when a query is terminated, the python process 
asociated with foreachBatch
-  //   should be terminated. One way to do that is by registering 

[spark] branch master updated: [MINOR][CONNECT] Fix some typos in connect server module

2023-08-01 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0a9bc446163 [MINOR][CONNECT] Fix some typos in connect server module
0a9bc446163 is described below

commit 0a9bc446163e0d3b06c8bf774d728d3770ac405b
Author: yangjie01 
AuthorDate: Wed Aug 2 10:52:32 2023 +0800

[MINOR][CONNECT] Fix some typos in connect server module

### What changes were proposed in this pull request?
This PR just fixes some typos in the Connect server module.

### Why are the changes needed?
Fix typos

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

Closes #42259 from LuciferYang/connect-server-typo.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
---
 .../src/main/scala/org/apache/spark/sql/connect/dsl/package.scala | 2 +-
 .../org/apache/spark/sql/connect/execution/ExecuteThreadRunner.scala  | 4 ++--
 .../spark/sql/connect/planner/StreamingForeachBatchHelper.scala   | 4 ++--
 .../scala/org/apache/spark/sql/connect/service/ExecuteHolder.scala| 2 +-
 .../spark/sql/connect/service/SparkConnectInterceptorRegistry.scala   | 2 +-
 .../org/apache/spark/sql/connect/service/SparkConnectService.scala| 4 ++--
 .../spark/sql/connect/service/SparkConnectStreamingQueryCache.scala   | 2 +-
 .../org/apache/spark/sql/connect/ui/SparkConnectServerListener.scala  | 2 +-
 8 files changed, 11 insertions(+), 11 deletions(-)

diff --git 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
index 25d722cf58d..86c38277c1b 100644
--- 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
+++ 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
@@ -1056,7 +1056,7 @@ package object dsl {
   def randomSplit(weights: Array[Double], seed: Long): Array[Relation] = {
 require(
   weights.forall(_ >= 0),
-  s"Weights must be nonnegative, but got ${weights.mkString("[", ",", 
"]")}")
+  s"Weights must be non-negative, but got ${weights.mkString("[", ",", 
"]")}")
 require(
   weights.sum > 0,
   s"Sum of weights must be positive, but got ${weights.mkString("[", 
",", "]")}")
diff --git 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteThreadRunner.scala
 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteThreadRunner.scala
index ec078216c21..662288177dc 100644
--- 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteThreadRunner.scala
+++ 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteThreadRunner.scala
@@ -38,7 +38,7 @@ import org.apache.spark.util.Utils
 private[connect] class ExecuteThreadRunner(executeHolder: ExecuteHolder) 
extends Logging {
 
   // The newly created thread will inherit all InheritableThreadLocals used by 
Spark,
-  // e.g. SparkContext.localProperties. If considering implementing a 
threadpool,
+  // e.g. SparkContext.localProperties. If considering implementing a 
thread-pool,
   // forwarding of thread locals needs to be taken into account.
   private var executionThread: Thread = new ExecutionThread()
 
@@ -166,7 +166,7 @@ private[connect] class ExecuteThreadRunner(executeHolder: 
ExecuteHolder) extends
 executeHolder.responseObserver.onNext(createResultComplete())
   }
   synchronized {
-// Prevent interrupt after onCompleted, and throwing error to an 
alredy closed stream.
+// Prevent interrupt after onCompleted, and throwing error to an 
already closed stream.
 completed = true
 executeHolder.responseObserver.onCompleted()
   }
diff --git 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/StreamingForeachBatchHelper.scala
 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/StreamingForeachBatchHelper.scala
index 3b9ae483cf1..998faf327d0 100644
--- 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/StreamingForeachBatchHelper.scala
+++ 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/StreamingForeachBatchHelper.scala
@@ -113,8 +113,8 @@ object StreamingForeachBatchHelper extends Logging {
   }
 
   // TODO(SPARK-44433): Improve termination of Processes
-  //   The goal is that when a query is terminated, the python process 
asociated with foreachBatch
-  //   should be terminated. One way to do that is by registering stremaing 
query listener:
+  //   The goal is that when a query is terminated, the python process 
associated 

[spark] branch branch-3.5 updated (d8ce4a70c93 -> 495447f3c04)

2023-08-01 Thread liyuanjian
This is an automated email from the ASF dual-hosted git repository.

liyuanjian pushed a change to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


from d8ce4a70c93 [SPARK-42497][FOLLOWUPS][TESTS] Add missing UTs to 
`modules.py`
 add 7e862c01fc9 Preparing Spark release v3.5.0-rc1
 new 495447f3c04 Preparing development version 3.5.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 common/utils/pom.xml   | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/api/pom.xml| 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 45 files changed, 47 insertions(+), 47 deletions(-)





[spark] 01/01: Preparing development version 3.5.1-SNAPSHOT

2023-08-01 Thread liyuanjian
This is an automated email from the ASF dual-hosted git repository.

liyuanjian pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 495447f3c04b389f8b13d314714604611532d062
Author: Yuanjian Li 
AuthorDate: Wed Aug 2 02:19:22 2023 +

Preparing development version 3.5.1-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 common/utils/pom.xml   | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/api/pom.xml| 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 45 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 1c093a4a980..66faa8031c4 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.5.0
+Version: 3.5.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index a389e3fe9a5..d97f724f0b5 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0
+3.5.1-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index ce180f49ff1..1b1a8d0066f 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0
+3.5.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 8da48076a43..54c10a05eed 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0
+3.5.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 48e64d21a58..92bf5bc0785 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0
+3.5.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 2bbacbe71a4..3003927e713 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0

[spark] tag v3.5.0-rc1 created (now 7e862c01fc9)

2023-08-01 Thread liyuanjian
This is an automated email from the ASF dual-hosted git repository.

liyuanjian pushed a change to tag v3.5.0-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git


  at 7e862c01fc9 (commit)
This tag includes the following new commits:

 new 7e862c01fc9 Preparing Spark release v3.5.0-rc1

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.






[spark] 01/01: Preparing Spark release v3.5.0-rc1

2023-08-01 Thread liyuanjian
This is an automated email from the ASF dual-hosted git repository.

liyuanjian pushed a commit to tag v3.5.0-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 7e862c01fc9a1d3b47764df8b6a4b5c4cafb0807
Author: Yuanjian Li 
AuthorDate: Wed Aug 2 02:19:19 2023 +

Preparing Spark release v3.5.0-rc1
---
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 common/utils/pom.xml   | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/api/pom.xml| 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 44 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/assembly/pom.xml b/assembly/pom.xml
index 09d6bd8a33f..a389e3fe9a5 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0-SNAPSHOT
+3.5.0
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index bef8303874b..ce180f49ff1 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0-SNAPSHOT
+3.5.0
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 2b43f9ce98a..8da48076a43 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0-SNAPSHOT
+3.5.0
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index a8bde14a259..48e64d21a58 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0-SNAPSHOT
+3.5.0
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 671d5cb7e01..2bbacbe71a4 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0-SNAPSHOT
+3.5.0
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 4cc597519c3..fca31591b1e 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0-SNAPSHOT
+3.5.0
 ../../pom.xml
   
 
diff --git a/common/tags/pom.xml b/common/tags/pom.xml
index 9a44c847d8a..a93e227655e 100644
--- 

[spark] branch branch-3.5 updated: [SPARK-42497][FOLLOWUPS][TESTS] Add missing UTs to `modules.py`

2023-08-01 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new d8ce4a70c93 [SPARK-42497][FOLLOWUPS][TESTS] Add missing UTs to 
`modules.py`
d8ce4a70c93 is described below

commit d8ce4a70c930a292c004244be42a66248e8c0595
Author: Ruifeng Zheng 
AuthorDate: Wed Aug 2 09:39:10 2023 +0800

[SPARK-42497][FOLLOWUPS][TESTS] Add missing UTs to `modules.py`

### What changes were proposed in this pull request?
add missing UTs

### Why are the changes needed?
the two UTs were missing in https://github.com/apache/spark/pull/40525

### Does this PR introduce _any_ user-facing change?
no. test-only

### How was this patch tested?
updated CI

Closes #42262 from zhengruifeng/ps_test_ewm.

Authored-by: Ruifeng Zheng 
Signed-off-by: Ruifeng Zheng 
(cherry picked from commit 6891f2c0ac8ccf1b77be876abdf03afed190c500)
Signed-off-by: Ruifeng Zheng 
---
 dev/sparktestsupport/modules.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/dev/sparktestsupport/modules.py b/dev/sparktestsupport/modules.py
index 4005c317e62..4435d19810b 100644
--- a/dev/sparktestsupport/modules.py
+++ b/dev/sparktestsupport/modules.py
@@ -703,6 +703,7 @@ pyspark_pandas = Module(
 "pyspark.pandas.tests.test_default_index",
 "pyspark.pandas.tests.test_expanding",
 "pyspark.pandas.tests.test_extension",
+"pyspark.pandas.tests.test_ewm",
 "pyspark.pandas.tests.test_frame_spark",
 "pyspark.pandas.tests.test_generic_functions",
 "pyspark.pandas.tests.test_indexops_spark",
@@ -928,6 +929,7 @@ pyspark_pandas_connect = Module(
 "pyspark.pandas.tests.connect.test_parity_default_index",
 "pyspark.pandas.tests.connect.test_parity_expanding",
 "pyspark.pandas.tests.connect.test_parity_extension",
+"pyspark.pandas.tests.connect.test_parity_ewm",
 "pyspark.pandas.tests.connect.test_parity_frame_spark",
 "pyspark.pandas.tests.connect.test_parity_generic_functions",
 "pyspark.pandas.tests.connect.test_parity_indexops_spark",





[spark] branch master updated: [SPARK-42497][FOLLOWUPS][TESTS] Add missing UTs to `modules.py`

2023-08-01 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6891f2c0ac8 [SPARK-42497][FOLLOWUPS][TESTS] Add missing UTs to 
`modules.py`
6891f2c0ac8 is described below

commit 6891f2c0ac8ccf1b77be876abdf03afed190c500
Author: Ruifeng Zheng 
AuthorDate: Wed Aug 2 09:39:10 2023 +0800

[SPARK-42497][FOLLOWUPS][TESTS] Add missing UTs to `modules.py`

### What changes were proposed in this pull request?
add missing UTs

### Why are the changes needed?
the two UTs were missing in https://github.com/apache/spark/pull/40525

### Does this PR introduce _any_ user-facing change?
no. test-only

### How was this patch tested?
updated CI

Closes #42262 from zhengruifeng/ps_test_ewm.

Authored-by: Ruifeng Zheng 
Signed-off-by: Ruifeng Zheng 
---
 dev/sparktestsupport/modules.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/dev/sparktestsupport/modules.py b/dev/sparktestsupport/modules.py
index 1caf24383c7..9e45e0facef 100644
--- a/dev/sparktestsupport/modules.py
+++ b/dev/sparktestsupport/modules.py
@@ -703,6 +703,7 @@ pyspark_pandas = Module(
 "pyspark.pandas.tests.test_default_index",
 "pyspark.pandas.tests.test_expanding",
 "pyspark.pandas.tests.test_extension",
+"pyspark.pandas.tests.test_ewm",
 "pyspark.pandas.tests.test_frame_spark",
 "pyspark.pandas.tests.test_generic_functions",
 "pyspark.pandas.tests.test_indexops_spark",
@@ -928,6 +929,7 @@ pyspark_pandas_connect = Module(
 "pyspark.pandas.tests.connect.test_parity_default_index",
 "pyspark.pandas.tests.connect.test_parity_expanding",
 "pyspark.pandas.tests.connect.test_parity_extension",
+"pyspark.pandas.tests.connect.test_parity_ewm",
 "pyspark.pandas.tests.connect.test_parity_frame_spark",
 "pyspark.pandas.tests.connect.test_parity_generic_functions",
 "pyspark.pandas.tests.connect.test_parity_indexops_spark",





[spark] branch branch-3.5 updated: [SPARK-42941][SS][CONNECT][3.5] Python StreamingQueryListener

2023-08-01 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 808fd88bf4f [SPARK-42941][SS][CONNECT][3.5] Python 
StreamingQueryListener
808fd88bf4f is described below

commit 808fd88bf4fb96d3277641897f2a8fffdeb77f73
Author: bogao007 
AuthorDate: Wed Aug 2 09:02:25 2023 +0900

[SPARK-42941][SS][CONNECT][3.5] Python StreamingQueryListener

### What changes were proposed in this pull request?

Implement the Python streaming query listener and the addListener and
removeListener methods. A follow-up,
[SPARK-44516](https://issues.apache.org/jira/browse/SPARK-44516), is filed to actually
terminate the query listener process when removeListener is called;
[SPARK-44516](https://issues.apache.org/jira/browse/SPARK-44516) depends on
[SPARK-44433](https://issues.apache.org/jira/browse/SPARK-44433).

### Why are the changes needed?

SS Connect development

### Does this PR introduce _any_ user-facing change?

Yes, users can now use the streaming query listener with Spark Connect.

### How was this patch tested?

Manual test and added unit test

**addListener:**
```
# Client side:
>>> from pyspark.sql.streaming.listener import StreamingQueryListener;from 
pyspark.sql.streaming.listener import (QueryStartedEvent, QueryProgressEvent, 
QueryTerminatedEvent, QueryIdleEvent)
>>> class MyListener(StreamingQueryListener):
... def onQueryStarted(self, event: QueryStartedEvent) -> None: 
print("hi, event query id is: " +  str(event.id)); 
df=self.spark.createDataFrame(["10","11","13"], "string").toDF("age"); 
df.write.saveAsTable("tbllistener1")
... def onQueryProgress(self, event: QueryProgressEvent) -> None: pass
... def onQueryIdle(self, event: QueryIdleEvent) -> None: pass
... def onQueryTerminated(self, event: QueryTerminatedEvent) -> None: 
pass
...
>>> spark.streams.addListener(MyListener())
>>> q = 
spark.readStream.format("rate").load().writeStream.format("console").start()
>>> q.stop()
>>> spark.read.table("tbllistener1").collect()
[Row(age='13'), Row(age='10'), Row(age='11')]

# Server side:
# event_type received from python process is 0
hi, event query id is: dd7ba1c4-6c8f-4369-9c3c-5dede22b8a2f
```

**removeListener:**

```
# Client side:
>>> listener = MyListener(); spark.streams.addListener(listener)
>>> spark.streams.removeListener(listener)

# Server side:
# nothing to print actually, the listener is removed from server side 
StreamingQueryManager and cache in sessionHolder, but the process still hangs 
there. Follow up SPARK-44516 filed to stop this process
```

Closes #42250 from bogao007/3.5-branch-sync.

Lead-authored-by: bogao007 
Co-authored-by: Wei Liu 
Signed-off-by: Hyukjin Kwon 
---
 .../sql/streaming/StreamingQueryManager.scala  |   6 +-
 .../sql/streaming/ClientStreamingQuerySuite.scala  |   2 +-
 .../src/main/protobuf/spark/connect/commands.proto |   2 +
 .../sql/connect/planner/SparkConnectPlanner.scala  |  30 +--
 .../planner/StreamingForeachBatchHelper.scala  |   9 +-
 .../planner/StreamingQueryListenerHelper.scala |  69 +++
 .../spark/api/python/PythonWorkerFactory.scala |   6 +-
 .../spark/api/python/StreamingPythonRunner.scala   |   5 +-
 dev/sparktestsupport/modules.py|   1 +
 python/pyspark/sql/connect/proto/commands_pb2.py   |  44 ++--
 python/pyspark/sql/connect/proto/commands_pb2.pyi  |  37 +++-
 python/pyspark/sql/connect/streaming/query.py  |  31 +--
 .../sql/connect/streaming/worker/__init__.py   |  18 ++
 .../streaming/worker/foreachBatch_worker.py}   |  18 +-
 .../connect/streaming/worker/listener_worker.py}   |  53 +++--
 python/pyspark/sql/streaming/listener.py   |  29 ++-
 python/pyspark/sql/streaming/query.py  |  12 ++
 .../connect/streaming/test_parity_listener.py  |  90 
 .../sql/tests/streaming/test_streaming_listener.py | 228 +++--
 python/setup.py|   1 +
 20 files changed, 484 insertions(+), 207 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala
index 91744460440..d16638e5945 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala
@@ -156,7 +156,8 @@ class StreamingQueryManager private[sql] (sparkSession: 
SparkSession) extends Lo
 executeManagerCmd(
   

[spark] branch master updated: [SPARK-44218][PYTHON] Customize diff log in assertDataFrameEqual error message format

2023-08-01 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4f66f091ad5 [SPARK-44218][PYTHON] Customize diff log in 
assertDataFrameEqual error message format
4f66f091ad5 is described below

commit 4f66f091ad5dbdd9177a7550a8da7e266a869147
Author: Amanda Liu 
AuthorDate: Wed Aug 2 08:40:54 2023 +0900

[SPARK-44218][PYTHON] Customize diff log in assertDataFrameEqual error 
message format

### What changes were proposed in this pull request?
This PR improves the error message format for `assertDataFrameEqual` by
creating a new custom diff log function, `_context_diff`, to print all Rows for
`actual` and `expected` and highlight the rows that differ in color.
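
For reference, a minimal usage sketch of the API whose error output changes
(the SparkSession and DataFrames below are hypothetical, not taken from this patch):

```python
# Hypothetical sketch: comparing two small DataFrames with assertDataFrameEqual.
# When rows differ, the raised assertion error now carries the context diff
# produced by _context_diff, listing all rows and highlighting the ones that differ.
from pyspark.sql import SparkSession
from pyspark.testing.utils import assertDataFrameEqual

spark = SparkSession.builder.getOrCreate()

actual = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
expected = spark.createDataFrame([(1, "a"), (2, "c")], ["id", "value"])

# Raises an assertion error because the second row differs.
assertDataFrameEqual(actual, expected)
```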

### Why are the changes needed?
The change is needed to clarify the error message for unequal DataFrames.

### Does this PR introduce _any_ user-facing change?
Yes, the PR affects the error message display for users. See the new error 
message output examples below.

### How was this patch tested?
Added tests to `python/pyspark/sql/tests/test_utils.py` and 
`python/pyspark/sql/tests/connect/test_utils.py`.

Example error messages:
https://github.com/apache/spark/assets/68875504/04b5b985-4670-4d4b-8032-9704523a3df1

https://github.com/apache/spark/assets/68875504/d7b420c0-80d3-4853-b8e5-48ff6b459dc7

Closes #42196 from asl3/update-difflib.

Authored-by: Amanda Liu 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/errors/error_classes.py|  24 ++-
 python/pyspark/pandas/tests/test_utils.py |   1 -
 python/pyspark/sql/tests/test_utils.py| 269 --
 python/pyspark/testing/utils.py   |  69 +++-
 4 files changed, 261 insertions(+), 102 deletions(-)

diff --git a/python/pyspark/errors/error_classes.py 
b/python/pyspark/errors/error_classes.py
index f1396c49af0..d6f093246da 100644
--- a/python/pyspark/errors/error_classes.py
+++ b/python/pyspark/errors/error_classes.py
@@ -167,36 +167,44 @@ ERROR_CLASSES_JSON = """
   "DIFFERENT_PANDAS_DATAFRAME" : {
 "message" : [
   "DataFrames are not almost equal:",
-  "Left: ",
+  "Left:",
+  "",
   "",
-  "Right: ",
+  "Right:",
+  "",
   ""
 ]
   },
   "DIFFERENT_PANDAS_INDEX" : {
 "message" : [
   "Indices are not almost equal:",
-  "Left: ",
+  "Left:",
+  "",
   "",
-  "Right: ",
+  "Right:",
+  "",
   ""
 ]
   },
   "DIFFERENT_PANDAS_MULTIINDEX" : {
 "message" : [
   "MultiIndices are not almost equal:",
-  "Left: ",
+  "Left:",
+  "",
   "",
-  "Right: ",
+  "Right:",
+  "",
   ""
 ]
   },
   "DIFFERENT_PANDAS_SERIES" : {
 "message" : [
   "Series are not almost equal:",
-  "Left: ",
+  "Left:",
+  "",
   "",
-  "Right: ",
+  "Right:",
+  "",
   ""
 ]
   },
diff --git a/python/pyspark/pandas/tests/test_utils.py 
b/python/pyspark/pandas/tests/test_utils.py
index de7b0449dae..3d658446f27 100644
--- a/python/pyspark/pandas/tests/test_utils.py
+++ b/python/pyspark/pandas/tests/test_utils.py
@@ -16,7 +16,6 @@
 #
 
 import pandas as pd
-from typing import Union
 
 from pyspark.pandas.indexes.base import Index
 from pyspark.pandas.utils import (
diff --git a/python/pyspark/sql/tests/test_utils.py 
b/python/pyspark/sql/tests/test_utils.py
index b7ab596880f..76d397e3ade 100644
--- a/python/pyspark/sql/tests/test_utils.py
+++ b/python/pyspark/sql/tests/test_utils.py
@@ -23,7 +23,7 @@ from pyspark.errors import (
 IllegalArgumentException,
 SparkUpgradeException,
 )
-from pyspark.testing.utils import assertDataFrameEqual, assertSchemaEqual
+from pyspark.testing.utils import assertDataFrameEqual, assertSchemaEqual, 
_context_diff
 from pyspark.testing.sqlutils import ReusedSQLTestCase
 from pyspark.sql import Row
 import pyspark.sql.functions as F
@@ -44,6 +44,7 @@ from pyspark.sql.dataframe import DataFrame
 
 import difflib
 from typing import List, Union
+from itertools import zip_longest
 
 
 class UtilsTestsMixin:
@@ -151,17 +152,22 @@ class UtilsTestsMixin:
 ),
 )
 
-expected_error_message = "Results do not match: "
-percent_diff = (1 / 2) * 100
-expected_error_message += "( %.5f %% )" % percent_diff
+rows_str1 = ""
+rows_str2 = ""
+
+# count different rows
+for r1, r2 in list(zip_longest(df1.collect(), df2.collect())):
+rows_str1 += str(r1) + "\n"
+rows_str2 += str(r2) + "\n"
 
-generated_diff = difflib.ndiff(
-str(df1.collect()[1]).splitlines(), 
str(df2.collect()[1]).splitlines()
+generated_diff = _context_diff(
+actual=rows_str1.splitlines(), 

[spark] branch branch-3.5 updated: [SPARK-44218][PYTHON] Customize diff log in assertDataFrameEqual error message format

2023-08-01 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new dbee20a426e [SPARK-44218][PYTHON] Customize diff log in 
assertDataFrameEqual error message format
dbee20a426e is described below

commit dbee20a426e8a290e83131dddf23c35b5b249959
Author: Amanda Liu 
AuthorDate: Wed Aug 2 08:40:54 2023 +0900

[SPARK-44218][PYTHON] Customize diff log in assertDataFrameEqual error 
message format

### What changes were proposed in this pull request?
This PR improves the error message format for `assertDataFrameEqual` by
creating a new custom diff log function, `_context_diff`, to print all Rows for
`actual` and `expected` and highlight the rows that differ in color.

### Why are the changes needed?
The change is needed to clarify the error message for unequal DataFrames.

### Does this PR introduce _any_ user-facing change?
Yes, the PR affects the error message display for users. See the new error 
message output examples below.

### How was this patch tested?
Added tests to `python/pyspark/sql/tests/test_utils.py` and 
`python/pyspark/sql/tests/connect/test_utils.py`.

Example error messages:
https://github.com/apache/spark/assets/68875504/04b5b985-4670-4d4b-8032-9704523a3df1

https://github.com/apache/spark/assets/68875504/d7b420c0-80d3-4853-b8e5-48ff6b459dc7

Closes #42196 from asl3/update-difflib.

Authored-by: Amanda Liu 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 4f66f091ad5dbdd9177a7550a8da7e266a869147)
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/errors/error_classes.py|  24 ++-
 python/pyspark/pandas/tests/test_utils.py |   1 -
 python/pyspark/sql/tests/test_utils.py| 269 --
 python/pyspark/testing/utils.py   |  69 +++-
 4 files changed, 261 insertions(+), 102 deletions(-)

diff --git a/python/pyspark/errors/error_classes.py 
b/python/pyspark/errors/error_classes.py
index acecc48f0a8..554a25952b9 100644
--- a/python/pyspark/errors/error_classes.py
+++ b/python/pyspark/errors/error_classes.py
@@ -167,36 +167,44 @@ ERROR_CLASSES_JSON = """
   "DIFFERENT_PANDAS_DATAFRAME" : {
 "message" : [
   "DataFrames are not almost equal:",
-  "Left: ",
+  "Left:",
+  "",
   "",
-  "Right: ",
+  "Right:",
+  "",
   ""
 ]
   },
   "DIFFERENT_PANDAS_INDEX" : {
 "message" : [
   "Indices are not almost equal:",
-  "Left: ",
+  "Left:",
+  "",
   "",
-  "Right: ",
+  "Right:",
+  "",
   ""
 ]
   },
   "DIFFERENT_PANDAS_MULTIINDEX" : {
 "message" : [
   "MultiIndices are not almost equal:",
-  "Left: ",
+  "Left:",
+  "",
   "",
-  "Right: ",
+  "Right:",
+  "",
   ""
 ]
   },
   "DIFFERENT_PANDAS_SERIES" : {
 "message" : [
   "Series are not almost equal:",
-  "Left: ",
+  "Left:",
+  "",
   "",
-  "Right: ",
+  "Right:",
+  "",
   ""
 ]
   },
diff --git a/python/pyspark/pandas/tests/test_utils.py 
b/python/pyspark/pandas/tests/test_utils.py
index de7b0449dae..3d658446f27 100644
--- a/python/pyspark/pandas/tests/test_utils.py
+++ b/python/pyspark/pandas/tests/test_utils.py
@@ -16,7 +16,6 @@
 #
 
 import pandas as pd
-from typing import Union
 
 from pyspark.pandas.indexes.base import Index
 from pyspark.pandas.utils import (
diff --git a/python/pyspark/sql/tests/test_utils.py 
b/python/pyspark/sql/tests/test_utils.py
index b7ab596880f..76d397e3ade 100644
--- a/python/pyspark/sql/tests/test_utils.py
+++ b/python/pyspark/sql/tests/test_utils.py
@@ -23,7 +23,7 @@ from pyspark.errors import (
 IllegalArgumentException,
 SparkUpgradeException,
 )
-from pyspark.testing.utils import assertDataFrameEqual, assertSchemaEqual
+from pyspark.testing.utils import assertDataFrameEqual, assertSchemaEqual, 
_context_diff
 from pyspark.testing.sqlutils import ReusedSQLTestCase
 from pyspark.sql import Row
 import pyspark.sql.functions as F
@@ -44,6 +44,7 @@ from pyspark.sql.dataframe import DataFrame
 
 import difflib
 from typing import List, Union
+from itertools import zip_longest
 
 
 class UtilsTestsMixin:
@@ -151,17 +152,22 @@ class UtilsTestsMixin:
 ),
 )
 
-expected_error_message = "Results do not match: "
-percent_diff = (1 / 2) * 100
-expected_error_message += "( %.5f %% )" % percent_diff
+rows_str1 = ""
+rows_str2 = ""
+
+# count different rows
+for r1, r2 in list(zip_longest(df1.collect(), df2.collect())):
+rows_str1 += str(r1) + "\n"
+rows_str2 += str(r2) + "\n"
 
-generated_diff = difflib.ndiff(
-str(df1.collect()[1]).splitlines(), 
str(df2.collect()[1]).splitlines()

[spark] branch branch-3.5 updated: [SPARK-44588][CORE] Fix double encryption issue for migrated shuffle blocks

2023-08-01 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 53799e15c1e [SPARK-44588][CORE] Fix double encryption issue for 
migrated shuffle blocks
53799e15c1e is described below

commit 53799e15c1e4396189449f02667a7716fab4cdef
Author: Henry Mai 
AuthorDate: Tue Aug 1 14:33:10 2023 -0700

[SPARK-44588][CORE] Fix double encryption issue for migrated shuffle blocks

### What changes were proposed in this pull request?

Fix double encryption issue for migrated shuffle blocks

Shuffle blocks are sent without decryption upon migration when
io.encryption is enabled. The code on the receiving side ends up using
serializerManager.wrapStream on the OutputStream to the file, which results in the
already encrypted bytes being encrypted again when they are written out.

This patch removes the usage of serializerManager.wrapStream on the 
receiving side and also adds tests that validate that this works as expected. I 
have also validated that the added tests will fail if the fix is not in place.

Jira ticket with more details: 
https://issues.apache.org/jira/browse/SPARK-44588

### Why are the changes needed?

Migrated shuffle blocks will be double encrypted when `spark.io.encryption 
= true` without this fix.
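
As an illustration only (the configuration key and session setup below are
assumptions based on standard Spark configuration, not part of this patch),
the affected path is exercised when I/O encryption is enabled at application start:

```python
# Hypothetical sketch: starting a PySpark application with I/O encryption enabled,
# the setting under which migrated shuffle blocks used to be double encrypted.
# The config key is an assumption based on standard Spark configuration.
from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf().set("spark.io.encryption.enabled", "true")
spark = SparkSession.builder.config(conf=conf).getOrCreate()
```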

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests were added to test shuffle block migration with 
spark.io.encryption enabled and also fixes a test helper method to properly 
construct the SerializerManager with the encryption key.

Closes #42214 from henrymai/fix_shuffle_migration_double_encryption.

Authored-by: Henry Mai 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 6f6dd38ce542bf9876b849f85c01f47ee1ec93ca)
Signed-off-by: Dongjoon Hyun 
---
 .../spark/shuffle/IndexShuffleBlockResolver.scala  |  8 ---
 .../apache/spark/storage/BlockManagerSuite.scala   | 28 ++
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala 
b/core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
index 08dec2e4dd3..919b0f5f7c1 100644
--- 
a/core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
+++ 
b/core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
@@ -242,9 +242,11 @@ private[spark] class IndexShuffleBlockResolver(
   s"${blockId.getClass().getSimpleName()}", category = "SHUFFLE")
 }
 val fileTmp = createTempFile(file)
-val channel = Channels.newChannel(
-  serializerManager.wrapStream(blockId,
-new FileOutputStream(fileTmp)))
+
+// Shuffle blocks' file bytes are being sent directly over the wire, so 
there is no need to
+// serializerManager.wrapStream() on it. Meaning if it was originally 
encrypted, then
+// it will stay encrypted when being written out to the file here.
+val channel = Channels.newChannel(new FileOutputStream(fileTmp))
 
 new StreamCallbackWithID {
 
diff --git 
a/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala 
b/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
index ab6c2693b0e..ecd66dc2c5f 100644
--- a/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
@@ -131,7 +131,7 @@ class BlockManagerSuite extends SparkFunSuite with Matchers 
with PrivateMethodTe
   None
 }
 val bmSecurityMgr = new SecurityManager(bmConf, encryptionKey)
-val serializerManager = new SerializerManager(serializer, bmConf)
+val serializerManager = new SerializerManager(serializer, bmConf, 
encryptionKey)
 val transfer = transferService.getOrElse(new NettyBlockTransferService(
   conf, securityMgr, serializerManager, "localhost", "localhost", 0, 1))
 val memManager = UnifiedMemoryManager(bmConf, numCores = 1)
@@ -2033,10 +2033,13 @@ class BlockManagerSuite extends SparkFunSuite with 
Matchers with PrivateMethodTe
 assert(master.getLocations(blockIdLarge) === Seq(store1.blockManagerId))
   }
 
-  private def testShuffleBlockDecommissioning(maxShuffleSize: Option[Int], 
willReject: Boolean) = {
+  private def testShuffleBlockDecommissioning(
+  maxShuffleSize: Option[Int], willReject: Boolean, enableIoEncryption: 
Boolean) = {
 maxShuffleSize.foreach{ size =>
   conf.set(STORAGE_DECOMMISSION_SHUFFLE_MAX_DISK_SIZE.key, s"${size}b")
 }
+conf.set(IO_ENCRYPTION_ENABLED, enableIoEncryption)
+
 val shuffleManager1 = makeSortShuffleManager(Some(conf))
 val bm1 = makeBlockManager(3500, "exec1", shuffleManager = shuffleManager1)
 

[spark] branch master updated: [SPARK-44588][CORE] Fix double encryption issue for migrated shuffle blocks

2023-08-01 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6f6dd38ce54 [SPARK-44588][CORE] Fix double encryption issue for 
migrated shuffle blocks
6f6dd38ce54 is described below

commit 6f6dd38ce542bf9876b849f85c01f47ee1ec93ca
Author: Henry Mai 
AuthorDate: Tue Aug 1 14:33:10 2023 -0700

[SPARK-44588][CORE] Fix double encryption issue for migrated shuffle blocks

### What changes were proposed in this pull request?

Fix double encryption issue for migrated shuffle blocks

Shuffle blocks are sent without decryption upon migration when
io.encryption is enabled. The code on the receiving side ends up using
serializerManager.wrapStream on the OutputStream to the file, which results in the
already encrypted bytes being encrypted again when they are written out.

This patch removes the usage of serializerManager.wrapStream on the 
receiving side and also adds tests that validate that this works as expected. I 
have also validated that the added tests will fail if the fix is not in place.

Jira ticket with more details: 
https://issues.apache.org/jira/browse/SPARK-44588

### Why are the changes needed?

Migrated shuffle blocks will be double encrypted when `spark.io.encryption 
= true` without this fix.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests were added to test shuffle block migration with 
spark.io.encryption enabled and also fixes a test helper method to properly 
construct the SerializerManager with the encryption key.

Closes #42214 from henrymai/fix_shuffle_migration_double_encryption.

Authored-by: Henry Mai 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/shuffle/IndexShuffleBlockResolver.scala  |  8 ---
 .../apache/spark/storage/BlockManagerSuite.scala   | 28 ++
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala 
b/core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
index 08dec2e4dd3..919b0f5f7c1 100644
--- 
a/core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
+++ 
b/core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
@@ -242,9 +242,11 @@ private[spark] class IndexShuffleBlockResolver(
   s"${blockId.getClass().getSimpleName()}", category = "SHUFFLE")
 }
 val fileTmp = createTempFile(file)
-val channel = Channels.newChannel(
-  serializerManager.wrapStream(blockId,
-new FileOutputStream(fileTmp)))
+
+// Shuffle blocks' file bytes are being sent directly over the wire, so 
there is no need to
+// serializerManager.wrapStream() on it. Meaning if it was originally 
encrypted, then
+// it will stay encrypted when being written out to the file here.
+val channel = Channels.newChannel(new FileOutputStream(fileTmp))
 
 new StreamCallbackWithID {
 
diff --git 
a/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala 
b/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
index ab6c2693b0e..ecd66dc2c5f 100644
--- a/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
@@ -131,7 +131,7 @@ class BlockManagerSuite extends SparkFunSuite with Matchers 
with PrivateMethodTe
   None
 }
 val bmSecurityMgr = new SecurityManager(bmConf, encryptionKey)
-val serializerManager = new SerializerManager(serializer, bmConf)
+val serializerManager = new SerializerManager(serializer, bmConf, 
encryptionKey)
 val transfer = transferService.getOrElse(new NettyBlockTransferService(
   conf, securityMgr, serializerManager, "localhost", "localhost", 0, 1))
 val memManager = UnifiedMemoryManager(bmConf, numCores = 1)
@@ -2033,10 +2033,13 @@ class BlockManagerSuite extends SparkFunSuite with 
Matchers with PrivateMethodTe
 assert(master.getLocations(blockIdLarge) === Seq(store1.blockManagerId))
   }
 
-  private def testShuffleBlockDecommissioning(maxShuffleSize: Option[Int], 
willReject: Boolean) = {
+  private def testShuffleBlockDecommissioning(
+  maxShuffleSize: Option[Int], willReject: Boolean, enableIoEncryption: 
Boolean) = {
 maxShuffleSize.foreach{ size =>
   conf.set(STORAGE_DECOMMISSION_SHUFFLE_MAX_DISK_SIZE.key, s"${size}b")
 }
+conf.set(IO_ENCRYPTION_ENABLED, enableIoEncryption)
+
 val shuffleManager1 = makeSortShuffleManager(Some(conf))
 val bm1 = makeBlockManager(3500, "exec1", shuffleManager = shuffleManager1)
 shuffleManager1.shuffleBlockResolver._blockManager = bm1
@@ -2095,15 +2098,30 @@ class BlockManagerSuite extends SparkFunSuite with 

[spark] branch master updated: [SPARK-44601][BUILD] Add `jackson-mapper-asl` as test dependency to `hive-thriftserver` module to make Maven test pass

2023-08-01 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new cdcf12b9fc7 [SPARK-44601][BUILD] Add `jackson-mapper-asl` as test 
dependency to `hive-thriftserver` module to make Maven test pass
cdcf12b9fc7 is described below

commit cdcf12b9fc7026c77d4e7c2b5506e5daa3472ff0
Author: yangjie01 
AuthorDate: Wed Aug 2 04:56:10 2023 +0800

[SPARK-44601][BUILD] Add `jackson-mapper-asl` as test dependency to 
`hive-thriftserver` module to make Maven test pass

### What changes were proposed in this pull request?
Run the following Maven test commands to test `hive-thriftserver`:

```
./build/mvn -DskipTests -Pyarn -Pmesos -Pkubernetes -Pvolcano -Phive 
-Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl clean install
./build/mvn -pl sql/hive-thriftserver -Phive -Phive-thriftserver clean 
install
```

There are many similar test failures like:

```
2023-08-01 02:46:59.286 - stderr> Setting default log level to "WARN".
  2023-08-01 02:46:59.287 - stderr> To adjust logging level use 
sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
  2023-08-01 02:47:00.191 - stderr> 2023-08-01 17:47:00.190:INFO::main: 
Logging initialized 2591ms to org.eclipse.jetty.util.log.StdErrLog
  2023-08-01 02:47:03.673 - stderr> Spark master: local, Application Id: 
local-1690883219913
  2023-08-01 02:47:04.582 - stdout> spark-sql> CREATE TABLE t1(key string, 
val string)
  2023-08-01 02:47:04.594 - stdout>  > ROW FORMAT SERDE 
'org.apache.hive.hcatalog.data.JsonSerDe';
  2023-08-01 02:47:05.615 - stderr> org/codehaus/jackson/JsonParseException
  2023-08-01 02:47:05.616 - stderr> java.lang.NoClassDefFoundError: 
org/codehaus/jackson/JsonParseException
  2023-08-01 02:47:05.616 - stderr> at java.lang.Class.forName0(Native 
Method)
  2023-08-01 02:47:05.616 - stderr> at 
java.lang.Class.forName(Class.java:348)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2630)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2595)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:447)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:440)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:281)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:263)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.hadoop.hive.ql.metadata.Table.getColsInternal(Table.java:641)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:624)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:838)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:874)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.spark.sql.hive.client.Shim_v0_12.createTable(HiveShim.scala:614)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$createTable$1(HiveClientImpl.scala:573)
  2023-08-01 02:47:05.616 - stderr> at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:303)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:234)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:233)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:283)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:571)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$createTable$1(HiveExternalCatalog.scala:288)
  2023-08-01 02:47:05.616 - stderr> at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:245)
  

[spark] branch branch-3.5 updated: [SPARK-44601][BUILD] Add `jackson-mapper-asl` as test dependency to `hive-thriftserver` module to make Maven test pass

2023-08-01 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 41b2bf85a59 [SPARK-44601][BUILD] Add `jackson-mapper-asl` as test 
dependency to `hive-thriftserver` module to make Maven test pass
41b2bf85a59 is described below

commit 41b2bf85a59c2187f92279ab39fc61060acbf27a
Author: yangjie01 
AuthorDate: Wed Aug 2 04:56:10 2023 +0800

[SPARK-44601][BUILD] Add `jackson-mapper-asl` as test dependency to 
`hive-thriftserver` module to make Maven test pass

### What changes were proposed in this pull request?
Run the following Maven test commands to test `hive-thriftserver`:

```
./build/mvn -DskipTests -Pyarn -Pmesos -Pkubernetes -Pvolcano -Phive 
-Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl clean install
./build/mvn -pl sql/hive-thriftserver -Phive -Phive-thriftserver clean 
install
```

There are many similar test failures like:

```
2023-08-01 02:46:59.286 - stderr> Setting default log level to "WARN".
  2023-08-01 02:46:59.287 - stderr> To adjust logging level use 
sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
  2023-08-01 02:47:00.191 - stderr> 2023-08-01 17:47:00.190:INFO::main: 
Logging initialized 2591ms to org.eclipse.jetty.util.log.StdErrLog
  2023-08-01 02:47:03.673 - stderr> Spark master: local, Application Id: 
local-1690883219913
  2023-08-01 02:47:04.582 - stdout> spark-sql> CREATE TABLE t1(key string, 
val string)
  2023-08-01 02:47:04.594 - stdout>  > ROW FORMAT SERDE 
'org.apache.hive.hcatalog.data.JsonSerDe';
  2023-08-01 02:47:05.615 - stderr> org/codehaus/jackson/JsonParseException
  2023-08-01 02:47:05.616 - stderr> java.lang.NoClassDefFoundError: 
org/codehaus/jackson/JsonParseException
  2023-08-01 02:47:05.616 - stderr> at java.lang.Class.forName0(Native 
Method)
  2023-08-01 02:47:05.616 - stderr> at 
java.lang.Class.forName(Class.java:348)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2630)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2595)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:447)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:440)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:281)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:263)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.hadoop.hive.ql.metadata.Table.getColsInternal(Table.java:641)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:624)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:838)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:874)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.spark.sql.hive.client.Shim_v0_12.createTable(HiveShim.scala:614)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$createTable$1(HiveClientImpl.scala:573)
  2023-08-01 02:47:05.616 - stderr> at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:303)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:234)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:233)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:283)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:571)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$createTable$1(HiveExternalCatalog.scala:288)
  2023-08-01 02:47:05.616 - stderr> at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
  2023-08-01 02:47:05.616 - stderr> at 
org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:245)
  

[spark] branch master updated: [SPARK-44480][SS] Use thread pool to perform maintenance activity for hdfs/rocksdb state store providers

2023-08-01 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 9e42805f77c [SPARK-44480][SS] Use thread pool to perform maintenance 
activity for hdfs/rocksdb state store providers
9e42805f77c is described below

commit 9e42805f77c1ab5beb3c193f47f016e735a68f06
Author: Eric Marnadi 
AuthorDate: Wed Aug 2 04:52:43 2023 +0900

[SPARK-44480][SS] Use thread pool to perform maintenance activity for 
hdfs/rocksdb state store providers

### What changes were proposed in this pull request?

Maintenance tasks on StateStore were previously done by a single background
thread, which is prone to straggling. With this change, the single background
thread instead schedules maintenance tasks onto a thread pool.

By default, the maximum number of threads in the new thread pool is 
determined via the number of cores * 0.25, so that this thread pool doesn't 
take too many resources away from the query and affect performance. Users can 
set the number of threads explicitly via 
`spark.sql.streaming.stateStore.numStateStoreMaintenanceThreads`.
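
A minimal sketch of overriding the default pool size (the session setup is
illustrative; only the configuration key comes from this change):

```python
# Hypothetical sketch: explicitly sizing the state store maintenance thread pool
# instead of relying on the default (number of cores * 0.25, minimum 1).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("state-store-maintenance-demo")
    .config("spark.sql.streaming.stateStore.numStateStoreMaintenanceThreads", "4")
    .getOrCreate()
)
```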

### Why are the changes needed?

Using a thread pool instead of a single thread for snapshotting and cleanup 
reduces the effect of stragglers in the background task.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manual testing: wrote a stateful query, tracked how many maintenance tasks
were done, and compared this to the baseline. `StateStoreSuite` is enough to
test functional correctness and cleanup here.

Closes #42066 from ericm-db/maintenance-thread-pool-optional.

Lead-authored-by: Eric Marnadi 
Co-authored-by: ericm-db <132308037+ericm...@users.noreply.github.com>
Signed-off-by: Jungtaek Lim 
---
 .../org/apache/spark/sql/internal/SQLConf.scala|  13 +++
 .../sql/execution/streaming/state/StateStore.scala | 108 ++---
 .../execution/streaming/state/StateStoreConf.scala |   5 +
 3 files changed, 114 insertions(+), 12 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 6eb2d9c38d9..dfa2a0f251f 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -1856,6 +1856,17 @@ object SQLConf {
   .createWithDefault(
 
"org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider")
 
+  val NUM_STATE_STORE_MAINTENANCE_THREADS =
+buildConf("spark.sql.streaming.stateStore.numStateStoreMaintenanceThreads")
+  .internal()
+  .doc("Number of threads in the thread pool that perform clean up and 
snapshotting tasks " +
+"for stateful streaming queries. The default value is the number of 
cores * 0.25 " +
+"so that this thread pool doesn't take too many resources " +
+"away from the query and affect performance.")
+  .intConf
+  .checkValue(_ > 0, "Must be greater than 0")
+  .createWithDefault(Math.max(Runtime.getRuntime.availableProcessors() / 
4, 1))
+
   val STATE_SCHEMA_CHECK_ENABLED =
 buildConf("spark.sql.streaming.stateStore.stateSchemaCheck")
   .doc("When true, Spark will validate the state schema against schema on 
existing state and " +
@@ -4523,6 +4534,8 @@ class SQLConf extends Serializable with Logging with 
SqlApiConf {
 
   def isStateSchemaCheckEnabled: Boolean = getConf(STATE_SCHEMA_CHECK_ENABLED)
 
+  def numStateStoreMaintenanceThreads: Int = 
getConf(NUM_STATE_STORE_MAINTENANCE_THREADS)
+
   def stateStoreMinDeltasForSnapshot: Int = 
getConf(STATE_STORE_MIN_DELTAS_FOR_SNAPSHOT)
 
   def stateStoreFormatValidationEnabled: Boolean = 
getConf(STATE_STORE_FORMAT_VALIDATION_ENABLED)
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala
index cabad54be64..a1d4f7f40a7 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala
@@ -19,6 +19,7 @@ package org.apache.spark.sql.execution.streaming.state
 
 import java.util.UUID
 import java.util.concurrent.{ScheduledFuture, TimeUnit}
+import java.util.concurrent.atomic.AtomicReference
 import javax.annotation.concurrent.GuardedBy
 
 import scala.collection.mutable
@@ -440,6 +441,18 @@ object StateStore extends Logging {
   @GuardedBy("loadedProviders")
   private val schemaValidated = new mutable.HashMap[StateStoreProviderId, 
Option[Throwable]]()
 
+  private val maintenanceThreadPoolLock = new Object
+
+  // 

[spark] branch master updated (f54b4020217 -> 6933d43cc63)

2023-08-01 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from f54b4020217 [SPARK-29497][CONNECT] Throw error when UDF is not 
deserializable
 add 6933d43cc63 [SPARK-44623][BUILD] Upgrade `commons-lang3` to 3.13.0

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)





[spark] branch branch-3.5 updated: [SPARK-29497][CONNECT] Throw error when UDF is not deserializable

2023-08-01 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 8537fa634cd [SPARK-29497][CONNECT] Throw error when UDF is not 
deserializable
8537fa634cd is described below

commit 8537fa634cd02f46e7b42afd6b35f877f3a2c161
Author: Herman van Hovell 
AuthorDate: Tue Aug 1 14:53:54 2023 -0400

[SPARK-29497][CONNECT] Throw error when UDF is not deserializable

### What changes were proposed in this pull request?
This PR adds a better error message when a JVM UDF cannot be deserialized.

### Why are the changes needed?
In some cases a UDF cannot be deserialized. This happens when a lambda
references itself (typically through the capturing class). Java cannot
deserialize such an object graph because SerializedLambdas are serialization
proxies that need the full graph to be deserialized before they can be
transformed into the actual lambda, which is not possible if there is such a
cycle. This PR adds a more readable and understandable error when this happens;
the original Java one is a `ClassCastExcep [...]

### Does this PR introduce _any_ user-facing change?
Yes. It will throw an error on the client when a UDF is not deserializable. 
The error is better and more actionable than what we got before.

### How was this patch tested?
Added tests.

Closes #42245 from hvanhovell/SPARK-29497.

Lead-authored-by: Herman van Hovell 
Co-authored-by: Herman van Hovell 
Signed-off-by: Herman van Hovell 
(cherry picked from commit f54b402021785e0b0ec976ec889de67d3b2fdc6e)
Signed-off-by: Herman van Hovell 
---
 .../org/apache/spark/util/SparkSerDeUtils.scala| 21 ++-
 .../sql/expressions/UserDefinedFunction.scala  | 24 +++-
 .../spark/sql/UserDefinedFunctionSuite.scala   | 44 --
 .../main/scala/org/apache/spark/util/Utils.scala   | 23 +--
 4 files changed, 85 insertions(+), 27 deletions(-)

diff --git 
a/common/utils/src/main/scala/org/apache/spark/util/SparkSerDeUtils.scala 
b/common/utils/src/main/scala/org/apache/spark/util/SparkSerDeUtils.scala
index 3069e4c36a7..9b6174c47bd 100644
--- a/common/utils/src/main/scala/org/apache/spark/util/SparkSerDeUtils.scala
+++ b/common/utils/src/main/scala/org/apache/spark/util/SparkSerDeUtils.scala
@@ -16,9 +16,9 @@
  */
 package org.apache.spark.util
 
-import java.io.{ByteArrayInputStream, ByteArrayOutputStream, 
ObjectInputStream, ObjectOutputStream}
+import java.io.{ByteArrayInputStream, ByteArrayOutputStream, 
ObjectInputStream, ObjectOutputStream, ObjectStreamClass}
 
-object SparkSerDeUtils {
+trait SparkSerDeUtils {
   /** Serialize an object using Java serialization */
   def serialize[T](o: T): Array[Byte] = {
 val bos = new ByteArrayOutputStream()
@@ -34,4 +34,21 @@ object SparkSerDeUtils {
 val ois = new ObjectInputStream(bis)
 ois.readObject.asInstanceOf[T]
   }
+
+  /**
+   * Deserialize an object using Java serialization and the given ClassLoader
+   */
+  def deserialize[T](bytes: Array[Byte], loader: ClassLoader): T = {
+val bis = new ByteArrayInputStream(bytes)
+val ois = new ObjectInputStream(bis) {
+  override def resolveClass(desc: ObjectStreamClass): Class[_] = {
+// scalastyle:off classforname
+Class.forName(desc.getName, false, loader)
+// scalastyle:on classforname
+  }
+}
+ois.readObject.asInstanceOf[T]
+  }
 }
+
+object SparkSerDeUtils extends SparkSerDeUtils
diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
index 3a38029c265..e060dba0b7e 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
@@ -18,16 +18,18 @@ package org.apache.spark.sql.expressions
 
 import scala.collection.JavaConverters._
 import scala.reflect.runtime.universe.TypeTag
+import scala.util.control.NonFatal
 
 import com.google.protobuf.ByteString
 
+import org.apache.spark.SparkException
 import org.apache.spark.connect.proto
 import org.apache.spark.sql.Column
 import org.apache.spark.sql.catalyst.ScalaReflection
 import org.apache.spark.sql.catalyst.encoders.{AgnosticEncoder, RowEncoder}
 import org.apache.spark.sql.connect.common.{DataTypeProtoConverter, UdfPacket}
 import org.apache.spark.sql.types.DataType
-import org.apache.spark.util.SparkSerDeUtils
+import org.apache.spark.util.{SparkClassUtils, SparkSerDeUtils}
 
 /**
  * A user-defined function. To create one, use the `udf` functions in 
`functions`.
@@ -144,6 +146,25 @@ case class 

[spark] branch master updated (4f62f8a718e -> f54b4020217)

2023-08-01 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 4f62f8a718e [SPARK-44613][CONNECT] Add Encoders object
 add f54b4020217 [SPARK-29497][CONNECT] Throw error when UDF is not 
deserializable

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/util/SparkSerDeUtils.scala| 21 ++-
 .../sql/expressions/UserDefinedFunction.scala  | 24 +++-
 .../spark/sql/UserDefinedFunctionSuite.scala   | 44 --
 .../main/scala/org/apache/spark/util/Utils.scala   | 23 +--
 4 files changed, 85 insertions(+), 27 deletions(-)





[spark] branch branch-3.5 updated: [SPARK-44613][CONNECT] Add Encoders object

2023-08-01 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new bde7aa61ce3 [SPARK-44613][CONNECT] Add Encoders object
bde7aa61ce3 is described below

commit bde7aa61ce3de15323a8920e8114a681fcd17000
Author: Herman van Hovell 
AuthorDate: Tue Aug 1 14:39:38 2023 -0400

[SPARK-44613][CONNECT] Add Encoders object

### What changes were proposed in this pull request?
This PR adds the org.apache.spark.sql.Encoders object to Connect.

### Why are the changes needed?
To increase compatibility with the SQL DataFrame API.

### Does this PR introduce _any_ user-facing change?
Yes, it adds missing functionality.

### How was this patch tested?
Added a couple of Java-based tests.

Closes #42264 from hvanhovell/SPARK-44613.

Authored-by: Herman van Hovell 
Signed-off-by: Herman van Hovell 
(cherry picked from commit 4f62f8a718e80dca13a1d44b6fdf8857f037c15e)
Signed-off-by: Herman van Hovell 
---
 .../main/scala/org/apache/spark/sql/Encoders.scala | 262 +
 .../spark/sql/connect/client/SparkResult.scala |  14 +-
 .../org/apache/spark/sql/JavaEncoderSuite.java |  94 
 .../CheckConnectJvmClientCompatibility.scala   |   8 +-
 .../connect/client/util/RemoteSparkSession.scala   |   2 +-
 5 files changed, 371 insertions(+), 9 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Encoders.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Encoders.scala
new file mode 100644
index 000..3f2f7ec96d4
--- /dev/null
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Encoders.scala
@@ -0,0 +1,262 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql
+
+import scala.reflect.runtime.universe.TypeTag
+
+import org.apache.spark.sql.catalyst.{JavaTypeInference, ScalaReflection}
+import org.apache.spark.sql.catalyst.encoders.AgnosticEncoder
+import org.apache.spark.sql.catalyst.encoders.AgnosticEncoders._
+
+/**
+ * Methods for creating an [[Encoder]].
+ *
+ * @since 3.5.0
+ */
+object Encoders {
+
+  /**
+   * An encoder for nullable boolean type. The Scala primitive encoder is 
available as
+   * [[scalaBoolean]].
+   * @since 3.5.0
+   */
+  def BOOLEAN: Encoder[java.lang.Boolean] = BoxedBooleanEncoder
+
+  /**
+   * An encoder for nullable byte type. The Scala primitive encoder is 
available as [[scalaByte]].
+   * @since 3.5.0
+   */
+  def BYTE: Encoder[java.lang.Byte] = BoxedByteEncoder
+
+  /**
+   * An encoder for nullable short type. The Scala primitive encoder is 
available as
+   * [[scalaShort]].
+   * @since 3.5.0
+   */
+  def SHORT: Encoder[java.lang.Short] = BoxedShortEncoder
+
+  /**
+   * An encoder for nullable int type. The Scala primitive encoder is 
available as [[scalaInt]].
+   * @since 3.5.0
+   */
+  def INT: Encoder[java.lang.Integer] = BoxedIntEncoder
+
+  /**
+   * An encoder for nullable long type. The Scala primitive encoder is 
available as [[scalaLong]].
+   * @since 3.5.0
+   */
+  def LONG: Encoder[java.lang.Long] = BoxedLongEncoder
+
+  /**
+   * An encoder for nullable float type. The Scala primitive encoder is 
available as
+   * [[scalaFloat]].
+   * @since 3.5.0
+   */
+  def FLOAT: Encoder[java.lang.Float] = BoxedFloatEncoder
+
+  /**
+   * An encoder for nullable double type. The Scala primitive encoder is 
available as
+   * [[scalaDouble]].
+   * @since 3.5.0
+   */
+  def DOUBLE: Encoder[java.lang.Double] = BoxedDoubleEncoder
+
+  /**
+   * An encoder for nullable string type.
+   *
+   * @since 3.5.0
+   */
+  def STRING: Encoder[java.lang.String] = StringEncoder
+
+  /**
+   * An encoder for nullable decimal type.
+   *
+   * @since 3.5.0
+   */
+  def DECIMAL: Encoder[java.math.BigDecimal] = DEFAULT_JAVA_DECIMAL_ENCODER
+
+  /**
+   * An encoder for nullable date type.
+   *
+   * @since 3.5.0
+   */
+  def DATE: Encoder[java.sql.Date] = DateEncoder(lenientSerialization = false)
+
+  /**
+   * Creates an 

[spark] branch master updated: [SPARK-44613][CONNECT] Add Encoders object

2023-08-01 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4f62f8a718e [SPARK-44613][CONNECT] Add Encoders object
4f62f8a718e is described below

commit 4f62f8a718e80dca13a1d44b6fdf8857f037c15e
Author: Herman van Hovell 
AuthorDate: Tue Aug 1 14:39:38 2023 -0400

[SPARK-44613][CONNECT] Add Encoders object

### What changes were proposed in this pull request?
This PR adds the org.apache.spark.sql.Encoders object to Connect.

### Why are the changes needed?
To increase compatibility with the SQL DataFrame API.

### Does this PR introduce _any_ user-facing change?
Yes, it adds missing functionality.

### How was this patch tested?
Added a couple of Java-based tests.

Closes #42264 from hvanhovell/SPARK-44613.

Authored-by: Herman van Hovell 
Signed-off-by: Herman van Hovell 
---
 .../main/scala/org/apache/spark/sql/Encoders.scala | 262 +
 .../spark/sql/connect/client/SparkResult.scala |  14 +-
 .../org/apache/spark/sql/JavaEncoderSuite.java |  94 
 .../CheckConnectJvmClientCompatibility.scala   |   8 +-
 .../connect/client/util/RemoteSparkSession.scala   |   2 +-
 5 files changed, 371 insertions(+), 9 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Encoders.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Encoders.scala
new file mode 100644
index 000..3f2f7ec96d4
--- /dev/null
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Encoders.scala
@@ -0,0 +1,262 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql
+
+import scala.reflect.runtime.universe.TypeTag
+
+import org.apache.spark.sql.catalyst.{JavaTypeInference, ScalaReflection}
+import org.apache.spark.sql.catalyst.encoders.AgnosticEncoder
+import org.apache.spark.sql.catalyst.encoders.AgnosticEncoders._
+
+/**
+ * Methods for creating an [[Encoder]].
+ *
+ * @since 3.5.0
+ */
+object Encoders {
+
+  /**
+   * An encoder for nullable boolean type. The Scala primitive encoder is 
available as
+   * [[scalaBoolean]].
+   * @since 3.5.0
+   */
+  def BOOLEAN: Encoder[java.lang.Boolean] = BoxedBooleanEncoder
+
+  /**
+   * An encoder for nullable byte type. The Scala primitive encoder is 
available as [[scalaByte]].
+   * @since 3.5.0
+   */
+  def BYTE: Encoder[java.lang.Byte] = BoxedByteEncoder
+
+  /**
+   * An encoder for nullable short type. The Scala primitive encoder is 
available as
+   * [[scalaShort]].
+   * @since 3.5.0
+   */
+  def SHORT: Encoder[java.lang.Short] = BoxedShortEncoder
+
+  /**
+   * An encoder for nullable int type. The Scala primitive encoder is 
available as [[scalaInt]].
+   * @since 3.5.0
+   */
+  def INT: Encoder[java.lang.Integer] = BoxedIntEncoder
+
+  /**
+   * An encoder for nullable long type. The Scala primitive encoder is 
available as [[scalaLong]].
+   * @since 3.5.0
+   */
+  def LONG: Encoder[java.lang.Long] = BoxedLongEncoder
+
+  /**
+   * An encoder for nullable float type. The Scala primitive encoder is 
available as
+   * [[scalaFloat]].
+   * @since 3.5.0
+   */
+  def FLOAT: Encoder[java.lang.Float] = BoxedFloatEncoder
+
+  /**
+   * An encoder for nullable double type. The Scala primitive encoder is 
available as
+   * [[scalaDouble]].
+   * @since 3.5.0
+   */
+  def DOUBLE: Encoder[java.lang.Double] = BoxedDoubleEncoder
+
+  /**
+   * An encoder for nullable string type.
+   *
+   * @since 3.5.0
+   */
+  def STRING: Encoder[java.lang.String] = StringEncoder
+
+  /**
+   * An encoder for nullable decimal type.
+   *
+   * @since 3.5.0
+   */
+  def DECIMAL: Encoder[java.math.BigDecimal] = DEFAULT_JAVA_DECIMAL_ENCODER
+
+  /**
+   * An encoder for nullable date type.
+   *
+   * @since 3.5.0
+   */
+  def DATE: Encoder[java.sql.Date] = DateEncoder(lenientSerialization = false)
+
+  /**
+   * Creates an encoder that serializes instances of the `java.time.LocalDate` 
class to the
+   * internal representation of nullable 

[spark] branch branch-3.5 updated: [SPARK-44421][FOLLOWUP] Fix doc test in SparkThrowableSuite

2023-08-01 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 922b44b51ff [SPARK-44421][FOLLOWUP] Fix doc test in SparkThrowableSuite
922b44b51ff is described below

commit 922b44b51ff747d3971bc7c3badcb474a3a2d1f9
Author: Juliusz Sompolski 
AuthorDate: Wed Aug 2 02:58:28 2023 +0900

[SPARK-44421][FOLLOWUP] Fix doc test in SparkThrowableSuite

### What changes were proposed in this pull request?

Forgot to `git add` the file after regenerating docs.

### Why are the changes needed?

Fix docs and reenable test.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually tested that SparkThrowableSuite works in both master and branch-3.5.

Closes #42263 from 
juliuszsompolski/SPARK-44421-SparkThrowableSuite-followup.

Authored-by: Juliusz Sompolski 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 5b8eaf8be5089f431caffcae4edfdaf97c499e8d)
Signed-off-by: Hyukjin Kwon 
---
 .../org/apache/spark/SparkThrowableSuite.scala |  2 +-
 ...-error-conditions-invalid-cursor-error-class.md | 44 ++
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala 
b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala
index 3762d6fdcc1..0249cde5488 100644
--- a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala
+++ b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala
@@ -149,7 +149,7 @@ class SparkThrowableSuite extends SparkFunSuite {
 checkIfUnique(messageFormats)
   }
 
-  ignore("Error classes match with document") {
+  test("Error classes match with document") {
 val errors = errorReader.errorInfoMap
 
 // the black list of error class name which should not add quote
diff --git a/docs/sql-error-conditions-invalid-cursor-error-class.md 
b/docs/sql-error-conditions-invalid-cursor-error-class.md
new file mode 100644
index 000..0b6a4ae17aa
--- /dev/null
+++ b/docs/sql-error-conditions-invalid-cursor-error-class.md
@@ -0,0 +1,44 @@
+---
+layout: global
+title: INVALID_CURSOR error class
+displayTitle: INVALID_CURSOR error class
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+[SQLSTATE: 
HY109](sql-error-conditions-sqlstates.html#class-HY-cli-specific-condition)
+
+The cursor is invalid.
+
+This error class has the following derived error classes:
+
+## DISCONNECTED
+
+The cursor has been disconnected by the server.
+
+## NOT_REATTACHABLE
+
+The cursor is not reattachable.
+
+## POSITION_NOT_AVAILABLE
+
+The cursor position id `` is no longer available at index 
``.
+
+## POSITION_NOT_FOUND
+
+The cursor position id `` is not found.
+
+





[spark] branch master updated (821026bc730 -> 5b8eaf8be50)

2023-08-01 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 821026bc730 [SPARK-44311][CONNECT][SQL] Improved support for UDFs on 
value classes
 add 5b8eaf8be50 [SPARK-44421][FOLLOWUP] Fix doc test in SparkThrowableSuite

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/SparkThrowableSuite.scala |  2 +-
 ...error-conditions-invalid-cursor-error-class.md} | 28 ++
 2 files changed, 13 insertions(+), 17 deletions(-)
 copy docs/{sql-error-conditions-invalid-handle-error-class.md => 
sql-error-conditions-invalid-cursor-error-class.md} (66%)





[spark] branch branch-3.5 updated: [SPARK-44614][PYTHON][CONNECT][3.5] Add missing packages in setup.py

2023-08-01 Thread ueshin
This is an automated email from the ASF dual-hosted git repository.

ueshin pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 84f17e7dcd5 [SPARK-44614][PYTHON][CONNECT][3.5] Add missing packages 
in setup.py
84f17e7dcd5 is described below

commit 84f17e7dcd5e6fa71a1a35a58d0ea19170e897a5
Author: Takuya UESHIN 
AuthorDate: Tue Aug 1 08:15:45 2023 -0700

[SPARK-44614][PYTHON][CONNECT][3.5] Add missing packages in setup.py

### What changes were proposed in this pull request?

Adds missing packages in `setup.py`.

### Why are the changes needed?

The following packages are not listed in `setup.py`.

- `pyspark.sql.connect.avro`
- `pyspark.sql.connect.client`
- `pyspark.sql.connect.streaming`
- `pyspark.ml.connect`

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The existing tests.

Closes #42257 from ueshin/issues/SPARK-44614/3.5/packages.

Authored-by: Takuya UESHIN 
Signed-off-by: Takuya UESHIN 
---
 python/setup.py | 4 
 1 file changed, 4 insertions(+)

diff --git a/python/setup.py b/python/setup.py
index a0297d4f9dc..4d14dfd3cb9 100755
--- a/python/setup.py
+++ b/python/setup.py
@@ -235,6 +235,7 @@ try:
 "pyspark",
 "pyspark.cloudpickle",
 "pyspark.mllib",
+"pyspark.ml.connect",
 "pyspark.mllib.linalg",
 "pyspark.mllib.stat",
 "pyspark.ml",
@@ -245,7 +246,10 @@ try:
 "pyspark.sql",
 "pyspark.sql.avro",
 "pyspark.sql.connect",
+"pyspark.sql.connect.avro",
+"pyspark.sql.connect.client",
 "pyspark.sql.connect.proto",
+"pyspark.sql.connect.streaming",
 "pyspark.sql.pandas",
 "pyspark.sql.protobuf",
 "pyspark.sql.streaming",





[spark] branch branch-3.5 updated: [SPARK-44311][CONNECT][SQL] Improved support for UDFs on value classes

2023-08-01 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 0843b7741fa [SPARK-44311][CONNECT][SQL] Improved support for UDFs on 
value classes
0843b7741fa is described below

commit 0843b7741fa959173fcc66067eedda9be501192c
Author: Emil Ejbyfeldt 
AuthorDate: Tue Aug 1 10:50:04 2023 -0400

[SPARK-44311][CONNECT][SQL] Improved support for UDFs on value classes

### What changes were proposed in this pull request?

This PR fixes using UDFs on value classes when they are serialized as their 
underlying type. Previously it would only work if one either defined a UDF 
taking the underlying type, or the derived schema did not "unbox" the value 
to its underlying type.

Before this change the following code:
```
final case class ValueClass(a: Int) extends AnyVal
final case class Wrapper(v: ValueClass)

val f = udf((a: ValueClass) => a.a > 0)

spark.createDataset(Seq(Wrapper(ValueClass(1)))).filter(f(col("v"))).show()
```
would fail with
```
java.lang.ClassCastException: class org.apache.spark.sql.types.IntegerType$ 
cannot be cast to class org.apache.spark.sql.types.StructType 
(org.apache.spark.sql.types.IntegerType$ and 
org.apache.spark.sql.types.StructType are in unnamed module of loader 'app')
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.$anonfun$applyOrElse$220(Analyzer.scala:3241)
  at scala.Option.map(Option.scala:242)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.$anonfun$applyOrElse$219(Analyzer.scala:3239)
  at scala.collection.immutable.List.map(List.scala:246)
  at scala.collection.immutable.List.map(List.scala:79)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.applyOrElse(Analyzer.scala:3237)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.applyOrElse(Analyzer.scala:3234)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:566)
  at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:566)
```

### Why are the changes needed?
This is something as a user I would expect to just work.

### Does this PR introduce _any_ user-facing change?
Yes, it fixes using a UDF on a value class that is serialized as its 
underlying type.

### How was this patch tested?
Existing tests and new test cases in DatasetSuite.scala.

Closes #41876 from eejbyfeldt/SPARK-44311.

Authored-by: Emil Ejbyfeldt 
Signed-off-by: Herman van Hovell 
(cherry picked from commit 821026bc730ce87e6e97d304c7673bfcb23fd03a)
Signed-off-by: Herman van Hovell 
---
 .../spark/sql/catalyst/analysis/Analyzer.scala |  7 ++-
 .../spark/sql/catalyst/expressions/ScalaUDF.scala  |  4 +++-
 .../scala/org/apache/spark/sql/DatasetSuite.scala  | 24 ++
 3 files changed, 33 insertions(+), 2 deletions(-)
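A compact, self-contained version of the now-supported pattern (an illustrative sketch, run against a local classic session for brevity; names mirror the snippet in the description above): the UDF is written against the value class itself, and after this fix the filter resolves instead of hitting the ClassCastException.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object ValueClassUdfSketch {
  final case class ValueClass(a: Int) extends AnyVal
  final case class Wrapper(v: ValueClass)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // UDF over the value class, not its underlying Int.
    val f = udf((v: ValueClass) => v.a > 0)

    // Expected to print the single wrapped row once the analyzer maps the
    // "unboxed" column back to the encoder's single-field schema.
    spark.createDataset(Seq(Wrapper(ValueClass(1))))
      .filter(f(col("v")))
      .show()

    spark.stop()
  }
}
```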

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 30c6e4b4bc0..7f2471c9e19 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -3245,7 +3245,12 @@ class Analyzer(override val catalogManager: 
CatalogManager) extends RuleExecutor
 val dataType = udf.children(i).dataType
 encOpt.map { enc =>
   val attrs = if (enc.isSerializedAsStructForTopLevel) {
-DataTypeUtils.toAttributes(dataType.asInstanceOf[StructType])
+// Value class that has been replaced with its underlying type
+if (enc.schema.fields.size == 1 && 
enc.schema.fields.head.dataType == dataType) {
+  
DataTypeUtils.toAttributes(enc.schema.asInstanceOf[StructType])
+} else {
+  DataTypeUtils.toAttributes(dataType.asInstanceOf[StructType])
+}
   } else {
 // the field name doesn't matter here, so we use
 // a simple literal to avoid any overhead
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
index 40274a83340..910960bf84b 100644
--- 

[spark] branch master updated: [SPARK-44311][CONNECT][SQL] Improved support for UDFs on value classes

2023-08-01 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 821026bc730 [SPARK-44311][CONNECT][SQL] Improved support for UDFs on 
value classes
821026bc730 is described below

commit 821026bc730ce87e6e97d304c7673bfcb23fd03a
Author: Emil Ejbyfeldt 
AuthorDate: Tue Aug 1 10:50:04 2023 -0400

[SPARK-44311][CONNECT][SQL] Improved support for UDFs on value classes

### What changes were proposed in this pull request?

This PR fixes using UDFs on value classes when they are serialized as their 
underlying type. Previously it would only work if one either defined a UDF 
taking the underlying type, or the derived schema did not "unbox" the value 
to its underlying type.

Before this change the following code:
```
final case class ValueClass(a: Int) extends AnyVal
final case class Wrapper(v: ValueClass)

val f = udf((a: ValueClass) => a.a > 0)

spark.createDataset(Seq(Wrapper(ValueClass(1)))).filter(f(col("v"))).show()
```
would fail with
```
java.lang.ClassCastException: class org.apache.spark.sql.types.IntegerType$ 
cannot be cast to class org.apache.spark.sql.types.StructType 
(org.apache.spark.sql.types.IntegerType$ and 
org.apache.spark.sql.types.StructType are in unnamed module of loader 'app')
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.$anonfun$applyOrElse$220(Analyzer.scala:3241)
  at scala.Option.map(Option.scala:242)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.$anonfun$applyOrElse$219(Analyzer.scala:3239)
  at scala.collection.immutable.List.map(List.scala:246)
  at scala.collection.immutable.List.map(List.scala:79)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.applyOrElse(Analyzer.scala:3237)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF$$anonfun$apply$42$$anonfun$applyOrElse$218.applyOrElse(Analyzer.scala:3234)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:566)
  at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:566)
```

### Why are the changes needed?
This is something as a user I would expect to just work.

### Does this PR introduce _any_ user-facing change?
Yes, it fixes using a UDF on a value class that is serialized as its 
underlying type.

### How was this patch tested?
Existing tests and new test cases in DatasetSuite.scala.

Closes #41876 from eejbyfeldt/SPARK-44311.

Authored-by: Emil Ejbyfeldt 
Signed-off-by: Herman van Hovell 
---
 .../spark/sql/catalyst/analysis/Analyzer.scala |  7 ++-
 .../spark/sql/catalyst/expressions/ScalaUDF.scala  |  4 +++-
 .../scala/org/apache/spark/sql/DatasetSuite.scala  | 24 ++
 3 files changed, 33 insertions(+), 2 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 92e550ea941..dc42d27e8e5 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -3251,7 +3251,12 @@ class Analyzer(override val catalogManager: 
CatalogManager) extends RuleExecutor
 val dataType = udf.children(i).dataType
 encOpt.map { enc =>
   val attrs = if (enc.isSerializedAsStructForTopLevel) {
-DataTypeUtils.toAttributes(dataType.asInstanceOf[StructType])
+// Value class that has been replaced with its underlying type
+if (enc.schema.fields.size == 1 && 
enc.schema.fields.head.dataType == dataType) {
+  
DataTypeUtils.toAttributes(enc.schema.asInstanceOf[StructType])
+} else {
+  DataTypeUtils.toAttributes(dataType.asInstanceOf[StructType])
+}
   } else {
 // the field name doesn't matter here, so we use
 // a simple literal to avoid any overhead
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
index 40274a83340..910960bf84b 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
+++ 

[spark] branch branch-3.5 updated: [SPARK-44604][BUILD] Upgrade Netty to 4.1.96.Final

2023-08-01 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 7b68ccd1cb4 [SPARK-44604][BUILD] Upgrade Netty to 4.1.96.Final
7b68ccd1cb4 is described below

commit 7b68ccd1cb48c38052b0458c5192d5ffcfc97409
Author: panbingkun 
AuthorDate: Tue Aug 1 08:55:27 2023 -0500

[SPARK-44604][BUILD] Upgrade Netty to 4.1.96.Final

### What changes were proposed in this pull request?
The PR aims to upgrade Netty from 4.1.93.Final to 4.1.96.Final.

### Why are the changes needed?
1. Netty 4.1.93.Final vs. 4.1.96.Final:

https://github.com/netty/netty/compare/netty-4.1.93.Final...netty-4.1.96.Final

2. The newest Netty version fixes a possible security issue
([CVE-2023-34462](https://github.com/netty/netty/security/advisories/GHSA-6mjq-h674-j845))
when using SniHandler.

3. Netty full release notes:
https://netty.io/news/2023/07/27/4-1-96-Final.html
https://netty.io/news/2023/07/20/4-1-95-Final.html
https://netty.io/news/2023/06/19/4-1-94-Final.html

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

Closes #42232 from panbingkun/SPARK-44604.

Authored-by: panbingkun 
Signed-off-by: Sean Owen 
(cherry picked from commit 8053d5f16541edb8e17cbc50684abae69187ff5a)
Signed-off-by: Sean Owen 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 36 +--
 pom.xml   |  6 +-
 2 files changed, 19 insertions(+), 23 deletions(-)
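One way to confirm which Netty build actually ends up on the driver classpath after the bump is Netty's own version registry; a small sketch (not part of the patch, using Scala 2.12-style converters):

```scala
import scala.collection.JavaConverters._

object NettyVersionCheck {
  def main(args: Array[String]): Unit = {
    // Netty records the version of every netty-* artifact it finds on the classpath.
    io.netty.util.Version.identify().asScala.toSeq.sortBy(_._1).foreach {
      case (artifact, version) => println(s"$artifact -> ${version.artifactVersion()}")
    }
  }
}
```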

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index beae2232202..566f7c9a3ea 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -183,24 +183,24 @@ metrics-jmx/4.2.19//metrics-jmx-4.2.19.jar
 metrics-json/4.2.19//metrics-json-4.2.19.jar
 metrics-jvm/4.2.19//metrics-jvm-4.2.19.jar
 minlog/1.3.0//minlog-1.3.0.jar
-netty-all/4.1.93.Final//netty-all-4.1.93.Final.jar
-netty-buffer/4.1.93.Final//netty-buffer-4.1.93.Final.jar
-netty-codec-http/4.1.93.Final//netty-codec-http-4.1.93.Final.jar
-netty-codec-http2/4.1.93.Final//netty-codec-http2-4.1.93.Final.jar
-netty-codec-socks/4.1.93.Final//netty-codec-socks-4.1.93.Final.jar
-netty-codec/4.1.93.Final//netty-codec-4.1.93.Final.jar
-netty-common/4.1.93.Final//netty-common-4.1.93.Final.jar
-netty-handler-proxy/4.1.93.Final//netty-handler-proxy-4.1.93.Final.jar
-netty-handler/4.1.93.Final//netty-handler-4.1.93.Final.jar
-netty-resolver/4.1.93.Final//netty-resolver-4.1.93.Final.jar
-netty-transport-classes-epoll/4.1.93.Final//netty-transport-classes-epoll-4.1.93.Final.jar
-netty-transport-classes-kqueue/4.1.93.Final//netty-transport-classes-kqueue-4.1.93.Final.jar
-netty-transport-native-epoll/4.1.93.Final/linux-aarch_64/netty-transport-native-epoll-4.1.93.Final-linux-aarch_64.jar
-netty-transport-native-epoll/4.1.93.Final/linux-x86_64/netty-transport-native-epoll-4.1.93.Final-linux-x86_64.jar
-netty-transport-native-kqueue/4.1.93.Final/osx-aarch_64/netty-transport-native-kqueue-4.1.93.Final-osx-aarch_64.jar
-netty-transport-native-kqueue/4.1.93.Final/osx-x86_64/netty-transport-native-kqueue-4.1.93.Final-osx-x86_64.jar
-netty-transport-native-unix-common/4.1.93.Final//netty-transport-native-unix-common-4.1.93.Final.jar
-netty-transport/4.1.93.Final//netty-transport-4.1.93.Final.jar
+netty-all/4.1.96.Final//netty-all-4.1.96.Final.jar
+netty-buffer/4.1.96.Final//netty-buffer-4.1.96.Final.jar
+netty-codec-http/4.1.96.Final//netty-codec-http-4.1.96.Final.jar
+netty-codec-http2/4.1.96.Final//netty-codec-http2-4.1.96.Final.jar
+netty-codec-socks/4.1.96.Final//netty-codec-socks-4.1.96.Final.jar
+netty-codec/4.1.96.Final//netty-codec-4.1.96.Final.jar
+netty-common/4.1.96.Final//netty-common-4.1.96.Final.jar
+netty-handler-proxy/4.1.96.Final//netty-handler-proxy-4.1.96.Final.jar
+netty-handler/4.1.96.Final//netty-handler-4.1.96.Final.jar
+netty-resolver/4.1.96.Final//netty-resolver-4.1.96.Final.jar
+netty-transport-classes-epoll/4.1.96.Final//netty-transport-classes-epoll-4.1.96.Final.jar
+netty-transport-classes-kqueue/4.1.96.Final//netty-transport-classes-kqueue-4.1.96.Final.jar
+netty-transport-native-epoll/4.1.96.Final/linux-aarch_64/netty-transport-native-epoll-4.1.96.Final-linux-aarch_64.jar
+netty-transport-native-epoll/4.1.96.Final/linux-x86_64/netty-transport-native-epoll-4.1.96.Final-linux-x86_64.jar
+netty-transport-native-kqueue/4.1.96.Final/osx-aarch_64/netty-transport-native-kqueue-4.1.96.Final-osx-aarch_64.jar
+netty-transport-native-kqueue/4.1.96.Final/osx-x86_64/netty-transport-native-kqueue-4.1.96.Final-osx-x86_64.jar
+netty-transport-native-unix-common/4.1.96.Final//netty-transport-native-unix-common-4.1.96.Final.jar

[spark] branch master updated: [SPARK-44604][BUILD] Upgrade Netty to 4.1.96.Final

2023-08-01 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8053d5f1654 [SPARK-44604][BUILD] Upgrade Netty to 4.1.96.Final
8053d5f1654 is described below

commit 8053d5f16541edb8e17cbc50684abae69187ff5a
Author: panbingkun 
AuthorDate: Tue Aug 1 08:55:27 2023 -0500

[SPARK-44604][BUILD] Upgrade Netty to 4.1.96.Final

### What changes were proposed in this pull request?
The PR aims to upgrade Netty from 4.1.93.Final to 4.1.96.Final.

### Why are the changes needed?
1. Netty 4.1.93.Final vs. 4.1.96.Final:

https://github.com/netty/netty/compare/netty-4.1.93.Final...netty-4.1.96.Final

2. The newest Netty version fixes a possible security issue
([CVE-2023-34462](https://github.com/netty/netty/security/advisories/GHSA-6mjq-h674-j845))
when using SniHandler.

3. Netty full release notes:
https://netty.io/news/2023/07/27/4-1-96-Final.html
https://netty.io/news/2023/07/20/4-1-95-Final.html
https://netty.io/news/2023/06/19/4-1-94-Final.html

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

Closes #42232 from panbingkun/SPARK-44604.

Authored-by: panbingkun 
Signed-off-by: Sean Owen 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 36 +--
 pom.xml   |  6 +-
 2 files changed, 19 insertions(+), 23 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 3b54ef43f6a..52a1d00f204 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -183,24 +183,24 @@ metrics-jmx/4.2.19//metrics-jmx-4.2.19.jar
 metrics-json/4.2.19//metrics-json-4.2.19.jar
 metrics-jvm/4.2.19//metrics-jvm-4.2.19.jar
 minlog/1.3.0//minlog-1.3.0.jar
-netty-all/4.1.93.Final//netty-all-4.1.93.Final.jar
-netty-buffer/4.1.93.Final//netty-buffer-4.1.93.Final.jar
-netty-codec-http/4.1.93.Final//netty-codec-http-4.1.93.Final.jar
-netty-codec-http2/4.1.93.Final//netty-codec-http2-4.1.93.Final.jar
-netty-codec-socks/4.1.93.Final//netty-codec-socks-4.1.93.Final.jar
-netty-codec/4.1.93.Final//netty-codec-4.1.93.Final.jar
-netty-common/4.1.93.Final//netty-common-4.1.93.Final.jar
-netty-handler-proxy/4.1.93.Final//netty-handler-proxy-4.1.93.Final.jar
-netty-handler/4.1.93.Final//netty-handler-4.1.93.Final.jar
-netty-resolver/4.1.93.Final//netty-resolver-4.1.93.Final.jar
-netty-transport-classes-epoll/4.1.93.Final//netty-transport-classes-epoll-4.1.93.Final.jar
-netty-transport-classes-kqueue/4.1.93.Final//netty-transport-classes-kqueue-4.1.93.Final.jar
-netty-transport-native-epoll/4.1.93.Final/linux-aarch_64/netty-transport-native-epoll-4.1.93.Final-linux-aarch_64.jar
-netty-transport-native-epoll/4.1.93.Final/linux-x86_64/netty-transport-native-epoll-4.1.93.Final-linux-x86_64.jar
-netty-transport-native-kqueue/4.1.93.Final/osx-aarch_64/netty-transport-native-kqueue-4.1.93.Final-osx-aarch_64.jar
-netty-transport-native-kqueue/4.1.93.Final/osx-x86_64/netty-transport-native-kqueue-4.1.93.Final-osx-x86_64.jar
-netty-transport-native-unix-common/4.1.93.Final//netty-transport-native-unix-common-4.1.93.Final.jar
-netty-transport/4.1.93.Final//netty-transport-4.1.93.Final.jar
+netty-all/4.1.96.Final//netty-all-4.1.96.Final.jar
+netty-buffer/4.1.96.Final//netty-buffer-4.1.96.Final.jar
+netty-codec-http/4.1.96.Final//netty-codec-http-4.1.96.Final.jar
+netty-codec-http2/4.1.96.Final//netty-codec-http2-4.1.96.Final.jar
+netty-codec-socks/4.1.96.Final//netty-codec-socks-4.1.96.Final.jar
+netty-codec/4.1.96.Final//netty-codec-4.1.96.Final.jar
+netty-common/4.1.96.Final//netty-common-4.1.96.Final.jar
+netty-handler-proxy/4.1.96.Final//netty-handler-proxy-4.1.96.Final.jar
+netty-handler/4.1.96.Final//netty-handler-4.1.96.Final.jar
+netty-resolver/4.1.96.Final//netty-resolver-4.1.96.Final.jar
+netty-transport-classes-epoll/4.1.96.Final//netty-transport-classes-epoll-4.1.96.Final.jar
+netty-transport-classes-kqueue/4.1.96.Final//netty-transport-classes-kqueue-4.1.96.Final.jar
+netty-transport-native-epoll/4.1.96.Final/linux-aarch_64/netty-transport-native-epoll-4.1.96.Final-linux-aarch_64.jar
+netty-transport-native-epoll/4.1.96.Final/linux-x86_64/netty-transport-native-epoll-4.1.96.Final-linux-x86_64.jar
+netty-transport-native-kqueue/4.1.96.Final/osx-aarch_64/netty-transport-native-kqueue-4.1.96.Final-osx-aarch_64.jar
+netty-transport-native-kqueue/4.1.96.Final/osx-x86_64/netty-transport-native-kqueue-4.1.96.Final-osx-x86_64.jar
+netty-transport-native-unix-common/4.1.96.Final//netty-transport-native-unix-common-4.1.96.Final.jar
+netty-transport/4.1.96.Final//netty-transport-4.1.96.Final.jar
 objenesis/3.3//objenesis-3.3.jar
 okhttp/3.12.12//okhttp-3.12.12.jar
 okio/1.15.0//okio-1.15.0.jar

[spark] branch master updated: [SPARK-44567][INFRA] Add a new Daily testing GitHub Action job for Maven

2023-08-01 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f64c72c3572 [SPARK-44567][INFRA] Add a new Daily testing GitHub Action 
job for Maven
f64c72c3572 is described below

commit f64c72c35729d4ae2e23e24f26351502f8763bcb
Author: yangjie01 
AuthorDate: Tue Aug 1 18:22:29 2023 +0800

[SPARK-44567][INFRA] Add a new Daily testing GitHub Action job for Maven

### What changes were proposed in this pull request?
This PR aims to add a new daily GitHub Actions testing job for Maven.

### Why are the changes needed?
Daily testing of the Maven build is needed.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions

Closes #42203 from LuciferYang/SPARK-44567-4.

Authored-by: yangjie01 
Signed-off-by: Ruifeng Zheng 
---
 .github/workflows/build_maven.yml |  32 ++
 .github/workflows/maven_test.yml  | 210 ++
 2 files changed, 242 insertions(+)

diff --git a/.github/workflows/build_maven.yml 
b/.github/workflows/build_maven.yml
new file mode 100644
index 000..4b68224e967
--- /dev/null
+++ b/.github/workflows/build_maven.yml
@@ -0,0 +1,32 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: "Build using Maven (master, Scala 2.12, Hadoop 3, JDK 8)"
+
+on:
+  schedule:
+- cron: '0 13 * * *'
+
+jobs:
+  run-build:
+permissions:
+  packages: write
+name: Run
+uses: ./.github/workflows/maven_test.yml
+if: github.repository == 'apache/spark'
diff --git a/.github/workflows/maven_test.yml b/.github/workflows/maven_test.yml
new file mode 100644
index 000..48a4d6b5ff9
--- /dev/null
+++ b/.github/workflows/maven_test.yml
@@ -0,0 +1,210 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: Build and test using Maven
+
+on:
+  workflow_call:
+inputs:
+  java:
+required: false
+type: string
+default: 8
+  branch:
+description: Branch to run the build against
+required: false
+type: string
+default: master
+  hadoop:
+description: Hadoop version to run with. HADOOP_PROFILE environment 
variable should accept it.
+required: false
+type: string
+default: hadoop3
+  envs:
+description: Additional environment variables to set when running the 
tests. Should be in JSON format.
+required: false
+type: string
+default: '{}'
+jobs:
+  # Build: build Spark and run the tests for specified modules using maven
+  build:
+name: "Build modules using Maven: ${{ matrix.modules }} ${{ matrix.comment 
}}"
+runs-on: ubuntu-22.04
+strategy:
+  fail-fast: false
+  matrix:
+java:
+  - ${{ inputs.java }}
+hadoop:
+  - ${{ inputs.hadoop }}
+hive:
+  - hive2.3
+modules:
+  - >-
+
core,repl,launcher,common#unsafe,common#kvstore,common#network-common,common#network-shuffle,common#sketch
+  - >-
+graphx,streaming,mllib-local,mllib,hadoop-cloud
+  - >-
+sql#hive-thriftserver
+  - >-
+

svn commit: r63291 - /dev/spark/KEYS

2023-08-01 Thread liyuanjian
Author: liyuanjian
Date: Tue Aug  1 09:02:39 2023
New Revision: 63291

Log:
Update KEYS from Yuanjian

Modified:
dev/spark/KEYS

Modified: dev/spark/KEYS
==
--- dev/spark/KEYS (original)
+++ dev/spark/KEYS Tue Aug  1 09:02:39 2023
@@ -1906,3 +1906,60 @@ SwG6KnEvjr0Rt2ktLnx2RWcKBhHNP78tOKIK1Gf/
 fVwsw+tRkSftdafHeZuJKHkXbw==
 =LnC6
 -END PGP PUBLIC KEY BLOCK-
+
+pub   rsa4096 2023-07-17 [SC]
+  FC3AE3A7EAA1BAC98770840E7E1ABCC53AAA2216
+uid  Yuanjian Li (CODE SIGNING KEY) 
+sub   rsa4096 2023-07-17 [E]
+-BEGIN PGP PUBLIC KEY BLOCK-
+
+mQINBGS1szEBEAC4Bve0aRc5IYF2D+bhvf/6LFF8fiN2WHGiS2p8PImdWwvABwM5
+uHIOGLQVlJ5vEjIhhyJ8tBWJMnUdP0nHS1Y6bl5P7FC0ry/+JmWu8S8beCspREm1
+isMfnOXI/xYrx/bwFtkbpoIB4QDBoJ/yrlSJLO86DRjRBTYwv3UpUXUz8xIUfTca
+MF+k9zbbXrttcUeP23BKWELvkdmz5PwBxLfLHa6qKFXO2+rBdtzGSIwiwgcr1N1b
+o/ePihXQzkKa8vCDrM9e0GvMqT0eLwy647ATNHg/3OF0X5zC/oQjU38kBQ8KKDsu
+KTct1/k5RfyfiIDEXI5vcJxJOjs4LCQiW/aBqM4fJ0MbBTu8Fkj/hkg06wKBYu53
+sFjkUZL43imfa68M1/xpCb0EjATP7vzp6hWqor7++n1/Vkq3xyxZKWi7ALWu/9pp
+1LO/Uc/xENIDfCDZ9ppkji0s4UARd/87BP8VPKUrCuTXCLWcRr7so4a3TBLKj9iR
+9aYUijzj0adeBQ72XyCofB5asiFpfoytPtAhBQur05b+WIPMIkLC8RjMe5Tg18Wr
+eJ4rzAC5o186B8xvfqnHYzLgs2q/3roUwz7NiQgqY5NXXDcwWtrMvXIlJ1KNHdCL
++Ge1yn+KbSNrHrBX5yqGunqDSuwSrZ7mKaenqBD4RavHi3nn9IvAWRwjTwARAQAB
+tDZZdWFuamlhbiBMaSAoQ09ERSBTSUdOSU5HIEtFWSkgPGxpeXVhbmppYW5AYXBh
+Y2hlLm9yZz6JAlEEEwEIADsWIQT8OuOn6qG6yYdwhA5+GrzFOqoiFgUCZLWzMQIb
+AwULCQgHAgIiAgYVCgkICwIEFgIDAQIeBwIXgAAKCRB+GrzFOqoiFkEkD/kBnFGC
+p4fmJVTOTh0Jmpxqfj+uTweT5103KbNgjdGuvVYk1OfpfX2RCaJotwRBMJxL0lwe
+DF51tP6V4EVcgfKxuhQ38qZ5OLxOB+wq/sxeTZ9uRtvxYmVzMtyMJcEy8UaqkR4n
+9amSF50kZBTeAGyh/3YIXWYXqMUx22RNlKExv/tBvI3ZHnzlJD5xV3IX0ZrA/3xi
+JAR9/gwAmuIFW0RVjjzeT88fVgv+CRfda6YFFJkLFjLMP/48Xa/AsmR2ll6rtgEQ
+rL5a3S1nwGIHi0bU3aqbBchr0APZFVjmtlyqg965ITQqLUqieXlcVLWodDxS0rUz
+ytxXpRcs2byow221iHOBkKkLzOIZv93GoFh4ZbNamfbqYuLR+pe0Ylukp0+bwxIO
+BE2/ByJZnMWAP1eoMe54cL539hgh1E08kw2mk1ldC9aJNUItimxcOUsaXpmfQlvR
+nbBOIyy0gAfaxa9S1GKZdVLqslBHBklhq9bcF8m4jk4mHifriwCXeINEz8OQWFg1
+DuIyLiB3Zfq+bxwAdhxz7DznBpUVnumWPda9UZB3Sj21B/5XjFO6uTvWRuecTCxD
+tqR8USXZ2PgurSaSW51U/3SLj3NmKopf/j5eZfWcLbty7qh/j0ZfZFJsfAbKG0AY
+XlqcNI0EUizhRmeHYnujjCmdEas7Tz0m2p3WjrkCDQRktbMxARAAy5Rz1uege1zV
+zNFt3dhF6OVxc6FP3ik/KBsoaOBmm0+K/1qOE/zK02pcEuJoUdP5deU9XU8It+6Y
+vYtQf/ygHj0ArWqtkMb7av86gpWPJWpL3LrMpG6H6CzoJ327KAYfulxJdPn/tmk6
+oUdbP8q0b+xXNVfuzXxHmPVlSXh0WHZvCnZzn08f+PQbWBf7iv2ctaWRBQyQbbvz
+yGO8RqzDRnXz/YFLjvUrSaBIDX2k9Ev0378T9R/DLQUKUtM/EBeoWa0+Nr9BHIc3
+6eOBlmWROptDdurBeydiWcFBnl3KI9is2iXw05xvViS7nIWe+5HX0UBs3Vv/wBb2
+pMvA9h7rYfXer7A8Mu/GzG8j3z+d/tzAakLJArcWhJ/tM1PD4K5q49MlZYjmvAK5
+2tv2LTCN2ug8+8DonMonSQa+jwTBeUyMqtP8GK3m2r/o2NheHAprx20bzjp+Q9Tv
+Q30rdca+Y4/XpZDyj8s7eZd+Cankr0EMvM6lkaWLEjs45lMu00E4DMM2qAiFQqRh
+faA6DPXy2FKT9BJqrz+XuSiyrt1qxsEbCIE7+gHuqGclPibzAGG334XJ+GxWRXY1
+a9xjRpUkFkkcEdFJnHqRId1jlNrg4j4nqbaVzZ1eY6EYsuQyeWojVFUM8CHnjT0I
+c3xvFUedetuLCpvOAcJDCti2A48fqRsAEQEAAYkCNgQYAQgAIBYhBPw646fqobrJ
+h3CEDn4avMU6qiIWBQJktbMxAhsMAAoJEH4avMU6qiIWRKsP+wY3oXhTI6WIT2Bw
+UBGnoxBzF0sooyUubTpkStJTB4r9fhV64QCfaqh9KJst3rnFDisAm7rohtDy/rQe
+N/TxMwTLgqB1wF5a5bhMkOPglrJXRVJp+JJjr9Ot/aQUSx22bmuxrLgUvSCGjCa6
+jtWoU2jckR5YtW3NJWovPSJ07zu76sfjKvnvjD1M4udzatRlzVGQUQR5Q3Z86xP1
+T/LF6DbI1aF/LadZsROgVRKPkV8imEd/5pa/d3KgF4RZvxoJCtRzV5nmA8sbqWQA
++dQPMlUZ6DeDR9D2mtFOpvEscO5QafBH8m0eMuUfWPZ/aMXiLiwhgCUy/Z+/BjFa
+zxIG9YqUgLOFjQW1AxHrq5dFa0OQdO0zTnesdgfUDrmgFZGzI+JzykTgmG3dNbT7
+WjDJafAoOwWOgg8r180oq6CD4I67SQSvFiJtGNoVF8zHY8+Gzx1YTB/4016eZTki
+QpnZvBLLSWZWP2l2xZJbPILUbik8kxDcToRJXpoV9KktALA5CWjVZ5tJWajtoL33
+eWXXiC+qnz+Wk+Hs87zUSQxbway19r79Jz1+aHQUpR0P1V+kqqNMuUyQmSVSMLLJ
+3BTWQdHTH/vxCWAj9RlzBBdWufB+a9CDTI7dLpHx1az+rGBf85fYC+uVaNRmrS/1
+udDbyJxrGbyiCaZiSV6aSgf+U5xi
+=Yumi
+-END PGP PUBLIC KEY BLOCK-






[spark] branch master updated: [SPARK-44614][PYTHON][CONNECT] Add missing packages in setup.py

2023-08-01 Thread ueshin
This is an automated email from the ASF dual-hosted git repository.

ueshin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a011d16a24d [SPARK-44614][PYTHON][CONNECT] Add missing packages in 
setup.py
a011d16a24d is described below

commit a011d16a24db383496cc64c396f08bc64f34e379
Author: Takuya UESHIN 
AuthorDate: Tue Aug 1 01:08:23 2023 -0700

[SPARK-44614][PYTHON][CONNECT] Add missing packages in setup.py

### What changes were proposed in this pull request?

Adds missing packages in `setup.py`.

### Why are the changes needed?

The following packages are not listed in `setup.py`.

- `pyspark.sql.connect.avro`
- `pyspark.sql.connect.client`
- `pyspark.sql.connect.streaming`
- `pyspark.ml.connect`

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The existing tests.

Closes #42248 from ueshin/issues/SPARK-44614/packages.

Authored-by: Takuya UESHIN 
Signed-off-by: Takuya UESHIN 
---
 python/setup.py | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/python/setup.py b/python/setup.py
index 2282b2535e7..6642ec5042f 100755
--- a/python/setup.py
+++ b/python/setup.py
@@ -238,6 +238,7 @@ try:
 "pyspark.mllib.linalg",
 "pyspark.mllib.stat",
 "pyspark.ml",
+"pyspark.ml.connect",
 "pyspark.ml.linalg",
 "pyspark.ml.param",
 "pyspark.ml.torch",
@@ -245,13 +246,16 @@ try:
 "pyspark.sql",
 "pyspark.sql.avro",
 "pyspark.sql.connect",
+"pyspark.sql.connect.avro",
+"pyspark.sql.connect.client",
 "pyspark.sql.connect.proto",
+"pyspark.sql.connect.streaming",
+"pyspark.sql.connect.streaming.worker",
 "pyspark.sql.pandas",
 "pyspark.sql.protobuf",
 "pyspark.sql.streaming",
 "pyspark.sql.worker",
 "pyspark.streaming",
-"pyspark.sql.connect.streaming.worker",
 "pyspark.bin",
 "pyspark.sbin",
 "pyspark.jars",





[spark] branch master updated: [SPARK-44618][INFRA] Free up disk space for non-container jobs

2023-08-01 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3f815953d65 [SPARK-44618][INFRA] Free up disk space for non-container 
jobs
3f815953d65 is described below

commit 3f815953d6583d1f728fa8b40b316ed22a2c93b4
Author: Ruifeng Zheng 
AuthorDate: Tue Aug 1 15:37:26 2023 +0800

[SPARK-44618][INFRA] Free up disk space for non-container jobs

### What changes were proposed in this pull request?
Uninstall unneeded packages, following the approach in
https://github.com/apache/flink/blob/master/tools/azure-pipelines/free_disk_space.sh

### Why are the changes needed?
increase the available disk space

before:

![image](https://github.com/apache/spark/assets/7322292/ba7c6489-41d8-472f-8f73-a7b36422e029)

after:

![image](https://github.com/apache/spark/assets/7322292/687d2f1f-8f45-41a8-9fe8-36dcc09e30c7)

Unfortunately, this won't work for container jobs (`pyspark`, `sparkr`, 
`lint`), since those packages were not installed at all

### Does this PR introduce _any_ user-facing change?
no, test-only

### How was this patch tested?
updated CI

Closes #42241 from zhengruifeng/infra_clean_non_container.

Authored-by: Ruifeng Zheng 
Signed-off-by: Ruifeng Zheng 
---
 .github/workflows/build_and_test.yml |  2 ++
 dev/free_disk_space  | 45 
 2 files changed, 47 insertions(+)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 81d16e4..47c1be1ba86 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -240,6 +240,8 @@ jobs:
 key: ${{ matrix.java }}-${{ matrix.hadoop }}-coursier-${{ 
hashFiles('**/pom.xml', '**/plugins.sbt') }}
 restore-keys: |
   ${{ matrix.java }}-${{ matrix.hadoop }}-coursier-
+- name: Free up disk space
+  run: ./dev/free_disk_space
 - name: Install Java ${{ matrix.java }}
   uses: actions/setup-java@v3
   with:
diff --git a/dev/free_disk_space b/dev/free_disk_space
new file mode 100755
index 000..f063780334e
--- /dev/null
+++ b/dev/free_disk_space
@@ -0,0 +1,45 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+echo "=="
+echo "Free up disk space on CI system"
+echo "=="
+
+echo "Listing 100 largest packages"
+dpkg-query -Wf '${Installed-Size}\t${Package}\n' | sort -n | tail -n 100
+df -h
+echo "Removing large packages"
+sudo rm -rf /usr/share/dotnet/
+sudo rm -rf /usr/local/graalvm/
+sudo rm -rf /usr/local/.ghcup/
+sudo rm -rf /usr/local/share/powershell
+sudo rm -rf /usr/local/share/chromium
+sudo rm -rf /usr/local/lib/android
+sudo rm -rf /usr/local/lib/node_modules
+
+sudo apt-get remove --purge -y '^aspnet.*'
+sudo apt-get remove --purge -y '^dotnet-.*'
+sudo apt-get remove --purge -y '^llvm-.*'
+sudo apt-get remove --purge -y 'php.*'
+sudo apt-get remove --purge -y '^mongodb-.*'
+sudo apt-get remove --purge -y snapd google-chrome-stable 
microsoft-edge-stable firefox
+sudo apt-get remove --purge -y azure-cli google-cloud-sdk mono-devel 
powershell libgl1-mesa-dri
+sudo apt-get autoremove --purge -y
+sudo apt-get clean
+df -h





[spark] branch branch-3.5 updated: [SPARK-44591][CONNECT][WEBUI] Use jobTags in SparkListenerSQLExecutionStart to link SQL Execution IDs for Spark UI Connect page

2023-08-01 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 1146ba7e944 [SPARK-44591][CONNECT][WEBUI] Use jobTags in 
SparkListenerSQLExecutionStart to link SQL Execution IDs for Spark UI Connect 
page
1146ba7e944 is described below

commit 1146ba7e944254c0ebbbd082a186fce818fd033c
Author: Jason Li 
AuthorDate: Tue Aug 1 00:15:09 2023 -0700

[SPARK-44591][CONNECT][WEBUI] Use jobTags in SparkListenerSQLExecutionStart 
to link SQL Execution IDs for Spark UI Connect page

### What changes were proposed in this pull request?
Use jobTags in SparkListenerSQLExecutionStart to get the corresponding SQL 
Execution for Spark Connect request rather than 
SparkListenerJobStart.props.getProperty(SQLExecution.EXECUTION_ID_KEY), which 
won't work when the SQL Execution does not trigger a job.

### Why are the changes needed?
This change handles cases where a SQL execution doesn't trigger a job, and 
the SQL execution ID therefore can't be retrieved from SparkListenerJobStart.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
Update unit test + local manual testing

Closes #42244 from jasonli-db/spark-connect-ui-sql-start.

Authored-by: Jason Li 
Signed-off-by: Gengliang Wang 
(cherry picked from commit a40e46fe6dc35226b27335bb1431583f455f1e58)
Signed-off-by: Gengliang Wang 
---
 .../connect/ui/SparkConnectServerListener.scala| 49 --
 .../ui/SparkConnectServerListenerSuite.scala   | 74 --
 2 files changed, 83 insertions(+), 40 deletions(-)
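The same idea in a stripped-down form (an illustrative listener, not the commit's code): resolve the SQL execution from the job tags carried on SparkListenerSQLExecutionStart, so executions that never launch a job are still linked to their operation. The "SparkConnect_" prefix below is an assumed stand-in for the real ExecuteJobTag match.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent}
import org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart

// Sketch: link SQL execution ids to operations via job tags instead of job-start properties.
class ExecutionTagListener extends SparkListener {
  override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
    case e: SparkListenerSQLExecutionStart =>
      // Assumed tag prefix; the server listener pattern-matches ExecuteJobTag instead.
      e.jobTags.filter(_.startsWith("SparkConnect_")).foreach { tag =>
        println(s"SQL execution ${e.executionId} belongs to operation tag $tag")
      }
    case _ => // other events are not relevant here
  }
}
```

Registered with `spark.sparkContext.addSparkListener(new ExecutionTagListener)`, this fires even when no SparkListenerJobStart is ever emitted for the execution.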

diff --git 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ui/SparkConnectServerListener.scala
 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ui/SparkConnectServerListener.scala
index b40e847f404..90f9afebcb6 100644
--- 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ui/SparkConnectServerListener.scala
+++ 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ui/SparkConnectServerListener.scala
@@ -26,7 +26,7 @@ import 
org.apache.spark.internal.config.Status.LIVE_ENTITY_UPDATE_PERIOD
 import org.apache.spark.scheduler._
 import org.apache.spark.sql.connect.config.Connect.{CONNECT_UI_SESSION_LIMIT, 
CONNECT_UI_STATEMENT_LIMIT}
 import org.apache.spark.sql.connect.service._
-import org.apache.spark.sql.execution.SQLExecution
+import org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart
 import org.apache.spark.status.{ElementTrackingStore, KVUtils, LiveEntity}
 
 private[connect] class SparkConnectServerListener(
@@ -80,12 +80,9 @@ private[connect] class SparkConnectServerListener(
 }
 val executeJobTag = executeJobTagOpt.get
 val exec = executionList.get(executeJobTag)
-val executionIdOpt: Option[String] = Option(jobStart.properties)
-  .flatMap { p => Option(p.getProperty(SQLExecution.EXECUTION_ID_KEY)) }
 if (exec.nonEmpty) {
   exec.foreach { exec =>
 exec.jobId += jobStart.jobId.toString
-executionIdOpt.foreach { execId => exec.sqlExecId += execId }
 updateLiveStore(exec)
   }
 } else {
@@ -105,8 +102,8 @@ private[connect] class SparkConnectServerListener(
   exec.userId,
   exec.operationId,
   exec.sparkSessionTags)
+liveExec.sqlExecId = exec.sqlExecId
 liveExec.jobId += jobStart.jobId.toString
-executionIdOpt.foreach { execId => exec.sqlExecId += execId }
 updateStoreWithTriggerEnabled(liveExec)
 executionList.remove(liveExec.jobTag)
   }
@@ -115,6 +112,7 @@ private[connect] class SparkConnectServerListener(
 
   override def onOtherEvent(event: SparkListenerEvent): Unit = {
 event match {
+  case e: SparkListenerSQLExecutionStart => onSQLExecutionStart(e)
   case e: SparkListenerConnectOperationStarted => onOperationStarted(e)
   case e: SparkListenerConnectOperationAnalyzed => onOperationAnalyzed(e)
   case e: SparkListenerConnectOperationReadyForExecution => 
onOperationReadyForExecution(e)
@@ -128,6 +126,45 @@ private[connect] class SparkConnectServerListener(
 }
   }
 
+  def onSQLExecutionStart(e: SparkListenerSQLExecutionStart): Unit = {
+val executeJobTagOpt = e.jobTags.find {
+  case ExecuteJobTag(_) => true
+  case _ => false
+}
+if (executeJobTagOpt.isEmpty) {
+  return
+}
+val executeJobTag = executeJobTagOpt.get
+val exec = executionList.get(executeJobTag)
+if (exec.nonEmpty) {
+  exec.foreach { exec =>
+exec.sqlExecId += e.executionId.toString
+updateLiveStore(exec)
+  }
+} else {
+  // This block guards against potential event re-ordering where a 
SQLExecutionStart
+  // event is processed after a 

[spark] branch master updated: [SPARK-44591][CONNECT][WEBUI] Use jobTags in SparkListenerSQLExecutionStart to link SQL Execution IDs for Spark UI Connect page

2023-08-01 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a40e46fe6dc [SPARK-44591][CONNECT][WEBUI] Use jobTags in 
SparkListenerSQLExecutionStart to link SQL Execution IDs for Spark UI Connect 
page
a40e46fe6dc is described below

commit a40e46fe6dc35226b27335bb1431583f455f1e58
Author: Jason Li 
AuthorDate: Tue Aug 1 00:15:09 2023 -0700

[SPARK-44591][CONNECT][WEBUI] Use jobTags in SparkListenerSQLExecutionStart 
to link SQL Execution IDs for Spark UI Connect page

### What changes were proposed in this pull request?
Use jobTags in SparkListenerSQLExecutionStart to get the corresponding SQL 
Execution for Spark Connect request rather than 
SparkListenerJobStart.props.getProperty(SQLExecution.EXECUTION_ID_KEY), which 
won't work when the SQL Execution does not trigger a job.

### Why are the changes needed?
This change handles cases where a SQL execution doesn't trigger a job, and 
the SQL execution ID therefore can't be retrieved from SparkListenerJobStart.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
Update unit test + local manual testing

Closes #42244 from jasonli-db/spark-connect-ui-sql-start.

Authored-by: Jason Li 
Signed-off-by: Gengliang Wang 
---
 .../connect/ui/SparkConnectServerListener.scala| 49 --
 .../ui/SparkConnectServerListenerSuite.scala   | 74 --
 2 files changed, 83 insertions(+), 40 deletions(-)

diff --git 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ui/SparkConnectServerListener.scala
 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ui/SparkConnectServerListener.scala
index b40e847f404..90f9afebcb6 100644
--- 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ui/SparkConnectServerListener.scala
+++ 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ui/SparkConnectServerListener.scala
@@ -26,7 +26,7 @@ import 
org.apache.spark.internal.config.Status.LIVE_ENTITY_UPDATE_PERIOD
 import org.apache.spark.scheduler._
 import org.apache.spark.sql.connect.config.Connect.{CONNECT_UI_SESSION_LIMIT, 
CONNECT_UI_STATEMENT_LIMIT}
 import org.apache.spark.sql.connect.service._
-import org.apache.spark.sql.execution.SQLExecution
+import org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart
 import org.apache.spark.status.{ElementTrackingStore, KVUtils, LiveEntity}
 
 private[connect] class SparkConnectServerListener(
@@ -80,12 +80,9 @@ private[connect] class SparkConnectServerListener(
 }
 val executeJobTag = executeJobTagOpt.get
 val exec = executionList.get(executeJobTag)
-val executionIdOpt: Option[String] = Option(jobStart.properties)
-  .flatMap { p => Option(p.getProperty(SQLExecution.EXECUTION_ID_KEY)) }
 if (exec.nonEmpty) {
   exec.foreach { exec =>
 exec.jobId += jobStart.jobId.toString
-executionIdOpt.foreach { execId => exec.sqlExecId += execId }
 updateLiveStore(exec)
   }
 } else {
@@ -105,8 +102,8 @@ private[connect] class SparkConnectServerListener(
   exec.userId,
   exec.operationId,
   exec.sparkSessionTags)
+liveExec.sqlExecId = exec.sqlExecId
 liveExec.jobId += jobStart.jobId.toString
-executionIdOpt.foreach { execId => exec.sqlExecId += execId }
 updateStoreWithTriggerEnabled(liveExec)
 executionList.remove(liveExec.jobTag)
   }
@@ -115,6 +112,7 @@ private[connect] class SparkConnectServerListener(
 
   override def onOtherEvent(event: SparkListenerEvent): Unit = {
 event match {
+  case e: SparkListenerSQLExecutionStart => onSQLExecutionStart(e)
   case e: SparkListenerConnectOperationStarted => onOperationStarted(e)
   case e: SparkListenerConnectOperationAnalyzed => onOperationAnalyzed(e)
   case e: SparkListenerConnectOperationReadyForExecution => 
onOperationReadyForExecution(e)
@@ -128,6 +126,45 @@ private[connect] class SparkConnectServerListener(
 }
   }
 
+  def onSQLExecutionStart(e: SparkListenerSQLExecutionStart): Unit = {
+val executeJobTagOpt = e.jobTags.find {
+  case ExecuteJobTag(_) => true
+  case _ => false
+}
+if (executeJobTagOpt.isEmpty) {
+  return
+}
+val executeJobTag = executeJobTagOpt.get
+val exec = executionList.get(executeJobTag)
+if (exec.nonEmpty) {
+  exec.foreach { exec =>
+exec.sqlExecId += e.executionId.toString
+updateLiveStore(exec)
+  }
+} else {
+  // This block guards against potential event re-ordering where a 
SQLExecutionStart
+  // event is processed after a ConnectOperationClosed event, in which 
case the Execution
+  // has already been evicted from the executionList.
+  

[spark] branch master updated: [SPARK-44490][WEBUI] Remove unused `TaskPagedTable` in StagePage

2023-08-01 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 546e39c5dab [SPARK-44490][WEBUI] Remove unused `TaskPagedTable` in 
StagePage
546e39c5dab is described below

commit 546e39c5dabc243ab81b6238dc893d9993e0
Author: sychen 
AuthorDate: Tue Aug 1 15:37:27 2023 +0900

[SPARK-44490][WEBUI] Remove unused `TaskPagedTable` in StagePage

### What changes were proposed in this pull request?
Remove the unused `TaskPagedTable` (and its `TaskDataSource`) from `StagePage`.

### Why are the changes needed?
In [SPARK-21809](https://issues.apache.org/jira/browse/SPARK-21809), we
introduced `stagespage-template.html` to render the running status of a stage,
so `TaskPagedTable` is effectively dead code, yet PRs still keep updating the
related code. A sketch of the direct-store lookup that replaces it is shown below.
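
For context, a hedged sketch of how an event-timeline page slice can be read
straight from `AppStatusStore` once `TaskPagedTable` is gone; it mirrors the
replacement code in the diff below. The object and method names are
hypothetical, and the sketch assumes it sits inside the Spark source tree
because `AppStatusStore` is `private[spark]`.

```scala
// Hypothetical sketch, not part of this PR. AppStatusStore is private[spark],
// so this file is assumed to live under the Spark source tree.
package org.apache.spark.ui.jobs

import org.apache.spark.status.AppStatusStore
import org.apache.spark.status.api.v1.TaskData

object TimelineSliceSketch {
  // Fetch only the tasks needed for one event-timeline page, paging and
  // sorting directly against the status store instead of TaskPagedTable.
  def timelineSlice(
      store: AppStatusStore,
      stageId: Int,
      attemptId: Int,
      page: Int,
      pageSize: Int,
      sortIndex: Option[String], // a task index name, e.g. from ApiHelper.indexName
      sortDesc: Boolean): Seq[TaskData] = {
    val from = (page - 1) * pageSize
    val total = store.taskCount(stageId, attemptId).toInt
    val to = total.min(page * pageSize)
    // Only the [from, to) window is materialized; the last argument is "ascending".
    store.taskList(stageId, attemptId, from, to - from, sortIndex, !sortDesc)
  }
}
```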

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Local test; the existing `StagePageSuite` was adjusted for the removal.

Closes #42085 from cxzl25/SPARK-44490.

Authored-by: sychen 
Signed-off-by: Kousuke Saruta 
---
 .../scala/org/apache/spark/ui/jobs/StagePage.scala | 301 +
 .../scala/org/apache/spark/ui/StagePageSuite.scala |  12 +-
 2 files changed, 13 insertions(+), 300 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala 
b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
index 02aece6e50a..d50ccdadff5 100644
--- a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
+++ b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
@@ -17,17 +17,12 @@
 
 package org.apache.spark.ui.jobs
 
-import java.net.URLEncoder
-import java.nio.charset.StandardCharsets.UTF_8
 import java.util.Date
-import java.util.concurrent.TimeUnit
 import javax.servlet.http.HttpServletRequest
 
-import scala.collection.mutable.{HashMap, HashSet}
+import scala.collection.mutable.HashSet
 import scala.xml.{Node, Unparsed}
 
-import org.apache.commons.text.StringEscapeUtils
-
 import org.apache.spark.internal.config.UI._
 import org.apache.spark.scheduler.TaskLocality
 import org.apache.spark.status._
@@ -209,32 +204,20 @@ private[ui] class StagePage(parent: StagesTab, store: 
AppStatusStore) extends We
 val dagViz = UIUtils.showDagVizForStage(stageId, stageGraph)
 
 val currentTime = System.currentTimeMillis()
-val taskTable = try {
-  val _taskTable = new TaskPagedTable(
-stageData,
-UIUtils.prependBaseUri(request, parent.basePath) +
-  s"/stages/stage/?id=${stageId}=${stageAttemptId}",
-pageSize = taskPageSize,
-sortColumn = taskSortColumn,
-desc = taskSortDesc,
-store = parent.store
-  )
-  _taskTable
-} catch {
-  case e @ (_ : IllegalArgumentException | _ : IndexOutOfBoundsException) 
=>
-null
-}
 
 val content =
   summary ++
   dagViz ++  ++
   makeTimeline(
 // Only show the tasks in the table
-Option(taskTable).map({ taskPagedTable =>
+() => {
   val from = (eventTimelineTaskPage - 1) * eventTimelineTaskPageSize
-  val to = taskPagedTable.dataSource.dataSize.min(
-eventTimelineTaskPage * eventTimelineTaskPageSize)
-  taskPagedTable.dataSource.sliceData(from, to)}).getOrElse(Nil), 
currentTime,
+  val dataSize = store.taskCount(stageData.stageId, 
stageData.attemptId).toInt
+  val to = dataSize.min(eventTimelineTaskPage * 
eventTimelineTaskPageSize)
+  val sliceData = store.taskList(stageData.stageId, 
stageData.attemptId, from, to - from,
+indexName(taskSortColumn), !taskSortDesc)
+  sliceData
+}, currentTime,
 eventTimelineTaskPage, eventTimelineTaskPageSize, 
eventTimelineTotalPages, stageId,
 stageAttemptId, totalTasks) ++
 
@@ -246,8 +229,8 @@ private[ui] class StagePage(parent: StagesTab, store: 
AppStatusStore) extends We
 
   }
 
-  def makeTimeline(
-  tasks: Seq[TaskData],
+  private def makeTimeline(
+  tasksFunc: () => Seq[TaskData],
   currentTime: Long,
   page: Int,
   pageSize: Int,
@@ -258,6 +241,8 @@ private[ui] class StagePage(parent: StagesTab, store: 
AppStatusStore) extends We
 
 if (!TIMELINE_ENABLED) return Seq.empty[Node]
 
+val tasks = tasksFunc()
+
 val executorsSet = new HashSet[(String, String)]
 var minLaunchTime = Long.MaxValue
 var maxFinishTime = Long.MinValue
@@ -453,268 +438,6 @@ private[ui] class StagePage(parent: StagesTab, store: 
AppStatusStore) extends We
 
 }
 
-private[ui] class TaskDataSource(
-stage: StageData,
-pageSize: Int,
-sortColumn: String,
-desc: Boolean,
-store: AppStatusStore) extends PagedDataSource[TaskData](pageSize) {
-  import ApiHelper._
-
-  // Keep an internal cache of executor log maps so that long task lists 
render faster.
-  private val