[spark] 01/01: Preparing development version 3.5.1-SNAPSHOT

2023-08-28 Thread liyuanjian
This is an automated email from the ASF dual-hosted git repository.

liyuanjian pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

commit cecd79ab57e323a55e99be89f372a61ac50bfe82
Author: Yuanjian Li 
AuthorDate: Tue Aug 29 05:57:11 2023 +

Preparing development version 3.5.1-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 common/utils/pom.xml   | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/api/pom.xml| 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 45 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 1c093a4a980..66faa8031c4 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.5.0
+Version: 3.5.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index a0aca22eab9..45b68dd81cb 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0
+3.5.1-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index ce180f49ff1..1b1a8d0066f 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0
+3.5.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 8da48076a43..54c10a05eed 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0
+3.5.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 48e64d21a58..92bf5bc0785 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0
+3.5.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 2bbacbe71a4..3003927e713 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.0

[spark] tag v3.5.0-rc3 created (now 9f137aa4dc4)

2023-08-28 Thread liyuanjian
This is an automated email from the ASF dual-hosted git repository.

liyuanjian pushed a change to tag v3.5.0-rc3
in repository https://gitbox.apache.org/repos/asf/spark.git


  at 9f137aa4dc4 (commit)
This tag includes the following new commits:

 new 9f137aa4dc4 Preparing Spark release v3.5.0-rc3

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.5 updated (bbe12e148eb -> cecd79ab57e)

2023-08-28 Thread liyuanjian
This is an automated email from the ASF dual-hosted git repository.

liyuanjian pushed a change to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


from bbe12e148eb Revert "[SPARK-44742][PYTHON][DOCS] Add Spark version drop 
down to the PySpark doc site"
 add 9f137aa4dc4 Preparing Spark release v3.5.0-rc3
 new cecd79ab57e Preparing development version 3.5.1-SNAPSHOT

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Preparing Spark release v3.5.0-rc3

2023-08-28 Thread liyuanjian
This is an automated email from the ASF dual-hosted git repository.

liyuanjian pushed a commit to tag v3.5.0-rc3
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 9f137aa4dc43398aafa0c3e035ed3174182d7d6c
Author: Yuanjian Li 
AuthorDate: Tue Aug 29 05:57:06 2023 +

Preparing Spark release v3.5.0-rc3
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 common/utils/pom.xml   | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/api/pom.xml| 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 45 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 66faa8031c4..1c093a4a980 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.5.1
+Version: 3.5.0
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 45b68dd81cb..a0aca22eab9 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.1-SNAPSHOT
+3.5.0
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 1b1a8d0066f..ce180f49ff1 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.1-SNAPSHOT
+3.5.0
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 54c10a05eed..8da48076a43 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.1-SNAPSHOT
+3.5.0
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 92bf5bc0785..48e64d21a58 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.1-SNAPSHOT
+3.5.0
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 3003927e713..2bbacbe71a4 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.5.1-SNAPSHOT
+3.5.0

[spark] tag v3.5.0-rc3 deleted (was d5423e7a89c)

2023-08-28 Thread liyuanjian
This is an automated email from the ASF dual-hosted git repository.

liyuanjian pushed a change to tag v3.5.0-rc3
in repository https://gitbox.apache.org/repos/asf/spark.git


*** WARNING: tag v3.5.0-rc3 was deleted! ***

 was d5423e7a89c Preparing Spark release v3.5.0-rc3

The revisions that were on this tag are still contained in
other references; therefore, this change does not discard any commits
from the repository.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.5 updated: Revert "[SPARK-44742][PYTHON][DOCS] Add Spark version drop down to the PySpark doc site"

2023-08-28 Thread liyuanjian
This is an automated email from the ASF dual-hosted git repository.

liyuanjian pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new bbe12e148eb Revert "[SPARK-44742][PYTHON][DOCS] Add Spark version drop 
down to the PySpark doc site"
bbe12e148eb is described below

commit bbe12e148eb1f289cfb1f4412525f4c4381c10a9
Author: Yuanjian Li 
AuthorDate: Mon Aug 28 22:43:50 2023 -0700

Revert "[SPARK-44742][PYTHON][DOCS] Add Spark version drop down to the 
PySpark doc site"

This reverts commit 319dff11c373cc872aab4e7d55745561ee5d7b0e.
---
 python/docs/source/_static/css/pyspark.css | 13 
 python/docs/source/_static/versions.json   | 22 ---
 .../docs/source/_templates/version-switcher.html   | 77 --
 python/docs/source/conf.py |  9 +--
 4 files changed, 1 insertion(+), 120 deletions(-)

diff --git a/python/docs/source/_static/css/pyspark.css 
b/python/docs/source/_static/css/pyspark.css
index ccfe60f2bca..89b7c65f27a 100644
--- a/python/docs/source/_static/css/pyspark.css
+++ b/python/docs/source/_static/css/pyspark.css
@@ -95,16 +95,3 @@ u.bd-sidebar .nav>li>ul>.active:hover>a,.bd-sidebar 
.nav>li>ul>.active>a {
 .spec_table tr, td, th {
 border-top: none!important;
 }
-
-/* Styling to the version dropdown */
-#version-button {
-  padding-left: 0.2rem;
-  padding-right: 3.2rem;
-}
-
-#version_switcher {
-  height: auto;
-  max-height: 300px;
-  width: 165px;
-  overflow-y: auto;
-}
diff --git a/python/docs/source/_static/versions.json 
b/python/docs/source/_static/versions.json
deleted file mode 100644
index 3d0bd148180..000
--- a/python/docs/source/_static/versions.json
+++ /dev/null
@@ -1,22 +0,0 @@
-[
-{
-"name": "3.4.1",
-"version": "3.4.1"
-},
-{
-"name": "3.4.0",
-"version": "3.4.0"
-},
-{
-"name": "3.3.2",
-"version": "3.3.2"
-},
-{
-"name": "3.3.1",
-"version": "3.3.1"
-},
-{
-"name": "3.3.0",
-"version": "3.3.0"
-}
-]
diff --git a/python/docs/source/_templates/version-switcher.html 
b/python/docs/source/_templates/version-switcher.html
deleted file mode 100644
index 16c443229f4..000
--- a/python/docs/source/_templates/version-switcher.html
+++ /dev/null
@@ -1,77 +0,0 @@
-
-
-
-
-{{ release }}
-
-
-
-
-
-
-
-
-// Function to construct the target URL from the JSON components
-function buildURL(entry) {
-var template = "{{ switcher_template_url }}";  // supplied by jinja
-template = template.replace("{version}", entry.version);
-return template;
-}
-
-// Function to check if corresponding page path exists in other version of docs
-// and, if so, go there instead of the homepage of the other docs version
-function checkPageExistsAndRedirect(event) {
-const currentFilePath = "{{ pagename }}.html",
-  otherDocsHomepage = event.target.getAttribute("href");
-let tryUrl = `${otherDocsHomepage}${currentFilePath}`;
-$.ajax({
-type: 'HEAD',
-url: tryUrl,
-// if the page exists, go there
-success: function() {
-location.href = tryUrl;
-}
-}).fail(function() {
-location.href = otherDocsHomepage;
-});
-return false;
-}
-
-// Function to populate the version switcher
-(function () {
-// get JSON config
-$.getJSON("{{ switcher_json_url }}", function(data, textStatus, jqXHR) {
-// create the nodes first (before AJAX calls) to ensure the order is
-// correct (for now, links will go to doc version homepage)
-$.each(data, function(index, entry) {
-// if no custom name specified (e.g., "latest"), use version string
-if (!("name" in entry)) {
-entry.name = entry.version;
-}
-// construct the appropriate URL, and add it to the dropdown
-entry.url = buildURL(entry);
-const node = document.createElement("a");
-node.setAttribute("class", "list-group-item list-group-item-action 
py-1");
-node.setAttribute("href", `${entry.url}`);
-node.textContent = `${entry.name}`;
-node.onclick = checkPageExistsAndRedirect;
-$("#version_switcher").append(node);
-});
-});
-})();
-
diff --git a/python/docs/source/conf.py b/python/docs/source/conf.py
index 0f57cb37cee..38c331048e7 100644
--- a/python/docs/source/conf.py
+++ b/python/docs/source/conf.py
@@ -177,17 +177,10 @@ autosummary_generate = True
 # a list of builtin themes.
 html_theme = 'pydata_sphinx_theme'
 
-html_context = {
-"switcher_json_url": "_static/versions.json",
-"switcher_template_url": 
"https://spark.apache.org/docs/{version}/api/python/index.html;,
-}
-
 # Theme options are theme-specific and 

[spark] branch master updated (df63adf7343 -> 7315a046e22)

2023-08-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from df63adf7343 [SPARK-43646][CONNECT][TESTS] Make both SBT and Maven use 
`spark-proto` uber jar to test the `connect` module
 add 7315a046e22 [SPARK-44996][K8S] Use `lazy val` for 
`DefaultVolcanoClient`

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStep.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
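
For context, `lazy val` defers construction of an expensive object until its first
use, which is the pattern the commit above applies. A minimal, generic Scala sketch
of that pattern (the real `DefaultVolcanoClient` and `VolcanoFeatureStep` are not
shown in this digest, so the names below are hypothetical):

```
// Hypothetical sketch of the `lazy val` pattern, not the actual VolcanoFeatureStep code.
class ExpensiveClient {
  println("connecting to the scheduler...") // stands in for costly network/IO setup
  def submit(job: String): String = s"submitted $job"
}

object LazyClientSketch {
  // The client is only constructed the first time `client` is accessed.
  lazy val client: ExpensiveClient = new ExpensiveClient()

  def main(args: Array[String]): Unit = {
    println("feature step configured; no client created yet")
    println(client.submit("pod-group"))   // construction happens here, once
    println(client.submit("driver-pod"))  // reuses the same instance
  }
}
```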


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.5 updated: [SPARK-43646][CONNECT][TESTS] Make both SBT and Maven use `spark-proto` uber jar to test the `connect` module

2023-08-28 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 179aaab3c48 [SPARK-43646][CONNECT][TESTS] Make both SBT and Maven use 
`spark-proto` uber jar to test the `connect` module
179aaab3c48 is described below

commit 179aaab3c48fd6bcce00885d40a2e4a496e0802f
Author: yangjie01 
AuthorDate: Tue Aug 29 11:15:23 2023 +0800

[SPARK-43646][CONNECT][TESTS] Make both SBT and Maven use `spark-proto` 
uber jar to test the `connect` module

### What changes were proposed in this pull request?
Before this PR, when we tested the `connect` module, Maven used the shaded
`spark-protobuf` jar while SBT used the original jar, which led to inconsistent
testing behavior. So some tests passed when using SBT but failed when using Maven:

run

```
build/mvn clean install -DskipTests
build/mvn test -pl connector/connect/server
```

two tests fail as follows:

```
- from_protobuf_messageClassName *** FAILED ***
  org.apache.spark.sql.AnalysisException: [CANNOT_LOAD_PROTOBUF_CLASS] 
Could not load Protobuf class with name 
org.apache.spark.connect.proto.StorageLevel. 
org.apache.spark.connect.proto.StorageLevel does not extend shaded Protobuf 
Message class org.sparkproject.spark_protobuf.protobuf.Message. The jar with 
Protobuf classes needs to be shaded (com.google.protobuf.* --> 
org.sparkproject.spark_protobuf.protobuf.*).
  at 
org.apache.spark.sql.errors.QueryCompilationErrors$.protobufClassLoadError(QueryCompilationErrors.scala:3417)
  at 
org.apache.spark.sql.protobuf.utils.ProtobufUtils$.buildDescriptorFromJavaClass(ProtobufUtils.scala:193)
  at 
org.apache.spark.sql.protobuf.utils.ProtobufUtils$.buildDescriptor(ProtobufUtils.scala:151)
  at 
org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.messageDescriptor$lzycompute(ProtobufDataToCatalyst.scala:58)
  at 
org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.messageDescriptor(ProtobufDataToCatalyst.scala:57)
  at 
org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.dataType$lzycompute(ProtobufDataToCatalyst.scala:43)
  at 
org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.dataType(ProtobufDataToCatalyst.scala:42)
  at 
org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:194)
  at 
org.apache.spark.sql.catalyst.plans.logical.Project.$anonfun$output$1(basicLogicalOperators.scala:72)
  at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)

- from_protobuf_messageClassName_options *** FAILED ***
  org.apache.spark.sql.AnalysisException: [CANNOT_LOAD_PROTOBUF_CLASS] 
Could not load Protobuf class with name 
org.apache.spark.connect.proto.StorageLevel. 
org.apache.spark.connect.proto.StorageLevel does not extend shaded Protobuf 
Message class org.sparkproject.spark_protobuf.protobuf.Message. The jar with 
Protobuf classes needs to be shaded (com.google.protobuf.* --> 
org.sparkproject.spark_protobuf.protobuf.*).
  at 
org.apache.spark.sql.errors.QueryCompilationErrors$.protobufClassLoadError(QueryCompilationErrors.scala:3417)
  at 
org.apache.spark.sql.protobuf.utils.ProtobufUtils$.buildDescriptorFromJavaClass(ProtobufUtils.scala:193)
  at 
org.apache.spark.sql.protobuf.utils.ProtobufUtils$.buildDescriptor(ProtobufUtils.scala:151)
  at 
org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.messageDescriptor$lzycompute(ProtobufDataToCatalyst.scala:58)
  at 
org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.messageDescriptor(ProtobufDataToCatalyst.scala:57)
  at 
org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.dataType$lzycompute(ProtobufDataToCatalyst.scala:43)
  at 
org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.dataType(ProtobufDataToCatalyst.scala:42)
  at 
org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:194)
  at 
org.apache.spark.sql.catalyst.plans.logical.Project.$anonfun$output$1(basicLogicalOperators.scala:72)
  at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
```

So this PR makes SBT also use the `spark-proto` uber
jar (`spark-protobuf-assembly-**-SNAPSHOT.jar`) for the above tests, and refactors
the test cases so that they pass with both SBT and Maven.
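
For reference, the relocation that the error message above describes
(com.google.protobuf.* --> org.sparkproject.spark_protobuf.protobuf.*) is the kind of
rule expressed with sbt-assembly shade rules. The snippet below is a generic build.sbt
sketch of such a rule, assuming the sbt-assembly plugin is enabled; it is not Spark's
actual build definition:

```
// build.sbt sketch (illustrative only): relocate the Protobuf classes in the uber jar.
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("com.google.protobuf.**" -> "org.sparkproject.spark_protobuf.protobuf.@1").inAll
)
```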

### Why are the changes needed?
Make the `connect` server module tests pass with both SBT and Maven.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass Github Actions
- Manual check

```
build/mvn clean install -DskipTests
build/mvn test -pl connector/connect/server
```

All tests pass after this PR.

Closes #42236 from LuciferYang/protobuf-test.

[spark] branch master updated: [SPARK-43646][CONNECT][TESTS] Make both SBT and Maven use `spark-proto` uber jar to test the `connect` module

2023-08-28 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new df63adf7343 [SPARK-43646][CONNECT][TESTS] Make both SBT and Maven use 
`spark-proto` uber jar to test the `connect` module
df63adf7343 is described below

commit df63adf734370f5c2d71a348f9d36658718b302c
Author: yangjie01 
AuthorDate: Tue Aug 29 11:15:23 2023 +0800

[SPARK-43646][CONNECT][TESTS] Make both SBT and Maven use `spark-proto` 
uber jar to test the `connect` module

### What changes were proposed in this pull request?
Before this PR, when we tested the `connect` module, Maven used the shaded
`spark-protobuf` jar while SBT used the original jar, which led to inconsistent
testing behavior. So some tests passed when using SBT but failed when using Maven:

run

```
build/mvn clean install -DskipTests
build/mvn test -pl connector/connect/server
```

two tests fail as follows:

```
- from_protobuf_messageClassName *** FAILED ***
  org.apache.spark.sql.AnalysisException: [CANNOT_LOAD_PROTOBUF_CLASS] 
Could not load Protobuf class with name 
org.apache.spark.connect.proto.StorageLevel. 
org.apache.spark.connect.proto.StorageLevel does not extend shaded Protobuf 
Message class org.sparkproject.spark_protobuf.protobuf.Message. The jar with 
Protobuf classes needs to be shaded (com.google.protobuf.* --> 
org.sparkproject.spark_protobuf.protobuf.*).
  at 
org.apache.spark.sql.errors.QueryCompilationErrors$.protobufClassLoadError(QueryCompilationErrors.scala:3417)
  at 
org.apache.spark.sql.protobuf.utils.ProtobufUtils$.buildDescriptorFromJavaClass(ProtobufUtils.scala:193)
  at 
org.apache.spark.sql.protobuf.utils.ProtobufUtils$.buildDescriptor(ProtobufUtils.scala:151)
  at 
org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.messageDescriptor$lzycompute(ProtobufDataToCatalyst.scala:58)
  at 
org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.messageDescriptor(ProtobufDataToCatalyst.scala:57)
  at 
org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.dataType$lzycompute(ProtobufDataToCatalyst.scala:43)
  at 
org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.dataType(ProtobufDataToCatalyst.scala:42)
  at 
org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:194)
  at 
org.apache.spark.sql.catalyst.plans.logical.Project.$anonfun$output$1(basicLogicalOperators.scala:72)
  at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)

- from_protobuf_messageClassName_options *** FAILED ***
  org.apache.spark.sql.AnalysisException: [CANNOT_LOAD_PROTOBUF_CLASS] 
Could not load Protobuf class with name 
org.apache.spark.connect.proto.StorageLevel. 
org.apache.spark.connect.proto.StorageLevel does not extend shaded Protobuf 
Message class org.sparkproject.spark_protobuf.protobuf.Message. The jar with 
Protobuf classes needs to be shaded (com.google.protobuf.* --> 
org.sparkproject.spark_protobuf.protobuf.*).
  at 
org.apache.spark.sql.errors.QueryCompilationErrors$.protobufClassLoadError(QueryCompilationErrors.scala:3417)
  at 
org.apache.spark.sql.protobuf.utils.ProtobufUtils$.buildDescriptorFromJavaClass(ProtobufUtils.scala:193)
  at 
org.apache.spark.sql.protobuf.utils.ProtobufUtils$.buildDescriptor(ProtobufUtils.scala:151)
  at 
org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.messageDescriptor$lzycompute(ProtobufDataToCatalyst.scala:58)
  at 
org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.messageDescriptor(ProtobufDataToCatalyst.scala:57)
  at 
org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.dataType$lzycompute(ProtobufDataToCatalyst.scala:43)
  at 
org.apache.spark.sql.protobuf.ProtobufDataToCatalyst.dataType(ProtobufDataToCatalyst.scala:42)
  at 
org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:194)
  at 
org.apache.spark.sql.catalyst.plans.logical.Project.$anonfun$output$1(basicLogicalOperators.scala:72)
  at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
```

So this PR makes SBT also use the `spark-proto` uber
jar (`spark-protobuf-assembly-**-SNAPSHOT.jar`) for the above tests, and refactors
the test cases so that they pass with both SBT and Maven.

### Why are the changes needed?
Make the `connect` server module tests pass with both SBT and Maven.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass Github Actions
- Manual check

```
build/mvn clean install -DskipTests
build/mvn test -pl connector/connect/server
```

All tests pass after this PR.

Closes #42236 from LuciferYang/protobuf-test.
  

[spark] branch master updated: [SPARK-44860][SQL] Add SESSION_USER function

2023-08-28 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 600f62e0edd [SPARK-44860][SQL] Add SESSION_USER function
600f62e0edd is described below

commit 600f62e0edd92f11f1bf940e87ea2a64e045a2e7
Author: Vitalii Li 
AuthorDate: Tue Aug 29 10:41:26 2023 +0800

[SPARK-44860][SQL] Add SESSION_USER function

### What changes were proposed in this pull request?

This change implements the `SESSION_USER` expression. It behaves exactly the
same as `CURRENT_USER`, but according to the SQL standard the two differ when the
respective function is used inside a routine (UDF):
- `CURRENT_USER` should return the security definer, i.e. the owner of the UDF
- `SESSION_USER` should return the connected user.

The code is duplicated for this reason: to be able to identify which
expression is used inside a routine.
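
As an illustration, the sketch below selects both functions side by side. It assumes
a Spark build that already contains this commit (master, i.e. 4.0.0-SNAPSHOT); outside
a routine both columns are expected to show the same user:

```
// Minimal sketch; assumes a Spark build that includes this commit.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{current_user, session_user}

object SessionUserSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("session-user-sketch").getOrCreate()
    // Outside a routine (UDF) both expressions resolve to the connected user.
    spark.range(1)
      .select(current_user().as("current_user"), session_user().as("session_user"))
      .show(truncate = false)
    spark.stop()
  }
}
```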

### Why are the changes needed?

This is a missing expression defined by the SQL standard.

### Does this PR introduce _any_ user-facing change?

Yes, this change introduces a new expression.

### How was this patch tested?

Updating existing unit tests.

Closes #42549 from vitaliili-db/session_user.

Authored-by: Vitalii Li 
Signed-off-by: Kent Yao 
---
 .../main/scala/org/apache/spark/sql/functions.scala |   8 
 .../apache/spark/sql/PlanGenerationTestSuite.scala  |   4 
 .../explain-results/function_session_user.explain   |   2 ++
 .../query-tests/queries/function_session_user.json  |  20 
 .../queries/function_session_user.proto.bin | Bin 0 -> 174 bytes
 python/pyspark/sql/tests/test_functions.py  |   1 +
 .../spark/sql/catalyst/parser/SqlBaseParser.g4  |   2 +-
 .../catalyst/analysis/ColumnResolutionHelper.scala  |   3 ++-
 .../sql/catalyst/analysis/FunctionRegistry.scala|   1 +
 .../spark/sql/catalyst/parser/AstBuilder.scala  |   2 +-
 .../main/scala/org/apache/spark/sql/functions.scala |   8 
 .../sql-functions/sql-expression-schema.md  |  13 +++--
 .../apache/spark/sql/DataFrameFunctionsSuite.scala  |   3 ++-
 .../org/apache/spark/sql/MiscFunctionsSuite.scala   |   9 +
 .../ThriftServerWithSparkContextSuite.scala |   4 ++--
 15 files changed, 64 insertions(+), 16 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
index 7cd27ecaafb..8ea5f07c528 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
@@ -3346,6 +3346,14 @@ object functions {
*/
   def user(): Column = Column.fn("user")
 
+  /**
+   * Returns the user name of current execution context.
+   *
+   * @group misc_funcs
+   * @since 4.0.0
+   */
+  def session_user(): Column = Column.fn("session_user")
+
   /**
* Returns an universally unique identifier (UUID) string. The value is 
returned as a canonical
* UUID 36-character string.
diff --git 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala
 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala
index 4916ff1f597..ccd68f75bda 100644
--- 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala
+++ 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala
@@ -1564,6 +1564,10 @@ class PlanGenerationTestSuite
 fn.user()
   }
 
+  functionTest("session_user") {
+fn.session_user()
+  }
+
   functionTest("md5") {
 fn.md5(fn.col("g").cast("binary"))
   }
diff --git 
a/connector/connect/common/src/test/resources/query-tests/explain-results/function_session_user.explain
 
b/connector/connect/common/src/test/resources/query-tests/explain-results/function_session_user.explain
new file mode 100644
index 000..82f5d2adcec
--- /dev/null
+++ 
b/connector/connect/common/src/test/resources/query-tests/explain-results/function_session_user.explain
@@ -0,0 +1,2 @@
+Project [current_user() AS current_user()#0]
++- LocalRelation , [id#0L, a#0, b#0, d#0, e#0, f#0, g#0]
diff --git 
a/connector/connect/common/src/test/resources/query-tests/queries/function_session_user.json
 
b/connector/connect/common/src/test/resources/query-tests/queries/function_session_user.json
new file mode 100644
index 000..07afa4a77c1
--- /dev/null
+++ 
b/connector/connect/common/src/test/resources/query-tests/queries/function_session_user.json
@@ -0,0 +1,20 @@
+{
+  "common": {
+"planId": "1"
+  },
+  "project": {
+"input": {
+  "common": {
+"planId": "0"
+  },
+  "localRelation": {
+"schema": 

[spark] branch master updated: [SPARK-44965][PYTHON] Hide internal functions/variables from `pyspark.sql.functions`

2023-08-28 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c39a82593c3 [SPARK-44965][PYTHON] Hide internal functions/variables 
from `pyspark.sql.functions`
c39a82593c3 is described below

commit c39a82593c3b85e507d6431966bc840ba8c06d60
Author: Ruifeng Zheng 
AuthorDate: Tue Aug 29 09:29:22 2023 +0800

[SPARK-44965][PYTHON] Hide internal functions/variables from 
`pyspark.sql.functions`

### What changes were proposed in this pull request?
Hide internal functions/variables from `pyspark.sql.functions`

### Why are the changes needed?
internal functions/variables should not be exposed to end users:

```
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.4.1
      /_/

Using Python version 3.10.12 (main, Jul  5 2023 15:02:25)
Spark context Web UI available at http://localhost:4040/
Spark context available as 'sc' (master = local[*], app id = 
local-1692949938125).
SparkSession available as 'spark'.

In [1]: from pyspark.sql.functions import *

In [2]: ??to_str
Signature: to_str(value: Any) -> Optional[str]
Source:
def to_str(value: Any) -> Optional[str]:
"""
A wrapper over str(), but converts bool values to lower case strings.
If None is given, just returns None, instead of converting it to string 
"None".
"""
if isinstance(value, bool):
return str(value).lower()
elif value is None:
return value
else:
return str(value)
File:  ~/.dev/bin/spark-3.4.1-bin-hadoop3/python/pyspark/sql/utils.py
Type:  function
```

`to_str` here is an internal helper function.

### Does this PR introduce _any_ user-facing change?
yes

### How was this patch tested?
CI

### Was this patch authored or co-authored using generative AI tooling?
NO

Closes #42680 from zhengruifeng/py_func_all.

Authored-by: Ruifeng Zheng 
Signed-off-by: Ruifeng Zheng 
---
 python/pyspark/sql/functions.py| 430 +
 python/pyspark/sql/tests/test_functions.py |  33 +++
 2 files changed, 463 insertions(+)

diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 5d5557cb916..43b82d31368 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -79,6 +79,436 @@ if has_numpy:
 # since it requires making every single overridden definition.
 
 
+__all__ = [
+"abs",
+"acos",
+"acosh",
+"add_months",
+"aes_decrypt",
+"aes_encrypt",
+"aggregate",
+"any_value",
+"approxCountDistinct",
+"approx_count_distinct",
+"approx_percentile",
+"array",
+"array_agg",
+"array_append",
+"array_compact",
+"array_contains",
+"array_distinct",
+"array_except",
+"array_insert",
+"array_intersect",
+"array_join",
+"array_max",
+"array_min",
+"array_position",
+"array_prepend",
+"array_remove",
+"array_repeat",
+"array_size",
+"array_sort",
+"array_union",
+"arrays_overlap",
+"arrays_zip",
+"asc",
+"asc_nulls_first",
+"asc_nulls_last",
+"ascii",
+"asin",
+"asinh",
+"assert_true",
+"atan",
+"atan2",
+"atanh",
+"avg",
+"base64",
+"bin",
+"bit_and",
+"bit_count",
+"bit_get",
+"bit_length",
+"bit_or",
+"bit_xor",
+"bitmap_bit_position",
+"bitmap_bucket_number",
+"bitmap_construct_agg",
+"bitmap_count",
+"bitmap_or_agg",
+"bitwiseNOT",
+"bitwise_not",
+"bool_and",
+"bool_or",
+"broadcast",
+"bround",
+"btrim",
+"bucket",
+"call_function",
+"call_udf",
+"cardinality",
+"cast",
+"cbrt",
+"ceil",
+"ceiling",
+"char",
+"char_length",
+"character_length",
+"coalesce",
+"col",
+"collect_list",
+"collect_set",
+"column",
+"concat",
+"concat_ws",
+"contains",
+"conv",
+"convert_timezone",
+"corr",
+"cos",
+"cosh",
+"cot",
+"count",
+"countDistinct",
+"count_distinct",
+"count_if",
+"count_min_sketch",
+"covar_pop",
+"covar_samp",
+"crc32",
+"create_map",
+"csc",
+"cume_dist",
+"curdate",
+"current_catalog",
+"current_database",
+"current_date",
+"current_schema",
+"current_timestamp",
+"current_timezone",
+"current_user",
+"date_add",
+"date_diff",
+"date_format",
+"date_from_unix_date",
+"date_part",
+"date_sub",
+"date_trunc",
+"dateadd",
+"datediff",
+"datepart",
+"day",
+"dayofmonth",

[spark] branch master updated: [SPARK-44995][K8S] Promote `SparkKubernetesClientFactory` to `DeveloperApi`

2023-08-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c596fceebb9 [SPARK-44995][K8S] Promote `SparkKubernetesClientFactory` 
to `DeveloperApi`
c596fceebb9 is described below

commit c596fceebb9e8b0501052b5c0fc3b63ad1293d4a
Author: Dongjoon Hyun 
AuthorDate: Mon Aug 28 17:30:31 2023 -0700

[SPARK-44995][K8S] Promote `SparkKubernetesClientFactory` to `DeveloperApi`

### What changes were proposed in this pull request?

This PR aims to promote `SparkKubernetesClientFactory` to a **stable**
`DeveloperApi` in order to maintain it officially in a backward-compatible way
starting with Apache Spark 4.0.0.

### Why are the changes needed?

Like SPARK-35280 and SPARK-37497, `SparkKubernetesClientFactory` can also
be used to develop new `ExternalClusterManager` implementations for the K8s environment.
- https://github.com/apache/spark/pull/32406
- https://github.com/apache/spark/pull/34751
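
For reference, the annotation pattern being promoted here looks roughly like the
following. This is a generic sketch with a hypothetical object; it only mirrors the
annotations added in the diff, since the actual factory and its method signatures are
only partially shown in this digest:

```
import org.apache.spark.annotation.{DeveloperApi, Since, Stable}

/**
 * :: DeveloperApi ::
 * Hypothetical example of a stable developer API, mirroring the annotations
 * added to SparkKubernetesClientFactory in this commit.
 */
@Stable
@DeveloperApi
object MyClusterClientFactory {
  @Since("4.0.0")
  def createClient(master: String): String = s"client for $master"
}
```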

### Does this PR introduce _any_ user-facing change?

No. Previously, it was `private[spark]`.

### How was this patch tested?

Manual review because this is only a visibility change.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42709 from dongjoon-hyun/SPARK-44995.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/deploy/k8s/SparkKubernetesClientFactory.scala  | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
index bc0e7934024..3763aeadea0 100644
--- 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
+++ 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
@@ -30,18 +30,28 @@ import okhttp3.Dispatcher
 import okhttp3.OkHttpClient
 
 import org.apache.spark.SparkConf
+import org.apache.spark.annotation.{DeveloperApi, Since, Stable}
 import org.apache.spark.deploy.k8s.Config._
 import org.apache.spark.internal.Logging
 import org.apache.spark.internal.config.ConfigEntry
 import org.apache.spark.util.ThreadUtils
 
 /**
+ * :: DeveloperApi ::
+ *
  * Spark-opinionated builder for Kubernetes clients. It uses a prefix plus 
common suffixes to
  * parse configuration keys, similar to the manner in which Spark's 
SecurityManager parses SSL
  * options for different components.
+ *
+ * This can be used to implement new ExternalClusterManagers.
+ *
+ * @since 4.0.0
  */
-private[spark] object SparkKubernetesClientFactory extends Logging {
+@Stable
+@DeveloperApi
+object SparkKubernetesClientFactory extends Logging {
 
+  @Since("4.0.0")
   def createKubernetesClient(
   master: String,
   namespace: Option[String],


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-44993][CORE] Add `ShuffleChecksumUtils.compareChecksums` by reusing `ShuffleChecksumTestHelp.compareChecksums`

2023-08-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5db58f92538 [SPARK-44993][CORE] Add 
`ShuffleChecksumUtils.compareChecksums` by reusing 
`ShuffleChecksumTestHelp.compareChecksums`
5db58f92538 is described below

commit 5db58f92538d2cf2fee90a5ca08c07c4e2242aad
Author: Dongjoon Hyun 
AuthorDate: Mon Aug 28 17:29:33 2023 -0700

[SPARK-44993][CORE] Add `ShuffleChecksumUtils.compareChecksums` by reusing 
`ShuffleChecksumTestHelp.compareChecksums`

### What changes were proposed in this pull request?

This PR aims to add `ShuffleChecksumUtils.compareChecksums` by reusing the
existing test helper `ShuffleChecksumTestHelper.compareChecksums`, so that the same
functionality can be reused in the main code.

### Why are the changes needed?

This is very useful in the test code. We can now also take advantage of this
verification logic in the `core` module.
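
A usage sketch of the new utility, based on the signature visible in the diff below;
the partition count, algorithm, and file paths are illustrative (real shuffle files are
produced by the shuffle writers):

```
import java.io.File
import org.apache.spark.shuffle.ShuffleChecksumUtils

object ChecksumCheckSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical paths to one map task's shuffle output and its checksum file.
    val checksumFile = new File("/tmp/shuffle_0_0_0.checksum.ADLER32")
    val dataFile     = new File("/tmp/shuffle_0_0_0.data")
    val indexFile    = new File("/tmp/shuffle_0_0_0.index")

    // Returns false as soon as one partition's checksum does not match.
    val consistent = ShuffleChecksumUtils.compareChecksums(4, "ADLER32", checksumFile, dataFile, indexFile)
    println(s"shuffle output consistent with recorded checksums: $consistent")
  }
}
```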

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs with the existing test codes because this is a kind of 
refactoring.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42707 from dongjoon-hyun/SPARK-44993.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../spark/shuffle/ShuffleChecksumUtils.scala}  | 13 +++---
 .../spark/shuffle/ShuffleChecksumTestHelper.scala  | 49 ++
 2 files changed, 8 insertions(+), 54 deletions(-)

diff --git 
a/core/src/test/scala/org/apache/spark/shuffle/ShuffleChecksumTestHelper.scala 
b/core/src/main/scala/org/apache/spark/shuffle/ShuffleChecksumUtils.scala
similarity index 87%
copy from 
core/src/test/scala/org/apache/spark/shuffle/ShuffleChecksumTestHelper.scala
copy to core/src/main/scala/org/apache/spark/shuffle/ShuffleChecksumUtils.scala
index 3db2f77fe15..75b0efcf5cd 100644
--- 
a/core/src/test/scala/org/apache/spark/shuffle/ShuffleChecksumTestHelper.scala
+++ b/core/src/main/scala/org/apache/spark/shuffle/ShuffleChecksumUtils.scala
@@ -23,21 +23,17 @@ import java.util.zip.CheckedInputStream
 import org.apache.spark.network.shuffle.checksum.ShuffleChecksumHelper
 import org.apache.spark.network.util.LimitedInputStream
 
-trait ShuffleChecksumTestHelper {
+object ShuffleChecksumUtils {
 
   /**
-   * Ensure that the checksum values are consistent between write and read 
side.
+   * Ensure that the checksum values are consistent with index file and data 
file.
*/
   def compareChecksums(
   numPartition: Int,
   algorithm: String,
   checksum: File,
   data: File,
-  index: File): Unit = {
-assert(checksum.exists(), "Checksum file doesn't exist")
-assert(data.exists(), "Data file doesn't exist")
-assert(index.exists(), "Index file doesn't exist")
-
+  index: File): Boolean = {
 var checksumIn: DataInputStream = null
 val expectChecksums = Array.ofDim[Long](numPartition)
 try {
@@ -66,7 +62,7 @@ trait ShuffleChecksumTestHelper {
 checkedIn.read(bytes, 0, limit)
 prevOffset = curOffset
 // checksum must be consistent at both write and read sides
-assert(checkedIn.getChecksum.getValue == expectChecksums(i))
+if (checkedIn.getChecksum.getValue != expectChecksums(i)) return false
   }
 } finally {
   if (dataIn != null) {
@@ -79,5 +75,6 @@ trait ShuffleChecksumTestHelper {
 checkedIn.close()
   }
 }
+true
   }
 }
diff --git 
a/core/src/test/scala/org/apache/spark/shuffle/ShuffleChecksumTestHelper.scala 
b/core/src/test/scala/org/apache/spark/shuffle/ShuffleChecksumTestHelper.scala
index 3db2f77fe15..8be103b7be8 100644
--- 
a/core/src/test/scala/org/apache/spark/shuffle/ShuffleChecksumTestHelper.scala
+++ 
b/core/src/test/scala/org/apache/spark/shuffle/ShuffleChecksumTestHelper.scala
@@ -17,11 +17,7 @@
 
 package org.apache.spark.shuffle
 
-import java.io.{DataInputStream, File, FileInputStream}
-import java.util.zip.CheckedInputStream
-
-import org.apache.spark.network.shuffle.checksum.ShuffleChecksumHelper
-import org.apache.spark.network.util.LimitedInputStream
+import java.io.File
 
 trait ShuffleChecksumTestHelper {
 
@@ -38,46 +34,7 @@ trait ShuffleChecksumTestHelper {
 assert(data.exists(), "Data file doesn't exist")
 assert(index.exists(), "Index file doesn't exist")
 
-var checksumIn: DataInputStream = null
-val expectChecksums = Array.ofDim[Long](numPartition)
-try {
-  checksumIn = new DataInputStream(new FileInputStream(checksum))
-  (0 until numPartition).foreach(i => expectChecksums(i) = 
checksumIn.readLong())
-} finally {
-  if (checksumIn != null) {
-checksumIn.close()
-  }
-}
-
-var dataIn: FileInputStream = null
-var indexIn: 

[spark] branch master updated: [SPARK-44989][INFRA] Add a directional message to promote JIRA_ACCESS_TOKEN

2023-08-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8e779d1af75 [SPARK-44989][INFRA] Add a directional message to promote 
JIRA_ACCESS_TOKEN
8e779d1af75 is described below

commit 8e779d1af75ce553a6ef2fe99c6a0c45954f377d
Author: Dongjoon Hyun 
AuthorDate: Mon Aug 28 11:48:10 2023 -0700

[SPARK-44989][INFRA] Add a directional message to promote JIRA_ACCESS_TOKEN

### What changes were proposed in this pull request?

This PR aims to add a directional message to promote `JIRA_ACCESS_TOKEN` 
when `JIRA_USERNAME` and `JIRA_PASSWORD` are used.

Also, this PR sets the minimum JIRA library version to make sure that
the `token_auth` feature exists in the installed `jira` library.
```
-jira
+jira>=3.5.2
```

### Why are the changes needed?

Since SPARK-44802, the token feature seems to be stable and provides a more
secure environment for Apache Spark committers by hiding not only
`JIRA_PASSWORD` but also the static `JIRA_USERNAME`.
```
SPARK-44802 Token based ASF JIRA authentication
SPARK-44802 Fix to consider JIRA_ACCESS_TOKEN in precheck conditions
SPARK-44972 Eagerly check if the token is valid to align with the behavior 
of username/password auth
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42704 from dongjoon-hyun/SPARK-44989.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/merge_spark_pr.py | 3 +++
 dev/requirements.txt  | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index 2e4b7d3a6fa..fa66d2b2021 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -538,6 +538,9 @@ def initialize_jira():
 else:
 raise e
 elif JIRA_USERNAME and JIRA_PASSWORD:
+print("You can use JIRA_ACCESS_TOKEN instead of 
JIRA_USERNAME/JIRA_PASSWORD.")
+print("Visit https://issues.apache.org/jira/secure/ViewProfile.jspa ")
+print("and click 'Personal Access Tokens' menu to manage your own 
tokens.")
 asf_jira = jira.client.JIRA(jira_server, basic_auth=(JIRA_USERNAME, 
JIRA_PASSWORD))
 else:
 print("Neither JIRA_ACCESS_TOKEN nor JIRA_USERNAME/JIRA_PASSWORD are 
set.")
diff --git a/dev/requirements.txt b/dev/requirements.txt
index 7011b85de47..51fcb719e99 100644
--- a/dev/requirements.txt
+++ b/dev/requirements.txt
@@ -42,7 +42,7 @@ docutils<0.18.0
 markupsafe==2.0.1
 
 # Development scripts
-jira
+jira>=3.5.2
 PyGithub
 
 # pandas API on Spark Code formatter.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-44868][SQL][FOLLOWUP] Invoke the `to_varchar` function in Scala API

2023-08-28 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 49da438ece8 [SPARK-44868][SQL][FOLLOWUP] Invoke the `to_varchar` 
function in Scala API
49da438ece8 is described below

commit 49da438ece84391db22f9c56e747d555d9b01969
Author: Max Gekk 
AuthorDate: Mon Aug 28 20:57:27 2023 +0300

[SPARK-44868][SQL][FOLLOWUP] Invoke the `to_varchar` function in Scala API

### What changes were proposed in this pull request?
In this PR, I propose to invoke the `to_varchar` function instead of
`to_char` inside `to_varchar` of the Scala/Java API.
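
A small sketch of the effect (the data and format string are illustrative):

```
// Illustrative sketch; builds a tiny DataFrame and formats a decimal column.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, to_varchar}

object ToVarcharSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("to-varchar-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq(BigDecimal("12.34"), BigDecimal("5.60")).toDF("amount")
    val formatted = df.select(to_varchar(col("amount"), lit("999.99")).as("amount_str"))
    // With this change, the plan and any error message mention `to_varchar` instead of `to_char`.
    formatted.explain()
    formatted.show()
    spark.stop()
  }
}
```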

### Why are the changes needed?
1. To show the correct function name in error messages and in `explain`.
2. To be consistent with other APIs: PySpark and the previous Spark SQL version 3.5.0.

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
By running the modified test:
```
$ build/sbt "test:testOnly *.StringFunctionsSuite"
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #42703 from MaxGekk/fix-to_varchar-call.

Authored-by: Max Gekk 
Signed-off-by: Max Gekk 
---
 sql/core/src/main/scala/org/apache/spark/sql/functions.scala  | 2 +-
 .../src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala| 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index f6699b66af9..6b474c84cdb 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -4431,7 +4431,7 @@ object functions {
* @group string_funcs
* @since 3.5.0
*/
-  def to_varchar(e: Column, format: Column): Column = to_char(e, format)
+  def to_varchar(e: Column, format: Column): Column = 
call_function("to_varchar", e, format)
 
   /**
* Convert string 'e' to a number based on the string format 'format'.
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala
index 12881f4a22a..03b9053c71a 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala
@@ -878,7 +878,7 @@ class StringFunctionsSuite extends QueryTest with 
SharedSparkSession {
 errorClass = "_LEGACY_ERROR_TEMP_1100",
 parameters = Map(
   "argName" -> "format",
-  "funcName" -> "to_char",
+  "funcName" -> funcName,
   "requiredType" -> "string"))
   checkError(
 exception = intercept[AnalysisException] {
@@ -887,7 +887,7 @@ class StringFunctionsSuite extends QueryTest with 
SharedSparkSession {
 errorClass = "INVALID_PARAMETER_VALUE.BINARY_FORMAT",
 parameters = Map(
   "parameter" -> "`format`",
-  "functionName" -> "`to_char`",
+  "functionName" -> s"`$funcName`",
   "invalidFormat" -> "'invalid_format'"))
   checkError(
 exception = intercept[AnalysisException] {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.5 updated: [SPARK-44832][CONNECT] Make transitive dependencies work properly for Scala Client

2023-08-28 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new fd07239505c [SPARK-44832][CONNECT] Make transitive dependencies work 
properly for Scala Client
fd07239505c is described below

commit fd07239505cff8baafa4f1684034278d41234de7
Author: Herman van Hovell 
AuthorDate: Mon Aug 28 19:53:40 2023 +0200

[SPARK-44832][CONNECT] Make transitive dependencies work properly for Scala 
Client

### What changes were proposed in this pull request?
This PR cleans up the Maven build for the Spark Connect Client and Spark 
Connect Common. The most important change is that we move `sql-api` from a 
`provided` to `compile` dependency. The net effect of this is that when a user 
takes a dependency on the client, all of its required (transitive) dependencies 
are automatically added.

Please note that this does not address concerns around creating an überjar 
and shading. That is for a different day :)
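
To illustrate the effect on downstream builds: after this change a single dependency
on the client is enough, and `spark-sql-api` plus the other required modules come in
transitively. A build.sbt sketch (the coordinates and version are shown for
illustration; use the release you actually build against):

```
// build.sbt sketch; the version string is a placeholder.
val sparkVersion = "3.5.0"

libraryDependencies += "org.apache.spark" %% "spark-connect-client-jvm" % sparkVersion
// Before this change, spark-sql-api also had to be declared explicitly, e.g.:
// libraryDependencies += "org.apache.spark" %% "spark-sql-api" % sparkVersion
```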

### Why are the changes needed?
When you take a dependency on the Connect Scala client, you currently need to
add the `sql-api` module as a dependency manually. This is rather poor UX.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manually running maven, checking dependency tree, ...

Closes #42518 from hvanhovell/SPARK-44832.

Authored-by: Herman van Hovell 
Signed-off-by: Herman van Hovell 
(cherry picked from commit 50d9a56f824ae51d10543f4573753ff60dc9053b)
Signed-off-by: Herman van Hovell 
---
 connector/connect/client/jvm/pom.xml   | 48 --
 .../CheckConnectJvmClientCompatibility.scala   | 33 ---
 connector/connect/common/pom.xml   |  6 ---
 dev/connect-jvm-client-mima-check  |  2 +-
 4 files changed, 28 insertions(+), 61 deletions(-)

diff --git a/connector/connect/client/jvm/pom.xml 
b/connector/connect/client/jvm/pom.xml
index a7e5c5c2bab..67227ef38eb 100644
--- a/connector/connect/client/jvm/pom.xml
+++ b/connector/connect/client/jvm/pom.xml
@@ -39,55 +39,21 @@
   org.apache.spark
   spark-connect-common_${scala.binary.version}
   ${project.version}
-  
-
-  com.google.guava
-  guava
-
-  
 
 
   org.apache.spark
   spark-sql-api_${scala.binary.version}
   ${project.version}
-  provided
 
 
   org.apache.spark
   spark-sketch_${scala.binary.version}
   ${project.version}
 
-
-  com.google.protobuf
-  protobuf-java
-  compile
-
 
   com.google.guava
   guava
   ${connect.guava.version}
-  compile
-
-
-  com.google.guava
-  failureaccess
-  ${guava.failureaccess.version}
-  compile
-
-
-  io.netty
-  netty-codec-http2
-  ${netty.version}
-
-
-  io.netty
-  netty-handler-proxy
-  ${netty.version}
-
-
-  io.netty
-  netty-transport-native-unix-common
-  ${netty.version}
 
 
   com.lihaoyi
@@ -95,19 +61,6 @@
   ${ammonite.version}
   provided
 
-
-  org.apache.spark
-  spark-connect-common_${scala.binary.version}
-  ${project.version}
-  test-jar
-  test
-  
-
-  com.google.guava
-  guava
-
-  
-
 
   org.scalacheck
   scalacheck_${scala.binary.version}
@@ -148,7 +101,6 @@
   org.codehaus.mojo:*
   org.checkerframework:*
   
org.apache.spark:spark-connect-common_${scala.binary.version}
-  
org.apache.spark:spark-common-utils_${scala.binary.version}
 
   
   
diff --git 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala
 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala
index 1f599f2346e..72b0f02f378 100644
--- 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala
+++ 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala
@@ -24,6 +24,7 @@ import java.util.regex.Pattern
 import com.typesafe.tools.mima.core._
 import com.typesafe.tools.mima.lib.MiMaLib
 
+import org.apache.spark.SparkBuildInfo.spark_version
 import org.apache.spark.sql.test.IntegrationTestUtils._
 
 /**
@@ -46,18 +47,38 @@ object CheckConnectJvmClientCompatibility {
 sys.env("SPARK_HOME")
   }
 
+  private val sqlJar = {
+val path = Paths.get(
+  sparkHome,
+  "sql",
+  "core",
+  "target",
+  "scala-" + scalaVersion,
+  "spark-sql_" + scalaVersion 

[spark] branch master updated: [SPARK-44832][CONNECT] Make transitive dependencies work properly for Scala Client

2023-08-28 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 50d9a56f824 [SPARK-44832][CONNECT] Make transitive dependencies work 
properly for Scala Client
50d9a56f824 is described below

commit 50d9a56f824ae51d10543f4573753ff60dc9053b
Author: Herman van Hovell 
AuthorDate: Mon Aug 28 19:53:40 2023 +0200

[SPARK-44832][CONNECT] Make transitive dependencies work properly for Scala 
Client

### What changes were proposed in this pull request?
This PR cleans up the Maven build for the Spark Connect Client and Spark 
Connect Common. The most important change is that we move `sql-api` from a 
`provided` to `compile` dependency. The net effect of this is that when a user 
takes a dependency on the client, all of its required (transitive) dependencies 
are automatically added.

Please note that this does not address concerns around creating an überjar 
and shading. That is for a different day :)

### Why are the changes needed?
When you take a dependency on the Connect Scala client, you currently need to
add the `sql-api` module as a dependency manually. This is rather poor UX.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manually running maven, checking dependency tree, ...

Closes #42518 from hvanhovell/SPARK-44832.

Authored-by: Herman van Hovell 
Signed-off-by: Herman van Hovell 
---
 connector/connect/client/jvm/pom.xml   | 48 --
 .../CheckConnectJvmClientCompatibility.scala   | 33 ---
 connector/connect/common/pom.xml   |  6 ---
 dev/connect-jvm-client-mima-check  |  2 +-
 4 files changed, 28 insertions(+), 61 deletions(-)

diff --git a/connector/connect/client/jvm/pom.xml 
b/connector/connect/client/jvm/pom.xml
index d4e9b147e02..8cb6758ec9f 100644
--- a/connector/connect/client/jvm/pom.xml
+++ b/connector/connect/client/jvm/pom.xml
@@ -39,55 +39,21 @@
   org.apache.spark
   spark-connect-common_${scala.binary.version}
   ${project.version}
-  
-
-  com.google.guava
-  guava
-
-  
 
 
   org.apache.spark
   spark-sql-api_${scala.binary.version}
   ${project.version}
-  provided
 
 
   org.apache.spark
   spark-sketch_${scala.binary.version}
   ${project.version}
 
-
-  com.google.protobuf
-  protobuf-java
-  compile
-
 
   com.google.guava
   guava
   ${connect.guava.version}
-  compile
-
-
-  com.google.guava
-  failureaccess
-  ${guava.failureaccess.version}
-  compile
-
-
-  io.netty
-  netty-codec-http2
-  ${netty.version}
-
-
-  io.netty
-  netty-handler-proxy
-  ${netty.version}
-
-
-  io.netty
-  netty-transport-native-unix-common
-  ${netty.version}
 
 
   com.lihaoyi
@@ -95,19 +61,6 @@
   ${ammonite.version}
   provided
 
-
-  org.apache.spark
-  spark-connect-common_${scala.binary.version}
-  ${project.version}
-  test-jar
-  test
-  
-
-  com.google.guava
-  guava
-
-  
-
 
   org.scalacheck
   scalacheck_${scala.binary.version}
@@ -148,7 +101,6 @@
   org.codehaus.mojo:*
   org.checkerframework:*
   
org.apache.spark:spark-connect-common_${scala.binary.version}
-  
org.apache.spark:spark-common-utils_${scala.binary.version}
 
   
   
diff --git 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala
 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala
index 1100babde79..1e536cd37fe 100644
--- 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala
+++ 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala
@@ -24,6 +24,7 @@ import java.util.regex.Pattern
 import com.typesafe.tools.mima.core._
 import com.typesafe.tools.mima.lib.MiMaLib
 
+import org.apache.spark.SparkBuildInfo.spark_version
 import org.apache.spark.sql.test.IntegrationTestUtils._
 
 /**
@@ -46,18 +47,38 @@ object CheckConnectJvmClientCompatibility {
 sys.env("SPARK_HOME")
   }
 
+  private val sqlJar = {
+val path = Paths.get(
+  sparkHome,
+  "sql",
+  "core",
+  "target",
+  "scala-" + scalaVersion,
+  "spark-sql_" + scalaVersion + "-" + spark_version + ".jar")
+assert(Files.exists(path), s"$path does not exist")
+path.toFile
+  }
+
+  

[spark] branch master updated (8523ee5d90f -> c8a20925137)

2023-08-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 8523ee5d90f [SPARK-44972][INFRA] Eagerly check if the token is valid 
to align with the behavior of username/password auth
 add c8a20925137 [SPARK-44985][CORE] Use toString instead of stacktrace for 
task reaper threadDump

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/executor/Executor.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-44972][INFRA] Eagerly check if the token is valid to align with the behavior of username/password auth

2023-08-28 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8523ee5d90f [SPARK-44972][INFRA] Eagerly check if the token is valid 
to align with the behavior of username/password auth
8523ee5d90f is described below

commit 8523ee5d90f854cdeb96c70cb16db7cd32f5429e
Author: Kent Yao 
AuthorDate: Mon Aug 28 09:05:09 2023 -0700

[SPARK-44972][INFRA] Eagerly check if the token is valid to align with the 
behavior of username/password auth

### What changes were proposed in this pull request?

SPARK-44802 now allows token authentication when resolving Jira issues during 
pull request merging. However, token authentication is lazy during the initial 
handshake, so an expired token may only surface later and confuse maintainers.

This pull request eagerly calls the current_user() function to trigger 
authentication up front and prints clear instructions when the token has 
expired.

### Why are the changes needed?

Make it easy for maintainers to notice and update expired Jira tokens.

### Does this PR introduce _any_ user-facing change?

No; this only affects maintainers.

### How was this patch tested?

Locally verified the code snippet.

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #42625 from yaooqinn/SPARK-44802-FF.

Authored-by: Kent Yao 
Signed-off-by: Dongjoon Hyun 
---
 dev/merge_spark_pr.py | 82 ++-
 1 file changed, 48 insertions(+), 34 deletions(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index 01851b185dd..2e4b7d3a6fa 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -243,13 +243,6 @@ def cherry_pick(pr_num, merge_hash, default_branch):
 
 
 def resolve_jira_issue(merge_branches, comment, default_jira_id=""):
-jira_server = {"server": JIRA_API_BASE}
-
-if JIRA_ACCESS_TOKEN is not None:
-asf_jira = jira.client.JIRA(jira_server, token_auth=JIRA_ACCESS_TOKEN)
-else:
-asf_jira = jira.client.JIRA(jira_server, basic_auth=(JIRA_USERNAME, 
JIRA_PASSWORD))
-
 jira_id = input("Enter a JIRA id [%s]: " % default_jira_id)
 if jira_id == "":
 jira_id = default_jira_id
@@ -263,7 +256,7 @@ def resolve_jira_issue(merge_branches, comment, 
default_jira_id=""):
 cur_summary = issue.fields.summary
 cur_assignee = issue.fields.assignee
 if cur_assignee is None:
-cur_assignee = choose_jira_assignee(issue, asf_jira)
+cur_assignee = choose_jira_assignee(issue)
 # Check again, we might not have chosen an assignee
 if cur_assignee is None:
 cur_assignee = "NOT ASSIGNED!!!"
@@ -362,7 +355,7 @@ def resolve_jira_issue(merge_branches, comment, 
default_jira_id=""):
 print("Successfully resolved %s with fixVersions=%s!" % (jira_id, 
fix_versions))
 
 
-def choose_jira_assignee(issue, asf_jira):
+def choose_jira_assignee(issue):
 """
 Prompt the user to choose who to assign the issue to in jira, given a list 
of candidates,
 including the original reporter and all commentators
@@ -395,7 +388,7 @@ def choose_jira_assignee(issue, asf_jira):
 # assume it's a user id, and try to assign (might fail, we 
just prompt again)
 assignee = asf_jira.user(raw_assignee)
 try:
-assign_issue(asf_jira, issue.key, assignee.name)
+assign_issue(issue.key, assignee.name)
 except Exception as e:
 if (
 e.__class__.__name__ == "JIRAError"
@@ -406,8 +399,8 @@ def choose_jira_assignee(issue, asf_jira):
 "User '%s' cannot be assigned, add to contributors 
role and try again?"
 % assignee.name
 )
-grant_contributor_role(assignee.name, asf_jira)
-assign_issue(asf_jira, issue.key, assignee.name)
+grant_contributor_role(assignee.name)
+assign_issue(issue.key, assignee.name)
 else:
 raise e
 return assignee
@@ -418,22 +411,22 @@ def choose_jira_assignee(issue, asf_jira):
 print("Error assigning JIRA, try again (or leave blank and fix 
manually)")
 
 
-def grant_contributor_role(user: str, asf_jira):
+def grant_contributor_role(user: str):
 role = asf_jira.project_role("SPARK", 10010)
 role.add_user(user)
 print("Successfully added user '%s' to contributors role" % user)
 
 
-def assign_issue(client, issue: int, assignee: str) -> bool:
+def assign_issue(issue: int, assignee: str) -> bool:
 """
 Assign an issue to a user, which is a shorthand for 
jira.client.JIRA.assign_issue.
 The 

[spark] branch branch-3.5 updated: [SPARK-44867][CONNECT][DOCS] Refactor Spark Connect Docs to incorporate Scala setup

2023-08-28 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 12964c26a45 [SPARK-44867][CONNECT][DOCS] Refactor Spark Connect Docs 
to incorporate Scala setup
12964c26a45 is described below

commit 12964c26a4511bc21005885e21ef572a69dde7c2
Author: vicennial 
AuthorDate: Mon Aug 28 16:38:25 2023 +0200

[SPARK-44867][CONNECT][DOCS] Refactor Spark Connect Docs to incorporate 
Scala setup

### What changes were proposed in this pull request?

This PR refactors the Spark Connect overview docs to include an Interactive 
(shell/REPL) section and a Standalone application section, and incorporates 
new Scala documentation into each of these sections.
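To give a flavor of the new interactive Scala content (a sketch, not an 
excerpt from the diff below): once the Ammonite-based `spark-connect-repl` 
described further down is installed and connected, a `spark` session is 
already bound to the server and can be used directly:

```scala
// Inside spark-connect-repl; `spark` is provided by the REPL and already
// connected to the Spark Connect server.
import org.apache.spark.sql.functions.col

val df = spark.range(1, 6).withColumn("square", col("id") * col("id"))
df.show()
```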

### Why are the changes needed?

Currently, there isn't much Scala-relevant documentation available to set 
up the Scala shell/project/application.

### Does this PR introduce _any_ user-facing change?

Yes, the documentation for the Spark Connect 
[overview](https://spark.apache.org/docs/latest/spark-connect-overview.html) 
page is updated.

### How was this patch tested?

Manually generating the docs locally.

Closes #42556 from vicennial/sparkConnectDocs.

Authored-by: vicennial 
Signed-off-by: Herman van Hovell 
(cherry picked from commit d95e8f3c65e5ae0bf39c0ccc477b7b0910513066)
Signed-off-by: Herman van Hovell 
---
 docs/spark-connect-overview.md | 204 ++---
 1 file changed, 170 insertions(+), 34 deletions(-)

diff --git a/docs/spark-connect-overview.md b/docs/spark-connect-overview.md
index 1e1464cfba0..0673763f03b 100644
--- a/docs/spark-connect-overview.md
+++ b/docs/spark-connect-overview.md
@@ -113,14 +113,15 @@ Now Spark server is running and ready to accept Spark 
Connect sessions from clie
 applications. In the next section we will walk through how to use Spark Connect
 when writing client applications.
 
-## Use Spark Connect in client applications
+## Use Spark Connect for interactive analysis
+
 
+
 When creating a Spark session, you can specify that you want to use Spark 
Connect
 and there are a few ways to do that outlined as follows.
 
 If you do not use one of the mechanisms outlined here, your Spark session will
-work just like before, without leveraging Spark Connect, and your application 
code
-will run on the Spark driver node.
+work just like before, without leveraging Spark Connect.
 
 ### Set SPARK_REMOTE environment variable
 
@@ -138,9 +139,6 @@ export SPARK_REMOTE="sc://localhost"
 
 And start the Spark shell as usual:
 
-
-
-
 {% highlight bash %}
 ./bin/pyspark
 {% endhighlight %}
@@ -150,25 +148,6 @@ The PySpark shell is now connected to Spark using Spark 
Connect as indicated in
 {% highlight python %}
 Client connected to the Spark Connect server at localhost
 {% endhighlight %}
-
-
-
-
-And if you write your own program, create a Spark session as shown in this 
example:
-
-
-
-
-{% highlight python %}
-from pyspark.sql import SparkSession
-spark = SparkSession.builder.getOrCreate()
-{% endhighlight %}
-
-
-
-
-This will create a Spark Connect session from your application by reading the
-`SPARK_REMOTE` environment variable we set previously.
 
 ### Specify Spark Connect when creating Spark session
 
@@ -178,9 +157,6 @@ create a Spark session.
 For example, you can launch the PySpark shell with Spark Connect as
 illustrated here.
 
-
-
-
 To launch the PySpark shell with Spark Connect, simply include the `remote`
 parameter and specify the location of your Spark server. We are using 
`localhost`
 in this example to connect to the local Spark server we started previously:
@@ -219,29 +195,175 @@ Now you can run PySpark code in the shell to see Spark 
Connect in action:
 |  2|Maria|
 +---+-+
 {% endhighlight %}
+
 
 
+
+For the Scala shell, we use an Ammonite-based REPL that is currently not 
included in the Apache Spark package.
+
+To set up the new Scala shell, first download and install [Coursier 
CLI](https://get-coursier.io/docs/cli-installation).
+Then, install the REPL using the following command in a terminal window:
+{% highlight bash %}
+cs install --contrib spark-connect-repl
+{% endhighlight %}
+
+And now you can start the Ammonite-based Scala REPL/shell to connect to your 
Spark server like this:
+
+{% highlight bash %}
+spark-connect-repl
+{% endhighlight %}
+
+A greeting message will appear when the REPL successfully initializes:
+{% highlight bash %}
+Spark session available as 'spark'.
+   _  __  ____
+  / ___/   __/ /__   / /___      ___  _/ /_
+  \__ \/ __ \/ __ `/ ___/ //_/  / /   / __ \/ __ \/ __ \/ _ \/ ___/ __/
+ ___/ / /_/ / /_/ / /  / ,

[spark] branch master updated: [SPARK-44867][CONNECT][DOCS] Refactor Spark Connect Docs to incorporate Scala setup

2023-08-28 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d95e8f3c65e [SPARK-44867][CONNECT][DOCS] Refactor Spark Connect Docs 
to incorporate Scala setup
d95e8f3c65e is described below

commit d95e8f3c65e5ae0bf39c0ccc477b7b0910513066
Author: vicennial 
AuthorDate: Mon Aug 28 16:38:25 2023 +0200

[SPARK-44867][CONNECT][DOCS] Refactor Spark Connect Docs to incorporate 
Scala setup

### What changes were proposed in this pull request?

This PR refactors the Spark Connect overview docs to include an Interactive 
(shell/REPL) section and a Standalone application section, and incorporates 
new Scala documentation into each of these sections.

### Why are the changes needed?

Currently, there isn't much Scala-relevant documentation available to set 
up the Scala shell/project/application.

### Does this PR introduce _any_ user-facing change?

Yes, the documentation for the Spark Connect 
[overview](https://spark.apache.org/docs/latest/spark-connect-overview.html) 
page is updated.

### How was this patch tested?

Manually generating the docs locally.

Closes #42556 from vicennial/sparkConnectDocs.

Authored-by: vicennial 
Signed-off-by: Herman van Hovell 
---
 docs/spark-connect-overview.md | 204 ++---
 1 file changed, 170 insertions(+), 34 deletions(-)

diff --git a/docs/spark-connect-overview.md b/docs/spark-connect-overview.md
index 1e1464cfba0..0673763f03b 100644
--- a/docs/spark-connect-overview.md
+++ b/docs/spark-connect-overview.md
@@ -113,14 +113,15 @@ Now Spark server is running and ready to accept Spark 
Connect sessions from clie
 applications. In the next section we will walk through how to use Spark Connect
 when writing client applications.
 
-## Use Spark Connect in client applications
+## Use Spark Connect for interactive analysis
+
 
+
 When creating a Spark session, you can specify that you want to use Spark 
Connect
 and there are a few ways to do that outlined as follows.
 
 If you do not use one of the mechanisms outlined here, your Spark session will
-work just like before, without leveraging Spark Connect, and your application 
code
-will run on the Spark driver node.
+work just like before, without leveraging Spark Connect.
 
 ### Set SPARK_REMOTE environment variable
 
@@ -138,9 +139,6 @@ export SPARK_REMOTE="sc://localhost"
 
 And start the Spark shell as usual:
 
-
-
-
 {% highlight bash %}
 ./bin/pyspark
 {% endhighlight %}
@@ -150,25 +148,6 @@ The PySpark shell is now connected to Spark using Spark 
Connect as indicated in
 {% highlight python %}
 Client connected to the Spark Connect server at localhost
 {% endhighlight %}
-
-
-
-
-And if you write your own program, create a Spark session as shown in this 
example:
-
-
-
-
-{% highlight python %}
-from pyspark.sql import SparkSession
-spark = SparkSession.builder.getOrCreate()
-{% endhighlight %}
-
-
-
-
-This will create a Spark Connect session from your application by reading the
-`SPARK_REMOTE` environment variable we set previously.
 
 ### Specify Spark Connect when creating Spark session
 
@@ -178,9 +157,6 @@ create a Spark session.
 For example, you can launch the PySpark shell with Spark Connect as
 illustrated here.
 
-
-
-
 To launch the PySpark shell with Spark Connect, simply include the `remote`
 parameter and specify the location of your Spark server. We are using 
`localhost`
 in this example to connect to the local Spark server we started previously:
@@ -219,29 +195,175 @@ Now you can run PySpark code in the shell to see Spark 
Connect in action:
 |  2|Maria|
 +---+-+
 {% endhighlight %}
+
 
 
+
+For the Scala shell, we use an Ammonite-based REPL that is currently not 
included in the Apache Spark package.
+
+To set up the new Scala shell, first download and install [Coursier 
CLI](https://get-coursier.io/docs/cli-installation).
+Then, install the REPL using the following command in a terminal window:
+{% highlight bash %}
+cs install --contrib spark-connect-repl
+{% endhighlight %}
+
+And now you can start the Ammonite-based Scala REPL/shell to connect to your 
Spark server like this:
+
+{% highlight bash %}
+spark-connect-repl
+{% endhighlight %}
+
+A greeting message will appear when the REPL successfully initializes:
+{% highlight bash %}
+Spark session available as 'spark'.
+   _  __  ____
+  / ___/   __/ /__   / /___      ___  _/ /_
+  \__ \/ __ \/ __ `/ ___/ //_/  / /   / __ \/ __ \/ __ \/ _ \/ ___/ __/
+ ___/ / /_/ / /_/ / /  / ,

[spark] branch master updated: [SPARK-44983][SQL] Convert binary to string by `to_char` for the formats: `hex`, `base64`, `utf-8`

2023-08-28 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4946d025b62 [SPARK-44983][SQL] Convert binary to string by `to_char` 
for the formats: `hex`, `base64`, `utf-8`
4946d025b62 is described below

commit 4946d025b6200ad90dfdfbb1f24526016f810523
Author: Max Gekk 
AuthorDate: Mon Aug 28 16:55:35 2023 +0300

[SPARK-44983][SQL] Convert binary to string by `to_char` for the formats: 
`hex`, `base64`, `utf-8`

### What changes were proposed in this pull request?
In the PR, I propose to re-use the `Hex`, `Base64` and `Decode` expressions 
in the `ToCharacter` (the `to_char`/`to_varchar` functions) when the `format` 
parameter is one of `hex`, `base64` and `utf-8`.
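For example (a sketch of the intended behavior rather than an excerpt from the 
new tests), the three formats can be exercised directly from SQL; the binary 
literal below is the UTF-8 encoding of "Spark SQL":

```scala
// Run in spark-shell; `spark` is the active SparkSession.
// X'537061726B2053514C' is the binary literal for the UTF-8 bytes of "Spark SQL".
spark.sql("SELECT to_char(X'537061726B2053514C', 'hex')").show(false)    // 537061726B2053514C
spark.sql("SELECT to_char(X'537061726B2053514C', 'base64')").show(false) // U3BhcmsgU1FM
spark.sql("SELECT to_char(X'537061726B2053514C', 'utf-8')").show(false)  // Spark SQL
```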

### Why are the changes needed?
To make the migration to Spark SQL easier from the systems like:
- Snowflake: https://docs.snowflake.com/en/sql-reference/functions/to_char
- SAP SQL Anywhere: 
https://help.sap.com/docs/SAP_SQL_Anywhere/93079d4ba8e44920ae63ffb4def91f5b/81fe51196ce21014b9c6cf43b298.html
- Oracle: 
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/TO_CHAR-number.html#GUID-00DA076D-2468-41AB-A3AC-CC78DBA0D9CB
- Vertica: 
https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Formatting/TO_CHAR.htm

### Does this PR introduce _any_ user-facing change?
No. This PR extends an existing API. It might be considered a user-facing 
change only if user code depends on the errors raised for unsupported formats.

### How was this patch tested?
By running new examples:
```
$ build/sbt "sql/test:testOnly 
org.apache.spark.sql.expressions.ExpressionInfoSuite"
```
and new tests:
```
$ build/sbt "test:testOnly *.StringFunctionsSuite"
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #42632 from MaxGekk/to_char-binary-2.

Authored-by: Max Gekk 
Signed-off-by: Max Gekk 
---
 .../src/main/resources/error/error-classes.json|  5 ++
 ...nditions-invalid-parameter-value-error-class.md |  4 ++
 .../expressions/numberFormatExpressions.scala  | 28 +++--
 .../spark/sql/errors/QueryCompilationErrors.scala  |  9 +++
 .../apache/spark/sql/StringFunctionsSuite.scala| 69 +++---
 5 files changed, 89 insertions(+), 26 deletions(-)

diff --git a/common/utils/src/main/resources/error/error-classes.json 
b/common/utils/src/main/resources/error/error-classes.json
index 632c449b992..53c596c00fc 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -1788,6 +1788,11 @@
   "expects a binary value with 16, 24 or 32 bytes, but got 
 bytes."
 ]
   },
+  "BINARY_FORMAT" : {
+"message" : [
+  "expects one of binary formats 'base64', 'hex', 'utf-8', but got 
."
+]
+  },
   "DATETIME_UNIT" : {
 "message" : [
   "expects one of the units without quotes YEAR, QUARTER, MONTH, WEEK, 
DAY, DAYOFYEAR, HOUR, MINUTE, SECOND, MILLISECOND, MICROSECOND, but got the 
string literal ."
diff --git a/docs/sql-error-conditions-invalid-parameter-value-error-class.md 
b/docs/sql-error-conditions-invalid-parameter-value-error-class.md
index 370e6da3362..96829e564aa 100644
--- a/docs/sql-error-conditions-invalid-parameter-value-error-class.md
+++ b/docs/sql-error-conditions-invalid-parameter-value-error-class.md
@@ -37,6 +37,10 @@ supports 16-byte CBC IVs and 12-byte GCM IVs, but got 
`` bytes for
 
 expects a binary value with 16, 24 or 32 bytes, but got `` bytes.
 
+## BINARY_FORMAT
+
+expects one of binary formats 'base64', 'hex', 'utf-8', but got 
``.
+
 ## DATETIME_UNIT
 
 expects one of the units without quotes YEAR, QUARTER, MONTH, WEEK, DAY, 
DAYOFYEAR, HOUR, MINUTE, SECOND, MILLISECOND, MICROSECOND, but got the string 
literal ``.
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala
index 3a424ac21c5..7875ed8fe20 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala
@@ -26,7 +26,7 @@ import 
org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, CodeGe
 import org.apache.spark.sql.catalyst.expressions.codegen.Block.BlockHelper
 import org.apache.spark.sql.catalyst.util.ToNumberParser
 import org.apache.spark.sql.errors.QueryCompilationErrors
-import org.apache.spark.sql.types.{AbstractDataType, DataType, DatetimeType, 
Decimal, DecimalType, StringType}
+import 

[spark] branch branch-3.5 updated: [SPARK-44974][CONNECT] Null out SparkSession/Dataset/KeyValueGroupedDataset on serialization

2023-08-28 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new c230a5011a6 [SPARK-44974][CONNECT] Null out 
SparkSession/Dataset/KeyValueGroupedDataset on serialization
c230a5011a6 is described below

commit c230a5011a6d45c0f393833995b052930f11c324
Author: Herman van Hovell 
AuthorDate: Mon Aug 28 15:05:18 2023 +0200

[SPARK-44974][CONNECT] Null out SparkSession/Dataset/KeyValueGroupedDataset 
on serialization

### What changes were proposed in this pull request?
This PR changes the serialization for connect `SparkSession`, `Dataset`, 
and `KeyValueGroupedDataset`. While these were marked as serializable, they 
effectively were not, because they refer to bits and pieces that are not 
serializable. Even if we were to fix this, we would still have a class clash 
problem with server-side classes that have the same name but a different 
structure. The latter can be fixed with serialization proxies, but I am going 
to hold off on that until someone actually needs/wants it.

After this PR these classes are serialized as null. This is somewhat 
suboptimal compared to throwing an exception on serialization; however, it is 
more compatible with the old situation and makes accidental capture of these 
classes less of an issue for UDFs.
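To make the mechanism concrete, here is a small self-contained sketch (not 
taken from the PR) of how a `writeReplace` that returns null behaves under 
plain Java serialization; the connect classes rely on the same trick:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Stand-in for a class that is nominally Serializable but should not travel in closures.
class NotReallySerializable extends Serializable {
  // Java serialization calls this and serializes the replacement (null) instead.
  private def writeReplace(): Any = null
}

object WriteReplaceDemo {
  def main(args: Array[String]): Unit = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(new NotReallySerializable)
    out.close()

    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    println(in.readObject() == null) // prints: true
  }
}
```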

### Why are the changes needed?
More compatible with the old situation. Improved UX when working with UDFs.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added tests to `ClientDatasetSuite`, `KeyValueGroupedDatasetE2ETestSuite`, 
`SparkSessionSuite`, and `UserDefinedFunctionE2ETestSuite`.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #42688 from hvanhovell/SPARK-44974.

Authored-by: Herman van Hovell 
Signed-off-by: Herman van Hovell 
(cherry picked from commit f0b04286022e0774d78b9adcf4aeabc181a3ec89)
Signed-off-by: Herman van Hovell 
---
 .../jvm/src/main/scala/org/apache/spark/sql/Dataset.scala |  6 ++
 .../org/apache/spark/sql/KeyValueGroupedDataset.scala |  6 ++
 .../main/scala/org/apache/spark/sql/SparkSession.scala|  6 ++
 .../scala/org/apache/spark/sql/ClientDatasetSuite.scala   |  8 
 .../spark/sql/KeyValueGroupedDatasetE2ETestSuite.scala|  7 +++
 .../scala/org/apache/spark/sql/SparkSessionSuite.scala|  7 +++
 .../spark/sql/UserDefinedFunctionE2ETestSuite.scala   | 15 +++
 7 files changed, 55 insertions(+)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
index cb7d2c84df5..bdaa4e28ba8 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -3336,4 +3336,10 @@ class Dataset[T] private[sql] (
   result.close()
 }
   }
+
+  /**
+   * We cannot deserialize a connect [[Dataset]] because of a class clash on 
the server side. We
+   * null out the instance for now.
+   */
+  private def writeReplace(): Any = null
 }
diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala
index 202891c66d7..88c8b6a4f8b 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala
@@ -979,6 +979,12 @@ private class KeyValueGroupedDatasetImpl[K, V, IK, IV](
   outputEncoder = outputEncoder)
 udf.apply(inputEncoders.map(_ => col("*")): 
_*).expr.getCommonInlineUserDefinedFunction
   }
+
+  /**
+   * We cannot deserialize a connect [[KeyValueGroupedDataset]] because of a 
class clash on the
+   * server side. We null out the instance for now.
+   */
+  private def writeReplace(): Any = null
 }
 
 private object KeyValueGroupedDatasetImpl {
diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala
index e902e04e246..7882ea64013 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala
@@ -714,6 +714,12 @@ class SparkSession private[sql] (
   def clearTags(): Unit = {
 client.clearTags()
   }
+
+  /**
+   * We cannot deserialize a connect [[SparkSession]] because of a class clash 
on the server side.
+   * We null out the instance for now.
+   */
+  

[spark] branch master updated: [SPARK-44974][CONNECT] Null out SparkSession/Dataset/KeyValueGroupedDataset on serialization

2023-08-28 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f0b04286022 [SPARK-44974][CONNECT] Null out 
SparkSession/Dataset/KeyValueGroupedDataset on serialization
f0b04286022 is described below

commit f0b04286022e0774d78b9adcf4aeabc181a3ec89
Author: Herman van Hovell 
AuthorDate: Mon Aug 28 15:05:18 2023 +0200

[SPARK-44974][CONNECT] Null out SparkSession/Dataset/KeyValueGroupedDataset 
on serialization

### What changes were proposed in this pull request?
This PR changes the serialization for connect `SparkSession`, `Dataset`, 
and `KeyValueGroupedDataset`. While these were marked as serializable, they 
effectively were not, because they refer to bits and pieces that are not 
serializable. Even if we were to fix this, we would still have a class clash 
problem with server-side classes that have the same name but a different 
structure. The latter can be fixed with serialization proxies, but I am going 
to hold off on that until someone actually needs/wants it.

After this PR these classes are serialized as null. This is somewhat 
suboptimal compared to throwing an exception on serialization; however, it is 
more compatible with the old situation and makes accidental capture of these 
classes less of an issue for UDFs.

### Why are the changes needed?
More compatible with the old situation. Improved UX when working with UDFs.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added tests to `ClientDatasetSuite`, `KeyValueGroupedDatasetE2ETestSuite`, 
`SparkSessionSuite`, and `UserDefinedFunctionE2ETestSuite`.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #42688 from hvanhovell/SPARK-44974.

Authored-by: Herman van Hovell 
Signed-off-by: Herman van Hovell 
---
 .../jvm/src/main/scala/org/apache/spark/sql/Dataset.scala |  6 ++
 .../org/apache/spark/sql/KeyValueGroupedDataset.scala |  6 ++
 .../main/scala/org/apache/spark/sql/SparkSession.scala|  6 ++
 .../scala/org/apache/spark/sql/ClientDatasetSuite.scala   |  8 
 .../spark/sql/KeyValueGroupedDatasetE2ETestSuite.scala|  7 +++
 .../scala/org/apache/spark/sql/SparkSessionSuite.scala|  7 +++
 .../spark/sql/UserDefinedFunctionE2ETestSuite.scala   | 15 +++
 7 files changed, 55 insertions(+)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
index 3c89e649020..1d83f196b53 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -3352,4 +3352,10 @@ class Dataset[T] private[sql] (
   result.close()
 }
   }
+
+  /**
+   * We cannot deserialize a connect [[Dataset]] because of a class clash on 
the server side. We
+   * null out the instance for now.
+   */
+  private def writeReplace(): Any = null
 }
diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala
index 202891c66d7..88c8b6a4f8b 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala
@@ -979,6 +979,12 @@ private class KeyValueGroupedDatasetImpl[K, V, IK, IV](
   outputEncoder = outputEncoder)
 udf.apply(inputEncoders.map(_ => col("*")): 
_*).expr.getCommonInlineUserDefinedFunction
   }
+
+  /**
+   * We cannot deserialize a connect [[KeyValueGroupedDataset]] because of a 
class clash on the
+   * server side. We null out the instance for now.
+   */
+  private def writeReplace(): Any = null
 }
 
 private object KeyValueGroupedDatasetImpl {
diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala
index e902e04e246..7882ea64013 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala
@@ -714,6 +714,12 @@ class SparkSession private[sql] (
   def clearTags(): Unit = {
 client.clearTags()
   }
+
+  /**
+   * We cannot deserialize a connect [[SparkSession]] because of a class clash 
on the server side.
+   * We null out the instance for now.
+   */
+  private def writeReplace(): Any = null
 }
 
 // The minimal builder needed to create a spark session.
diff --git 

[spark] branch master updated: [SPARK-44984][PYTHON][CONNECT] Remove `_get_alias` from DataFrame

2023-08-28 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 474f64a8850 [SPARK-44984][PYTHON][CONNECT] Remove `_get_alias` from 
DataFrame
474f64a8850 is described below

commit 474f64a88502fe242654eb85c7cb5a1514c710e9
Author: Ruifeng Zheng 
AuthorDate: Mon Aug 28 19:44:32 2023 +0800

[SPARK-44984][PYTHON][CONNECT] Remove `_get_alias` from DataFrame

### What changes were proposed in this pull request?
Remove `_get_alias` from DataFrame

### Why are the changes needed?
`_get_alias` was added in the [initial 
PR](https://github.com/apache/spark/commit/6637bbe2b25ff2877b41a9677ce6d75e6996f968),
 but seems unneeded

- field `alias` in `plan.Project` is always `None`;
- `_get_alias` takes no parameters, but is used to replace a specific column 
name; that logic does not hold when the column name varies;

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #42698 from zhengruifeng/py_connect_del_alias.

Authored-by: Ruifeng Zheng 
Signed-off-by: Ruifeng Zheng 
---
 python/pyspark/sql/connect/dataframe.py | 15 ++-
 python/pyspark/sql/connect/plan.py  |  1 -
 2 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/python/pyspark/sql/connect/dataframe.py 
b/python/pyspark/sql/connect/dataframe.py
index 365cde59227..94c3ca95956 100644
--- a/python/pyspark/sql/connect/dataframe.py
+++ b/python/pyspark/sql/connect/dataframe.py
@@ -1573,14 +1573,6 @@ class DataFrame:
 
 sampleBy.__doc__ = PySparkDataFrame.sampleBy.__doc__
 
-def _get_alias(self) -> Optional[str]:
-p = self._plan
-while p is not None:
-if isinstance(p, plan.Project) and p.alias:
-return p.alias
-p = p._child
-return None
-
 def __getattr__(self, name: str) -> "Column":
 if self._plan is None:
 raise SparkConnectException("Cannot analyze on empty plan.")
@@ -1607,9 +1599,8 @@ class DataFrame:
 "'%s' object has no attribute '%s'" % 
(self.__class__.__name__, name)
 )
 
-alias = self._get_alias()
 return _to_col_with_plan_id(
-col=alias if alias is not None else name,
+col=name,
 plan_id=self._plan._plan_id,
 )
 
@@ -1625,8 +1616,6 @@ class DataFrame:
 
 def __getitem__(self, item: Union[int, str, Column, List, Tuple]) -> 
Union[Column, "DataFrame"]:
 if isinstance(item, str):
-# Check for alias
-alias = self._get_alias()
 if self._plan is None:
 raise SparkConnectException("Cannot analyze on empty plan.")
 
@@ -1635,7 +1624,7 @@ class DataFrame:
 self.select(item).isLocal()
 
 return _to_col_with_plan_id(
-col=alias if alias is not None else item,
+col=item,
 plan_id=self._plan._plan_id,
 )
 elif isinstance(item, Column):
diff --git a/python/pyspark/sql/connect/plan.py 
b/python/pyspark/sql/connect/plan.py
index 7952d2af999..5e9b4e53dbf 100644
--- a/python/pyspark/sql/connect/plan.py
+++ b/python/pyspark/sql/connect/plan.py
@@ -464,7 +464,6 @@ class Project(LogicalPlan):
 def __init__(self, child: Optional["LogicalPlan"], *columns: 
"ColumnOrName") -> None:
 super().__init__(child)
 self._columns = list(columns)
-self.alias: Optional[str] = None
 self._verify_expressions()
 
 def _verify_expressions(self) -> None:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.5 updated: [SPARK-44982][CONNECT] Mark Spark Connect server configurations as static

2023-08-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new c831bd79fba [SPARK-44982][CONNECT] Mark Spark Connect server 
configurations as static
c831bd79fba is described below

commit c831bd79fba036d121a5d1f24cfc75be4006f4c9
Author: Hyukjin Kwon 
AuthorDate: Mon Aug 28 17:36:20 2023 +0900

[SPARK-44982][CONNECT] Mark Spark Connect server configurations as static

This PR proposes to mark all Spark Connect server configurations as static 
configurations.

They are already static configurations, and cannot be set in runtime 
configuration (by default), see also 
https://github.com/apache/spark/blob/4a4856207d414ba88a8edabeb70e20765460ef1a/sql/core/src/main/scala/org/apache/spark/sql/RuntimeConfig.scala#L164-L167

No, they are already static configurations.

Existing unittests.

No.

Closes #42695 from HyukjinKwon/SPARK-44982.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 5b69dfd67e35f8be742a58cbd55f33088b4c7704)
Signed-off-by: Hyukjin Kwon 
---
 .../apache/spark/sql/connect/config/Connect.scala  | 37 +++---
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala
 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala
index 054ccbe6707..7b8b05ce11a 100644
--- 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala
+++ 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala
@@ -18,7 +18,6 @@ package org.apache.spark.sql.connect.config
 
 import java.util.concurrent.TimeUnit
 
-import org.apache.spark.internal.config.ConfigBuilder
 import org.apache.spark.network.util.ByteUnit
 import org.apache.spark.sql.connect.common.config.ConnectCommon
 
@@ -26,13 +25,13 @@ object Connect {
   import org.apache.spark.sql.internal.SQLConf.buildStaticConf
 
   val CONNECT_GRPC_BINDING_PORT =
-ConfigBuilder("spark.connect.grpc.binding.port")
+buildStaticConf("spark.connect.grpc.binding.port")
   .version("3.4.0")
   .intConf
   .createWithDefault(ConnectCommon.CONNECT_GRPC_BINDING_PORT)
 
   val CONNECT_GRPC_INTERCEPTOR_CLASSES =
-ConfigBuilder("spark.connect.grpc.interceptor.classes")
+buildStaticConf("spark.connect.grpc.interceptor.classes")
   .doc(
 "Comma separated list of class names that must " +
   "implement the io.grpc.ServerInterceptor interface.")
@@ -41,7 +40,7 @@ object Connect {
   .createOptional
 
   val CONNECT_GRPC_ARROW_MAX_BATCH_SIZE =
-ConfigBuilder("spark.connect.grpc.arrow.maxBatchSize")
+buildStaticConf("spark.connect.grpc.arrow.maxBatchSize")
   .doc(
 "When using Apache Arrow, limit the maximum size of one arrow batch, 
in bytes unless " +
   "otherwise specified, that can be sent from server side to client 
side. Currently, we " +
@@ -51,7 +50,7 @@ object Connect {
   .createWithDefault(4 * 1024 * 1024)
 
   val CONNECT_GRPC_MAX_INBOUND_MESSAGE_SIZE =
-ConfigBuilder("spark.connect.grpc.maxInboundMessageSize")
+buildStaticConf("spark.connect.grpc.maxInboundMessageSize")
   .doc("Sets the maximum inbound message in bytes size for the gRPC 
requests." +
 "Requests with a larger payload will fail.")
   .version("3.4.0")
@@ -59,7 +58,7 @@ object Connect {
   .createWithDefault(ConnectCommon.CONNECT_GRPC_MAX_MESSAGE_SIZE)
 
   val CONNECT_GRPC_MARSHALLER_RECURSION_LIMIT =
-ConfigBuilder("spark.connect.grpc.marshallerRecursionLimit")
+buildStaticConf("spark.connect.grpc.marshallerRecursionLimit")
   .internal()
   .doc("""
   |Sets the recursion limit to grpc protobuf messages.
@@ -69,7 +68,7 @@ object Connect {
   .createWithDefault(1024)
 
   val CONNECT_EXECUTE_MANAGER_DETACHED_TIMEOUT =
-ConfigBuilder("spark.connect.execute.manager.detachedTimeout")
+buildStaticConf("spark.connect.execute.manager.detachedTimeout")
   .internal()
   .doc("Timeout after which executions without an attached RPC will be 
removed.")
   .version("3.5.0")
@@ -77,7 +76,7 @@ object Connect {
   .createWithDefaultString("5m")
 
   val CONNECT_EXECUTE_MANAGER_MAINTENANCE_INTERVAL =
-ConfigBuilder("spark.connect.execute.manager.maintenanceInterval")
+buildStaticConf("spark.connect.execute.manager.maintenanceInterval")
   .internal()
   .doc("Interval at which execution manager will search for abandoned 
executions to remove.")
   .version("3.5.0")
@@ -85,7 +84,7 @@ object Connect {
   .createWithDefaultString("30s")
 
   val CONNECT_EXECUTE_MANAGER_ABANDONED_TOMBSTONES_SIZE =
-

[spark] branch master updated: [SPARK-44982][CONNECT] Mark Spark Connect server configurations as static

2023-08-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5b69dfd67e3 [SPARK-44982][CONNECT] Mark Spark Connect server 
configurations as static
5b69dfd67e3 is described below

commit 5b69dfd67e35f8be742a58cbd55f33088b4c7704
Author: Hyukjin Kwon 
AuthorDate: Mon Aug 28 17:36:20 2023 +0900

[SPARK-44982][CONNECT] Mark Spark Connect server configurations as static

### What changes were proposed in this pull request?

This PR proposes to mark all Spark Connect server configurations as static 
configurations.

### Why are the changes needed?

They are already static configurations, and cannot be set in runtime 
configuration (by default), see also 
https://github.com/apache/spark/blob/4a4856207d414ba88a8edabeb70e20765460ef1a/sql/core/src/main/scala/org/apache/spark/sql/RuntimeConfig.scala#L164-L167
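As a quick illustration (a sketch, not part of the PR): on a running session, 
attempts to modify one of these configurations fail with the usual 
static-config error, so they must be supplied at server launch time, e.g. via 
`--conf spark.connect.grpc.binding.port=15003` on `start-connect-server.sh`:

```scala
// Sketch: run against a deployment where the Spark Connect server module is loaded,
// so the configuration below is registered as a static conf.
try {
  spark.conf.set("spark.connect.grpc.binding.port", "15003")
} catch {
  case e: org.apache.spark.sql.AnalysisException =>
    // e.g. "Cannot modify the value of a static config: spark.connect.grpc.binding.port"
    println(e.getMessage)
}
```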

### Does this PR introduce _any_ user-facing change?

No, they are already static configurations.

### How was this patch tested?

Existing unittests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42695 from HyukjinKwon/SPARK-44982.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 .../apache/spark/sql/connect/config/Connect.scala  | 39 +++---
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala
 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala
index 9c03107db27..f7daca8542d 100644
--- 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala
+++ 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala
@@ -18,7 +18,6 @@ package org.apache.spark.sql.connect.config
 
 import java.util.concurrent.TimeUnit
 
-import org.apache.spark.internal.config.ConfigBuilder
 import org.apache.spark.network.util.ByteUnit
 import org.apache.spark.sql.connect.common.config.ConnectCommon
 
@@ -26,19 +25,19 @@ object Connect {
   import org.apache.spark.sql.internal.SQLConf.buildStaticConf
 
   val CONNECT_GRPC_BINDING_ADDRESS =
-ConfigBuilder("spark.connect.grpc.binding.address")
+buildStaticConf("spark.connect.grpc.binding.address")
   .version("4.0.0")
   .stringConf
   .createOptional
 
   val CONNECT_GRPC_BINDING_PORT =
-ConfigBuilder("spark.connect.grpc.binding.port")
+buildStaticConf("spark.connect.grpc.binding.port")
   .version("3.4.0")
   .intConf
   .createWithDefault(ConnectCommon.CONNECT_GRPC_BINDING_PORT)
 
   val CONNECT_GRPC_INTERCEPTOR_CLASSES =
-ConfigBuilder("spark.connect.grpc.interceptor.classes")
+buildStaticConf("spark.connect.grpc.interceptor.classes")
   .doc(
 "Comma separated list of class names that must " +
   "implement the io.grpc.ServerInterceptor interface.")
@@ -47,7 +46,7 @@ object Connect {
   .createOptional
 
   val CONNECT_GRPC_ARROW_MAX_BATCH_SIZE =
-ConfigBuilder("spark.connect.grpc.arrow.maxBatchSize")
+buildStaticConf("spark.connect.grpc.arrow.maxBatchSize")
   .doc(
 "When using Apache Arrow, limit the maximum size of one arrow batch, 
in bytes unless " +
   "otherwise specified, that can be sent from server side to client 
side. Currently, we " +
@@ -57,7 +56,7 @@ object Connect {
   .createWithDefault(4 * 1024 * 1024)
 
   val CONNECT_GRPC_MAX_INBOUND_MESSAGE_SIZE =
-ConfigBuilder("spark.connect.grpc.maxInboundMessageSize")
+buildStaticConf("spark.connect.grpc.maxInboundMessageSize")
   .doc("Sets the maximum inbound message in bytes size for the gRPC 
requests." +
 "Requests with a larger payload will fail.")
   .version("3.4.0")
@@ -65,7 +64,7 @@ object Connect {
   .createWithDefault(ConnectCommon.CONNECT_GRPC_MAX_MESSAGE_SIZE)
 
   val CONNECT_GRPC_MARSHALLER_RECURSION_LIMIT =
-ConfigBuilder("spark.connect.grpc.marshallerRecursionLimit")
+buildStaticConf("spark.connect.grpc.marshallerRecursionLimit")
   .internal()
   .doc("""
   |Sets the recursion limit to grpc protobuf messages.
@@ -75,7 +74,7 @@ object Connect {
   .createWithDefault(1024)
 
   val CONNECT_EXECUTE_MANAGER_DETACHED_TIMEOUT =
-ConfigBuilder("spark.connect.execute.manager.detachedTimeout")
+buildStaticConf("spark.connect.execute.manager.detachedTimeout")
   .internal()
   .doc("Timeout after which executions without an attached RPC will be 
removed.")
   .version("3.5.0")
@@ -83,7 +82,7 @@ object Connect {
   .createWithDefaultString("5m")
 
   val CONNECT_EXECUTE_MANAGER_MAINTENANCE_INTERVAL =
-

[spark] branch branch-3.5 updated: [SPARK-44981][PYTHON][CONNECT] Filter out static configurations used in local mode

2023-08-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new b86a4a8649a [SPARK-44981][PYTHON][CONNECT] Filter out static 
configurations used in local mode
b86a4a8649a is described below

commit b86a4a8649a3f8a97d5cff28d70282465042a396
Author: Hyukjin Kwon 
AuthorDate: Mon Aug 28 17:34:32 2023 +0900

[SPARK-44981][PYTHON][CONNECT] Filter out static configurations used in 
local mode

### What changes were proposed in this pull request?

This PR is a followup of https://github.com/apache/spark/pull/42548. It 
filters static configurations out in remote=local mode.

### Why are the changes needed?

Otherwise, it logs a stream of errors like the one below:

```
23/08/28 11:39:42 ERROR ErrorUtils: Spark Connect RPC error during: config. 
UserId: hyukjin.kwon. SessionId: 424674ef-af95-4b12-b10e-86479413f9fd.
org.apache.spark.sql.AnalysisException: Cannot modify the value of a static 
config: spark.connect.copyFromLocalToFs.allowDestLocal.
at 
org.apache.spark.sql.errors.QueryCompilationErrors$.cannotModifyValueOfStaticConfigError(QueryCompilationErrors.scala:3227)
at 
org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:162)
at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:42)
at 
org.apache.spark.sql.connect.service.SparkConnectConfigHandler.$anonfun$handleSet$1(SparkConnectConfigHandler.scala:67)
at 
org.apache.spark.sql.connect.service.SparkConnectConfigHandler.$anonfun$handleSet$1$adapted(SparkConnectConfigHandler.scala:65)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at 
org.apache.spark.sql.connect.service.SparkConnectConfigHandler.handleSet(SparkConnectConfigHandler.scala:65)
at 
org.apache.spark.sql.connect.service.SparkConnectConfigHandler.handle(SparkConnectConfigHandler.scala:40)
at 
org.apache.spark.sql.connect.service.SparkConnectService.config(SparkConnectService.scala:120)
at 
org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:751)
at 
org.sparkproject.connect.grpc.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
at 
org.sparkproject.connect.grpc.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:346)
at 
org.sparkproject.connect.grpc.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:860)
at 
org.sparkproject.connect.grpc.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at 
org.sparkproject.connect.grpc.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
```

In fact, we do support setting static configurations (and all other 
configurations) when `remote` points to `local`.

### Does this PR introduce _any_ user-facing change?

No, the main change has not been released yet.

### How was this patch tested?

Manually tested.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42694 from HyukjinKwon/SPARK-44981.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 64636aff61aa473c8fc81c0bb3311e1fe824dc20)
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/connect/session.py | 8 
 1 file changed, 8 insertions(+)

diff --git a/python/pyspark/sql/connect/session.py 
b/python/pyspark/sql/connect/session.py
index 2905f7e4269..628eae20511 100644
--- a/python/pyspark/sql/connect/session.py
+++ b/python/pyspark/sql/connect/session.py
@@ -883,6 +883,14 @@ class SparkSession:
 PySparkSession(
 SparkContext.getOrCreate(create_conf(loadDefaults=True, 
_jvm=SparkContext._jvm))
 )
+
+# Lastly remove all static configurations that are not allowed 
to set in the regular
+# Spark Connect session.
+jvm = SparkContext._jvm
+utl = jvm.org.apache.spark.sql.api.python.PythonSQLUtils  # 
type: ignore[union-attr]
+for conf_set in utl.listStaticSQLConfigs():
+opts.pop(conf_set._1(), None)

[spark] branch master updated: [SPARK-44981][PYTHON][CONNECT] Filter out static configurations used in local mode

2023-08-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 64636aff61a [SPARK-44981][PYTHON][CONNECT] Filter out static 
configurations used in local mode
64636aff61a is described below

commit 64636aff61aa473c8fc81c0bb3311e1fe824dc20
Author: Hyukjin Kwon 
AuthorDate: Mon Aug 28 17:34:32 2023 +0900

[SPARK-44981][PYTHON][CONNECT] Filter out static configurations used in 
local mode

### What changes were proposed in this pull request?

This PR is a followup of https://github.com/apache/spark/pull/42548. It 
filters static configurations out in remote=local mode.

### Why are the changes needed?

Otherwise, it logs a stream of errors like the one below:

```
23/08/28 11:39:42 ERROR ErrorUtils: Spark Connect RPC error during: config. 
UserId: hyukjin.kwon. SessionId: 424674ef-af95-4b12-b10e-86479413f9fd.
org.apache.spark.sql.AnalysisException: Cannot modify the value of a static 
config: spark.connect.copyFromLocalToFs.allowDestLocal.
at 
org.apache.spark.sql.errors.QueryCompilationErrors$.cannotModifyValueOfStaticConfigError(QueryCompilationErrors.scala:3227)
at 
org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:162)
at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:42)
at 
org.apache.spark.sql.connect.service.SparkConnectConfigHandler.$anonfun$handleSet$1(SparkConnectConfigHandler.scala:67)
at 
org.apache.spark.sql.connect.service.SparkConnectConfigHandler.$anonfun$handleSet$1$adapted(SparkConnectConfigHandler.scala:65)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at 
org.apache.spark.sql.connect.service.SparkConnectConfigHandler.handleSet(SparkConnectConfigHandler.scala:65)
at 
org.apache.spark.sql.connect.service.SparkConnectConfigHandler.handle(SparkConnectConfigHandler.scala:40)
at 
org.apache.spark.sql.connect.service.SparkConnectService.config(SparkConnectService.scala:120)
at 
org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:751)
at 
org.sparkproject.connect.grpc.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
at 
org.sparkproject.connect.grpc.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:346)
at 
org.sparkproject.connect.grpc.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:860)
at 
org.sparkproject.connect.grpc.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at 
org.sparkproject.connect.grpc.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
```

In fact, we do support setting static configurations (and all other 
configurations) when `remote` points to `local`.

### Does this PR introduce _any_ user-facing change?

No, the main change has not been released yet.

### How was this patch tested?

Manually tested.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42694 from HyukjinKwon/SPARK-44981.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/connect/session.py | 8 
 1 file changed, 8 insertions(+)

diff --git a/python/pyspark/sql/connect/session.py 
b/python/pyspark/sql/connect/session.py
index 8e234442c20..6c01aad06c0 100644
--- a/python/pyspark/sql/connect/session.py
+++ b/python/pyspark/sql/connect/session.py
@@ -884,6 +884,14 @@ class SparkSession:
 PySparkSession(
 SparkContext.getOrCreate(create_conf(loadDefaults=True, 
_jvm=SparkContext._jvm))
 )
+
+# Lastly remove all static configurations that are not allowed 
to set in the regular
+# Spark Connect session.
+jvm = SparkContext._jvm
+utl = jvm.org.apache.spark.sql.api.python.PythonSQLUtils  # 
type: ignore[union-attr]
+for conf_set in utl.listStaticSQLConfigs():
+opts.pop(conf_set._1(), None)
+
 finally:
 if origin_remote is not None:
 

svn commit: r63661 - /dev/spark/v3.5.0-rc3-bin/

2023-08-28 Thread liyuanjian
Author: liyuanjian
Date: Mon Aug 28 08:21:10 2023
New Revision: 63661

Log:
Apache Spark v3.5.0-rc3

Added:
dev/spark/v3.5.0-rc3-bin/
dev/spark/v3.5.0-rc3-bin/SparkR_3.5.0.tar.gz   (with props)
dev/spark/v3.5.0-rc3-bin/SparkR_3.5.0.tar.gz.asc
dev/spark/v3.5.0-rc3-bin/SparkR_3.5.0.tar.gz.sha512
dev/spark/v3.5.0-rc3-bin/pyspark-3.5.0.tar.gz   (with props)
dev/spark/v3.5.0-rc3-bin/pyspark-3.5.0.tar.gz.asc
dev/spark/v3.5.0-rc3-bin/pyspark-3.5.0.tar.gz.sha512
dev/spark/v3.5.0-rc3-bin/spark-3.5.0-bin-hadoop3-scala2.13.tgz   (with 
props)
dev/spark/v3.5.0-rc3-bin/spark-3.5.0-bin-hadoop3-scala2.13.tgz.asc
dev/spark/v3.5.0-rc3-bin/spark-3.5.0-bin-hadoop3-scala2.13.tgz.sha512
dev/spark/v3.5.0-rc3-bin/spark-3.5.0-bin-hadoop3.tgz   (with props)
dev/spark/v3.5.0-rc3-bin/spark-3.5.0-bin-hadoop3.tgz.asc
dev/spark/v3.5.0-rc3-bin/spark-3.5.0-bin-hadoop3.tgz.sha512
dev/spark/v3.5.0-rc3-bin/spark-3.5.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.5.0-rc3-bin/spark-3.5.0-bin-without-hadoop.tgz.asc
dev/spark/v3.5.0-rc3-bin/spark-3.5.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.5.0-rc3-bin/spark-3.5.0.tgz   (with props)
dev/spark/v3.5.0-rc3-bin/spark-3.5.0.tgz.asc
dev/spark/v3.5.0-rc3-bin/spark-3.5.0.tgz.sha512

Added: dev/spark/v3.5.0-rc3-bin/SparkR_3.5.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.5.0-rc3-bin/SparkR_3.5.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.5.0-rc3-bin/SparkR_3.5.0.tar.gz.asc
==
--- dev/spark/v3.5.0-rc3-bin/SparkR_3.5.0.tar.gz.asc (added)
+++ dev/spark/v3.5.0-rc3-bin/SparkR_3.5.0.tar.gz.asc Mon Aug 28 08:21:10 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJKBAABCgA0FiEE/Drjp+qhusmHcIQOfhq8xTqqIhYFAmTsWJYWHGxpeXVhbmpp
+YW5AYXBhY2hlLm9yZwAKCRB+GrzFOqoiFhTrEACuJKbQ4KQeyC2rnWxD5X35FRvJ
+K8+Oodzju3zuA1MUSKROOR/d3gWAW5iv+DScqD/30AoEkmR17JwygD25dT7Kj3N6
+EntulHvCCT52bX+iz8f6vNCmC/OsgwUSRbG4Q7zkFThTQAgsyqCPyfMhdPvL5bfP
+dA50A9b6uhromOSKs+4lXAw4CADee0DMMLGGEqk+NiCga55JydbqhV77E6319ANK
+2kGL+pAYKuDnQSYtHzjfooqD5dLTmy5EoaODEdlexH3msqXTC57OczKBXAcSem+d
+AWn/wGDx9ZmC6jJ6PjorgdIOooQw2HOmCtF/03S2eq2K0cp0PUpCLVhfgC3Bnxty
+LlGyr2rcDg7c2qRUzKF+60TiCHsCpukiOqt3CIj8Ey2cD+MGn7WcICvciwXO1pc+
+Xkm5fIOO4jmw+1G5KIpmOcTs7FxayYDevURWfdW3oQAiETFQhjaIN3zWW2TpMuPU
+wQ3yAopyrSZxkfJ6pQ8ekQo93eahe0S6uJG777hbhqNwd1nF8GDHVzdMYALmGAiG
+EZ6HoFxvcl3swydseDMmG8ItZFiJfHv+C+66XfwhPE+eiZuC8jVMXGCwOe/5rqc8
+ZgoSas++PeLnPjU/jrsE8TSfZeG6n59AKP8gmYPiDzdxO+J1bi+5ReRJysY4CNo/
+upeCSJcXESxZLDxzmQ==
+=A7NR
+-END PGP SIGNATURE-

Added: dev/spark/v3.5.0-rc3-bin/SparkR_3.5.0.tar.gz.sha512
==
--- dev/spark/v3.5.0-rc3-bin/SparkR_3.5.0.tar.gz.sha512 (added)
+++ dev/spark/v3.5.0-rc3-bin/SparkR_3.5.0.tar.gz.sha512 Mon Aug 28 08:21:10 2023
@@ -0,0 +1 @@
+e90007efce8d63c10aaa9efc09fb0b63496c6a8e1bbac06941bc48fd5642a8e87da05b14e886f4fc18412ca1d816b455d3b4665b7242ee759c58b3849829cc74
  SparkR_3.5.0.tar.gz

Added: dev/spark/v3.5.0-rc3-bin/pyspark-3.5.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.5.0-rc3-bin/pyspark-3.5.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.5.0-rc3-bin/pyspark-3.5.0.tar.gz.asc
==
--- dev/spark/v3.5.0-rc3-bin/pyspark-3.5.0.tar.gz.asc (added)
+++ dev/spark/v3.5.0-rc3-bin/pyspark-3.5.0.tar.gz.asc Mon Aug 28 08:21:10 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJKBAABCgA0FiEE/Drjp+qhusmHcIQOfhq8xTqqIhYFAmTsWJcWHGxpeXVhbmpp
+YW5AYXBhY2hlLm9yZwAKCRB+GrzFOqoiFh9yD/4nQKM0ROSW08uBgcNQIHg7cYjt
+9gdnueZEaIFhEEhBqlCnFsV3IlUenuc6yimqmIoMvRH8Y2zacrG7cu0DhvnUWMQL
+nysi/6MY3FX2iVvsnM0K4GWa25VQ+0XJbEJK7mra9DfGbkA1GiDmUn9VY/lzifjQ
+fKbajPJVCo9j+wDeKHh/tAyD2JHveB8LB5zg+yY9Wgtyw/VgHqSnNRB1wJRreojz
+r54w8NGzO8nj3nu+TAmHjiprqpjWx7MQOaFiCFKgsYpooSBmfDXx38KxBshINIbv
+9ZbiAXwTpIpCr2N2yPAgowtoUvC/Jusl7Tu/ao/4Y8m3ENBl8q0xxhvHjMkZgH5c
+811xswVanFn4FR0e/tQnd3599yf2sFoHo72RVJ1Vac/AopFYFnowgJcxRB6vabkM
+6BOHDffQxg+MVGvypywU60+2TFw0EyZnIvAj1t0qKKXnIgRldXYdVjppDm8OKenv
+X84KryLmoF40uQpInABn4ML8kk5mD+lZt6ckAhZOv1xbLiNIH5uCozo+EoAQLlJK
+T7AF25BXkjqM0cekN9CChqVTpYc/dat9iOtAba3+1ri7SQDJ/HEZpBxL6/JHv5pe
+IHcYfMPBWlGfp2NSKclFy9d10a46enC3hNSli7f+HT0n7A96g1vONTqO9iSkmVsD
+UIRfRlvkVm0tL58uiQ==
+=HA3S
+-END PGP SIGNATURE-

Added: dev/spark/v3.5.0-rc3-bin/pyspark-3.5.0.tar.gz.sha512
==
--- dev/spark/v3.5.0-rc3-bin/pyspark-3.5.0.tar.gz.sha512 (added)
+++ 

[spark] branch branch-3.3 updated: [MINOR][SQL][DOC] Fix incorrect link in sql menu and typo

2023-08-28 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 0b8c43f1d14 [MINOR][SQL][DOC] Fix incorrect link in sql menu and typo
0b8c43f1d14 is described below

commit 0b8c43f1d1409dcea8b8648d7e759449b90f467c
Author: wforget <643348...@qq.com>
AuthorDate: Mon Aug 28 16:09:18 2023 +0800

[MINOR][SQL][DOC] Fix incorrect link in sql menu and typo

### What changes were proposed in this pull request?

 Fix incorrect link in sql menu and typo.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

run `SKIP_API=1 bundle exec jekyll build`


![image](https://github.com/apache/spark/assets/17894939/7cc564ec-41cd-4e92-b19e-d33a53188a10)

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #42697 from wForget/doc.

Authored-by: wforget <643348...@qq.com>
Signed-off-by: Kent Yao 
(cherry picked from commit 421ff4e3c047865a1887cae94b85dbf40bb7bac9)
Signed-off-by: Kent Yao 
---
 docs/_data/menu-sql.yaml| 4 ++--
 docs/sql-getting-started.md | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index 7d9e6f45ec7..d2659d0fb5c 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -17,8 +17,8 @@
   url: sql-getting-started.html#interoperating-with-rdds
 - text: Scalar Functions
   url: sql-getting-started.html#scalar-functions
-- text: Aggregations
-  url: sql-getting-started.html#aggregations
+- text: Aggregate Functions
+  url: sql-getting-started.html#aggregate-functions
 - text: Data Sources
   url: sql-data-sources.html
   subitems:
diff --git a/docs/sql-getting-started.md b/docs/sql-getting-started.md
index 69396924e35..77d87fcb342 100644
--- a/docs/sql-getting-started.md
+++ b/docs/sql-getting-started.md
@@ -352,7 +352,7 @@ Scalar functions are functions that return a single value per row, as opposed to
 
 ## Aggregate Functions
 
-Aggregate functions are functions that return a single value on a group of rows. The [Built-in Aggregation Functions](sql-ref-functions-builtin.html#aggregate-functions) provide common aggregations such as `count()`, `count_distinct()`, `avg()`, `max()`, `min()`, etc.
+Aggregate functions are functions that return a single value on a group of rows. The [Built-in Aggregate Functions](sql-ref-functions-builtin.html#aggregate-functions) provide common aggregations such as `count()`, `count_distinct()`, `avg()`, `max()`, `min()`, etc.
 Users are not limited to the predefined aggregate functions and can create their own. For more details
 about user defined aggregate functions, please refer to the documentation of
 [User Defined Aggregate Functions](sql-ref-functions-udf-aggregate.html).
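
A minimal PySpark sketch of the aggregate functions the page above describes (the data, column names, and app name here are illustrative, not taken from the docs):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("aggregate-example").getOrCreate()

# Toy DataFrame used only to demonstrate built-in aggregates.
df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 3)], ["key", "value"])

# Built-in aggregate functions return a single value per group of rows.
df.groupBy("key").agg(
    F.count("*").alias("n"),
    F.avg("value").alias("avg_value"),
    F.max("value").alias("max_value"),
).show()
```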


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.4 updated: [MINOR][SQL][DOC] Fix incorrect link in sql menu and typo

2023-08-28 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 64c26b7cb9b [MINOR][SQL][DOC] Fix incorrect link in sql menu and typo
64c26b7cb9b is described below

commit 64c26b7cb9b4c770a3e056404e05f6b6603746ee
Author: wforget <643348...@qq.com>
AuthorDate: Mon Aug 28 16:09:18 2023 +0800

[MINOR][SQL][DOC] Fix incorrect link in sql menu and typo

### What changes were proposed in this pull request?

 Fix incorrect link in sql menu and typo.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

run `SKIP_API=1 bundle exec jekyll build`


![image](https://github.com/apache/spark/assets/17894939/7cc564ec-41cd-4e92-b19e-d33a53188a10)

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #42697 from wForget/doc.

Authored-by: wforget <643348...@qq.com>
Signed-off-by: Kent Yao 
(cherry picked from commit 421ff4e3c047865a1887cae94b85dbf40bb7bac9)
Signed-off-by: Kent Yao 
---
 docs/_data/menu-sql.yaml| 4 ++--
 docs/sql-getting-started.md | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index bf7a88d90d0..86ab679d6dd 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -17,8 +17,8 @@
   url: sql-getting-started.html#interoperating-with-rdds
 - text: Scalar Functions
   url: sql-getting-started.html#scalar-functions
-- text: Aggregations
-  url: sql-getting-started.html#aggregations
+- text: Aggregate Functions
+  url: sql-getting-started.html#aggregate-functions
 - text: Data Sources
   url: sql-data-sources.html
   subitems:
diff --git a/docs/sql-getting-started.md b/docs/sql-getting-started.md
index 69396924e35..77d87fcb342 100644
--- a/docs/sql-getting-started.md
+++ b/docs/sql-getting-started.md
@@ -352,7 +352,7 @@ Scalar functions are functions that return a single value per row, as opposed to
 
 ## Aggregate Functions
 
-Aggregate functions are functions that return a single value on a group of rows. The [Built-in Aggregation Functions](sql-ref-functions-builtin.html#aggregate-functions) provide common aggregations such as `count()`, `count_distinct()`, `avg()`, `max()`, `min()`, etc.
+Aggregate functions are functions that return a single value on a group of rows. The [Built-in Aggregate Functions](sql-ref-functions-builtin.html#aggregate-functions) provide common aggregations such as `count()`, `count_distinct()`, `avg()`, `max()`, `min()`, etc.
 Users are not limited to the predefined aggregate functions and can create their own. For more details
 about user defined aggregate functions, please refer to the documentation of
 [User Defined Aggregate Functions](sql-ref-functions-udf-aggregate.html).


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.5 updated: [MINOR][SQL][DOC] Fix incorrect link in sql menu and typo

2023-08-28 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new c9e5e624e04 [MINOR][SQL][DOC] Fix incorrect link in sql menu and typo
c9e5e624e04 is described below

commit c9e5e624e0491610f107ff4cec569aa88779d4fa
Author: wforget <643348...@qq.com>
AuthorDate: Mon Aug 28 16:09:18 2023 +0800

[MINOR][SQL][DOC] Fix incorrect link in sql menu and typo

### What changes were proposed in this pull request?

 Fix incorrect link in sql menu and typo.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

run `SKIP_API=1 bundle exec jekyll build`


![image](https://github.com/apache/spark/assets/17894939/7cc564ec-41cd-4e92-b19e-d33a53188a10)

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #42697 from wForget/doc.

Authored-by: wforget <643348...@qq.com>
Signed-off-by: Kent Yao 
(cherry picked from commit 421ff4e3c047865a1887cae94b85dbf40bb7bac9)
Signed-off-by: Kent Yao 
---
 docs/_data/menu-sql.yaml| 4 ++--
 docs/sql-getting-started.md | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index 62ad6a3a585..ff93f09a83c 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -17,8 +17,8 @@
   url: sql-getting-started.html#interoperating-with-rdds
 - text: Scalar Functions
   url: sql-getting-started.html#scalar-functions
-- text: Aggregations
-  url: sql-getting-started.html#aggregations
+- text: Aggregate Functions
+  url: sql-getting-started.html#aggregate-functions
 - text: Data Sources
   url: sql-data-sources.html
   subitems:
diff --git a/docs/sql-getting-started.md b/docs/sql-getting-started.md
index 4fb7e9071ce..ed2678fdcd4 100644
--- a/docs/sql-getting-started.md
+++ b/docs/sql-getting-started.md
@@ -357,7 +357,7 @@ Scalar functions are functions that return a single value per row, as opposed to
 
 ## Aggregate Functions
 
-Aggregate functions are functions that return a single value on a group of rows. The [Built-in Aggregation Functions](sql-ref-functions-builtin.html#aggregate-functions) provide common aggregations such as `count()`, `count_distinct()`, `avg()`, `max()`, `min()`, etc.
+Aggregate functions are functions that return a single value on a group of rows. The [Built-in Aggregate Functions](sql-ref-functions-builtin.html#aggregate-functions) provide common aggregations such as `count()`, `count_distinct()`, `avg()`, `max()`, `min()`, etc.
 Users are not limited to the predefined aggregate functions and can create their own. For more details
 about user defined aggregate functions, please refer to the documentation of
 [User Defined Aggregate Functions](sql-ref-functions-udf-aggregate.html).


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [MINOR][SQL][DOC] Fix incorrect link in sql menu and typo

2023-08-28 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 421ff4e3c04 [MINOR][SQL][DOC] Fix incorrect link in sql menu and typo
421ff4e3c04 is described below

commit 421ff4e3c047865a1887cae94b85dbf40bb7bac9
Author: wforget <643348...@qq.com>
AuthorDate: Mon Aug 28 16:09:18 2023 +0800

[MINOR][SQL][DOC] Fix incorrect link in sql menu and typo

### What changes were proposed in this pull request?

 Fix incorrect link in sql menu and typo.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

run `SKIP_API=1 bundle exec jekyll build`


![image](https://github.com/apache/spark/assets/17894939/7cc564ec-41cd-4e92-b19e-d33a53188a10)

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #42697 from wForget/doc.

Authored-by: wforget <643348...@qq.com>
Signed-off-by: Kent Yao 
---
 docs/_data/menu-sql.yaml| 4 ++--
 docs/sql-getting-started.md | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index 62ad6a3a585..ff93f09a83c 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -17,8 +17,8 @@
   url: sql-getting-started.html#interoperating-with-rdds
 - text: Scalar Functions
   url: sql-getting-started.html#scalar-functions
-- text: Aggregations
-  url: sql-getting-started.html#aggregations
+- text: Aggregate Functions
+  url: sql-getting-started.html#aggregate-functions
 - text: Data Sources
   url: sql-data-sources.html
   subitems:
diff --git a/docs/sql-getting-started.md b/docs/sql-getting-started.md
index 4fb7e9071ce..ed2678fdcd4 100644
--- a/docs/sql-getting-started.md
+++ b/docs/sql-getting-started.md
@@ -357,7 +357,7 @@ Scalar functions are functions that return a single value per row, as opposed to
 
 ## Aggregate Functions
 
-Aggregate functions are functions that return a single value on a group of rows. The [Built-in Aggregation Functions](sql-ref-functions-builtin.html#aggregate-functions) provide common aggregations such as `count()`, `count_distinct()`, `avg()`, `max()`, `min()`, etc.
+Aggregate functions are functions that return a single value on a group of rows. The [Built-in Aggregate Functions](sql-ref-functions-builtin.html#aggregate-functions) provide common aggregations such as `count()`, `count_distinct()`, `avg()`, `max()`, `min()`, etc.
 Users are not limited to the predefined aggregate functions and can create their own. For more details
 about user defined aggregate functions, please refer to the documentation of
 [User Defined Aggregate Functions](sql-ref-functions-udf-aggregate.html).


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.5 updated: [SPARK-44980][PYTHON][CONNECT] Fix inherited namedtuples to work in createDataFrame

2023-08-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new f33a13c4b16 [SPARK-44980][PYTHON][CONNECT] Fix inherited namedtuples to work in createDataFrame
f33a13c4b16 is described below

commit f33a13c4b165e4ae5099703c308a2715463a479a
Author: Hyukjin Kwon 
AuthorDate: Mon Aug 28 15:46:57 2023 +0900

[SPARK-44980][PYTHON][CONNECT] Fix inherited namedtuples to work in createDataFrame

### What changes were proposed in this pull request?

This PR fixes a bug in createDataFrame with the Python Spark Connect client. Now it respects inherited namedtuples, as below:

```python
from collections import namedtuple
MyTuple = namedtuple("MyTuple", ["zz", "b", "a"])

class MyInheritedTuple(MyTuple):
    pass

df = spark.createDataFrame([MyInheritedTuple(1, 2, 3), MyInheritedTuple(11, 22, 33)])
df.collect()
```

Before:

```
[Row(zz=None, b=None, a=None), Row(zz=None, b=None, a=None)]
```

After:

```
[Row(zz=1, b=2, a=3), Row(zz=11, b=22, a=33)]
```

### Why are the changes needed?

This is already supported without Spark Connect. We should match the behaviour for consistent API support.
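
A minimal sketch of why this happens (plain Python, no Spark required), based on the `__dict__` check changed in the diff below: a namedtuple subclass that defines no `__slots__` gains an instance `__dict__`, but that `__dict__` is empty, so a conversion path that prefers `__dict__` over tuple unpacking drops every field.

```python
from collections import namedtuple

MyTuple = namedtuple("MyTuple", ["zz", "b", "a"])

class MyInheritedTuple(MyTuple):
    pass

t = MyInheritedTuple(1, 2, 3)
print(hasattr(t, "__dict__"))  # True: the subclass has no __slots__
print(t.__dict__)              # {}: the fields live in the tuple itself
print(tuple(t))                # (1, 2, 3): positional unpacking still works
```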

### Does this PR introduce _any_ user-facing change?

Yes, as described above. It fixes a bug.

### How was this patch tested?

Manually tested as described above, and unittests were added.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #42693 from HyukjinKwon/SPARK-44980.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 5291c6c9274aaabd4851d70e4c1baad629e12cca)
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/connect/conversion.py   | 12 +--
 .../pyspark/sql/tests/connect/test_parity_arrow.py |  3 +++
 python/pyspark/sql/tests/test_arrow.py | 24 ++
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/connect/conversion.py b/python/pyspark/sql/connect/conversion.py
index cdbc3a1e39c..1afeb3dfd44 100644
--- a/python/pyspark/sql/connect/conversion.py
+++ b/python/pyspark/sql/connect/conversion.py
@@ -117,7 +117,11 @@ class LocalDataToArrowConversion:
 ), f"{type(value)} {value}"
 
 _dict = {}
-if not isinstance(value, Row) and hasattr(value, "__dict__"):
+if (
+not isinstance(value, Row)
+and not isinstance(value, tuple)  # inherited namedtuple
+and hasattr(value, "__dict__")
+):
 value = value.__dict__
 if isinstance(value, dict):
 for i, field in enumerate(field_names):
@@ -274,7 +278,11 @@ class LocalDataToArrowConversion:
 pylist: List[List] = [[] for _ in range(len(column_names))]
 
 for item in data:
-if not isinstance(item, Row) and hasattr(item, "__dict__"):
+if (
+not isinstance(item, Row)
+and not isinstance(item, tuple)  # inherited namedtuple
+and hasattr(item, "__dict__")
+):
 item = item.__dict__
 if isinstance(item, dict):
 for i, col in enumerate(column_names):
diff --git a/python/pyspark/sql/tests/connect/test_parity_arrow.py b/python/pyspark/sql/tests/connect/test_parity_arrow.py
index 5f76cafb192..a92ef971cd2 100644
--- a/python/pyspark/sql/tests/connect/test_parity_arrow.py
+++ b/python/pyspark/sql/tests/connect/test_parity_arrow.py
@@ -142,6 +142,9 @@ class ArrowParityTests(ArrowTestsMixin, ReusedConnectTestCase, PandasOnSparkTest
 def test_toPandas_udt(self):
 self.check_toPandas_udt(True)
 
+def test_create_dataframe_namedtuples(self):
+self.check_create_dataframe_namedtuples(True)
+
 
 if __name__ == "__main__":
 from pyspark.sql.tests.connect.test_parity_arrow import *  # noqa: F401
diff --git a/python/pyspark/sql/tests/test_arrow.py b/python/pyspark/sql/tests/test_arrow.py
index 1b81ed72b22..73b6067373b 100644
--- a/python/pyspark/sql/tests/test_arrow.py
+++ b/python/pyspark/sql/tests/test_arrow.py
@@ -23,6 +23,7 @@ import unittest
 import warnings
 from distutils.version import LooseVersion
 from typing import cast
+from collections import namedtuple
 
 from pyspark import SparkContext, SparkConf
 from pyspark.sql import Row, SparkSession
@@ -1214,6 +1215,29 @@ class ArrowTestsMixin:
 
 assert_frame_equal(pdf, expected)
 
+def test_create_dataframe_namedtuples(self):
+# SPARK-44980: Inherited namedtuples in createDataFrame
+  

[spark] branch master updated (b7788f9b552 -> 5291c6c9274)

2023-08-28 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from b7788f9b552 [SPARK-44960][UI] Unescape and consist error summary across UI pages
 add 5291c6c9274 [SPARK-44980][PYTHON][CONNECT] Fix inherited namedtuples to work in createDataFrame

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/connect/conversion.py   | 12 +--
 .../pyspark/sql/tests/connect/test_parity_arrow.py |  3 +++
 python/pyspark/sql/tests/test_arrow.py | 24 ++
 3 files changed, 37 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org