[spark-docker] branch master updated: [SPARK-42505] Apply entrypoint template change to 3.3.0/3.3.1

2023-02-21 Thread yikun
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
 new 02bc905  [SPARK-42505] Apply entrypoint template change to 3.3.0/3.3.1
02bc905 is described below

commit 02bc9054d757f8defbc2baf6af1d2a9aa84b2b35
Author: Yikun Jiang 
AuthorDate: Tue Feb 21 17:02:29 2023 +0800

[SPARK-42505] Apply entrypoint template change to 3.3.0/3.3.1

### What changes were proposed in this pull request?
Apply entrypoint template change to 3.3.0/3.3.1

### Why are the changes needed?
We removed the redundant PySpark-related vars in
https://github.com/apache/spark-docker/commit/e8f5b0a1151c349d9c7fdb09cf76300b42a6946b.
That change should also be applied to 3.3.0/3.3.1.

### Does this PR introduce _any_ user-facing change?
No, because the image hasn't been published yet.

### How was this patch tested?
CI for 3.3.0/3.3.1 passed

Closes #31 from Yikun/SPARK-42505.

Authored-by: Yikun Jiang 
Signed-off-by: Yikun Jiang 
---
 3.3.0/scala2.12-java11-r-ubuntu/entrypoint.sh | 7 ---
 3.3.0/scala2.12-java11-ubuntu/entrypoint.sh   | 7 ---
 3.3.1/scala2.12-java11-r-ubuntu/entrypoint.sh | 7 ---
 3.3.1/scala2.12-java11-ubuntu/entrypoint.sh   | 7 ---
 4 files changed, 28 deletions(-)

diff --git a/3.3.0/scala2.12-java11-r-ubuntu/entrypoint.sh 
b/3.3.0/scala2.12-java11-r-ubuntu/entrypoint.sh
index 4bb1557..159d539 100644
--- a/3.3.0/scala2.12-java11-r-ubuntu/entrypoint.sh
+++ b/3.3.0/scala2.12-java11-r-ubuntu/entrypoint.sh
@@ -45,13 +45,6 @@ if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then
   SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH"
 fi
 
-if ! [ -z ${PYSPARK_PYTHON+x} ]; then
-export PYSPARK_PYTHON
-fi
-if ! [ -z ${PYSPARK_DRIVER_PYTHON+x} ]; then
-export PYSPARK_DRIVER_PYTHON
-fi
-
 # If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so 
Hadoop jars are available to the executor.
 # It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding 
customizations of this value from elsewhere e.g. Docker/K8s.
 if [ -n "${HADOOP_HOME}"  ] && [ -z "${SPARK_DIST_CLASSPATH}"  ]; then
diff --git a/3.3.0/scala2.12-java11-ubuntu/entrypoint.sh 
b/3.3.0/scala2.12-java11-ubuntu/entrypoint.sh
index 4bb1557..159d539 100644
--- a/3.3.0/scala2.12-java11-ubuntu/entrypoint.sh
+++ b/3.3.0/scala2.12-java11-ubuntu/entrypoint.sh
@@ -45,13 +45,6 @@ if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then
   SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH"
 fi
 
-if ! [ -z ${PYSPARK_PYTHON+x} ]; then
-export PYSPARK_PYTHON
-fi
-if ! [ -z ${PYSPARK_DRIVER_PYTHON+x} ]; then
-export PYSPARK_DRIVER_PYTHON
-fi
-
 # If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so 
Hadoop jars are available to the executor.
 # It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding 
customizations of this value from elsewhere e.g. Docker/K8s.
 if [ -n "${HADOOP_HOME}"  ] && [ -z "${SPARK_DIST_CLASSPATH}"  ]; then
diff --git a/3.3.1/scala2.12-java11-r-ubuntu/entrypoint.sh 
b/3.3.1/scala2.12-java11-r-ubuntu/entrypoint.sh
index 4bb1557..159d539 100644
--- a/3.3.1/scala2.12-java11-r-ubuntu/entrypoint.sh
+++ b/3.3.1/scala2.12-java11-r-ubuntu/entrypoint.sh
@@ -45,13 +45,6 @@ if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then
   SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH"
 fi
 
-if ! [ -z ${PYSPARK_PYTHON+x} ]; then
-export PYSPARK_PYTHON
-fi
-if ! [ -z ${PYSPARK_DRIVER_PYTHON+x} ]; then
-export PYSPARK_DRIVER_PYTHON
-fi
-
 # If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so 
Hadoop jars are available to the executor.
 # It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding 
customizations of this value from elsewhere e.g. Docker/K8s.
 if [ -n "${HADOOP_HOME}"  ] && [ -z "${SPARK_DIST_CLASSPATH}"  ]; then
diff --git a/3.3.1/scala2.12-java11-ubuntu/entrypoint.sh 
b/3.3.1/scala2.12-java11-ubuntu/entrypoint.sh
index 4bb1557..159d539 100644
--- a/3.3.1/scala2.12-java11-ubuntu/entrypoint.sh
+++ b/3.3.1/scala2.12-java11-ubuntu/entrypoint.sh
@@ -45,13 +45,6 @@ if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then
   SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH"
 fi
 
-if ! [ -z ${PYSPARK_PYTHON+x} ]; then
-export PYSPARK_PYTHON
-fi
-if ! [ -z ${PYSPARK_DRIVER_PYTHON+x} ]; then
-export PYSPARK_DRIVER_PYTHON
-fi
-
 # If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so 
Hadoop jars are available to the executor.
 # It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding 
customizations of this value from elsewhere e.g. Docker/K8s.
 if [ -n "${HADOOP_HOME}"  ] && [ -z "${SPARK_DIST_CLASSPATH}"  ]; then



[spark] branch master updated (0e8a20e6da1 -> 0c20263dcd0)

2023-02-21 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 0e8a20e6da1 [SPARK-37099][SQL] Introduce the group limit of Window for 
rank-based filter to optimize top-k computation
 add 0c20263dcd0 [SPARK-42507][SQL][TESTS] Simplify ORC schema merging 
conflict error check

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/datasources/orc/OrcSourceSuite.scala   | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)





[spark] branch branch-3.4 updated: [SPARK-42507][SQL][TESTS] Simplify ORC schema merging conflict error check

2023-02-21 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new f394322be3b [SPARK-42507][SQL][TESTS] Simplify ORC schema merging 
conflict error check
f394322be3b is described below

commit f394322be3b9a0451e0dff158129b607549b9160
Author: Dongjoon Hyun 
AuthorDate: Tue Feb 21 17:48:09 2023 +0800

[SPARK-42507][SQL][TESTS] Simplify ORC schema merging conflict error check

### What changes were proposed in this pull request?

This PR aims to simplify the ORC schema merging conflict error check.

### Why are the changes needed?

Currently, the `branch-3.4` CI is broken because the order of partitions is not deterministic.
- https://github.com/apache/spark/runs/11463120795
- https://github.com/apache/spark/runs/11463886897
- https://github.com/apache/spark/runs/11467827738
- https://github.com/apache/spark/runs/11471484144
- https://github.com/apache/spark/runs/11471507531
- https://github.com/apache/spark/runs/11474764316

![Screenshot 2023-02-20 at 12 30 19 PM](https://user-images.githubusercontent.com/9700541/220193503-6d6ce2ce-3fd6-4b01-b91c-bc1ec1f41c03.png)
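
Because either partition can be scanned first, the `left`/`right` parameters reported by the error can swap between runs, so the patch below asserts only the error class. For illustration, a hypothetical order-insensitive check of the parameters (not what this patch does; it assumes the `SparkThrowable.getMessageParameters` accessor is available on the caught exception) could look like:

```
import scala.jdk.CollectionConverters._

// Hypothetical sketch only: compare the two conflicting types as a set so the
// assertion no longer depends on which partition happens to be read first.
val params = innerException.asInstanceOf[SparkException].getMessageParameters.asScala
assert(Set(params("left"), params("right")) === Set("\"BIGINT\"", "\"STRING\""))
```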

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass the CIs.

Closes #40101 from dongjoon-hyun/SPARK-42507.

Authored-by: Dongjoon Hyun 
Signed-off-by: Xinrong Meng 
(cherry picked from commit 0c20263dcd0c394f8bfd6fa2bfc62031135de06a)
Signed-off-by: Xinrong Meng 
---
 .../spark/sql/execution/datasources/orc/OrcSourceSuite.scala   | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
index c821276431e..024f5f6b67e 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
@@ -455,11 +455,8 @@ abstract class OrcSuite
 throw new UnsupportedOperationException(s"Unknown ORC 
implementation: $impl")
 }
 
-checkError(
-  exception = innerException.asInstanceOf[SparkException],
-  errorClass = "CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE",
-  parameters = Map("left" -> "\"BIGINT\"", "right" -> "\"STRING\"")
-)
+assert(innerException.asInstanceOf[SparkException].getErrorClass ===
+  "CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE")
   }
 
   // it is ok if no schema merging





[spark] 01/01: Preparing Spark release v3.4.0-rc1

2023-02-21 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to tag v3.4.0-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git

commit e2484f626bb338274665a49078b528365ea18c3b
Author: Xinrong Meng 
AuthorDate: Tue Feb 21 10:39:21 2023 +

Preparing Spark release v3.4.0-rc1
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index fa7028630a8..4a32762b34c 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.4.1
+Version: 3.4.0
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index a4111eb64d9..58dd9ef46e0 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index f9ecfb3d692..95ea15552da 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 22ee65b7d25..e4d98471bf9 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 2c67da81ca4..7a6d5aedf65 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 219682e047d..1c421754083 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.0
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 22ce78c6fd2..2ee25ebfffc 100644

[spark] tag v3.4.0-rc1 created (now e2484f626bb)

2023-02-21 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to tag v3.4.0-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git


  at e2484f626bb (commit)
This tag includes the following new commits:

 new e2484f626bb Preparing Spark release v3.4.0-rc1

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.






[spark] branch branch-3.4 updated (f394322be3b -> 63be7fd7334)

2023-02-21 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a change to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


from f394322be3b [SPARK-42507][SQL][TESTS] Simplify ORC schema merging 
conflict error check
 add e2484f626bb Preparing Spark release v3.4.0-rc1
 new 63be7fd7334 Preparing development version 3.4.1-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:





[spark] 01/01: Preparing development version 3.4.1-SNAPSHOT

2023-02-21 Thread xinrong
This is an automated email from the ASF dual-hosted git repository.

xinrong pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 63be7fd7334111474e79d88c687d376ede30e37f
Author: Xinrong Meng 
AuthorDate: Tue Feb 21 10:39:26 2023 +

Preparing development version 3.4.1-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 4a32762b34c..fa7028630a8 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.4.0
+Version: 3.4.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 58dd9ef46e0..a4111eb64d9 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 95ea15552da..f9ecfb3d692 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index e4d98471bf9..22ee65b7d25 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 7a6d5aedf65..2c67da81ca4 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 1c421754083..219682e047d 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.0
+3.4.1-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 2ee25ebfffc..22ce7

svn commit: r60238 - /dev/spark/v3.4.0-rc1-bin/

2023-02-21 Thread xinrong
Author: xinrong
Date: Tue Feb 21 11:57:55 2023
New Revision: 60238

Log:
Apache Spark v3.4.0-rc1

Added:
dev/spark/v3.4.0-rc1-bin/
dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz   (with props)
dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.asc
dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.sha512
dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz   (with props)
dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.asc
dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.sha512
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz   (with 
props)
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.asc
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3-scala2.13.tgz.sha512
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3.tgz   (with props)
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3.tgz.asc
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-hadoop3.tgz.sha512
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-without-hadoop.tgz   (with props)
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-without-hadoop.tgz.asc
dev/spark/v3.4.0-rc1-bin/spark-3.4.0-bin-without-hadoop.tgz.sha512
dev/spark/v3.4.0-rc1-bin/spark-3.4.0.tgz   (with props)
dev/spark/v3.4.0-rc1-bin/spark-3.4.0.tgz.asc
dev/spark/v3.4.0-rc1-bin/spark-3.4.0.tgz.sha512

Added: dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.asc
==
--- dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.asc Tue Feb 21 11:57:55 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmP0sVMTHHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCn5XkIx6Thsbk1D/4wKDoCUBbr0bOOPpGKbMyWggJQDdvl
+xCDXR5nFFkLdY6vZFerIp32jX1JFQA2Enr24iCBy00ERszFT9LMRP66nOG3OseU1
+6eI4Y4l5ACAD35qdUjFsuPNPy71Q2HqWrY52isMZWfj8TYY9X3T3w9Wox6KgTOon
+rGoOtj+N6tAF5ACvJIX43li8JPesJQNl1epbu2LtrZa+tFyfgQBowuHmhiQ5PQ/v
+EufANZytLWllzX81EfNbiJ9hN9geqIHgXew6b1rtd8IS05PdDimA/uwtP+LqBBqq
+MKfUA6Tf8T9SpN36ZN6/lfOKVKu0OFXc9qfJIj9cdBfhTcoP1vUGVMqNtWEQQFqo
+DZVRnBrnnx5lQOYry3gm4UgdLtHpwqvOZtqpmbvSHV503+JCqBnFnw8jvGzaVfWZ
+OIPa4AuhjAxqMcnCdLHmpg/QcX07/tPXPO0kpEWz7a1QjF6C+gidtbgIghY/HIzs
+lNfI3TdWop3Wwnpa0kHHlwi15jfeaxnPQDtIw/YRWojbztE0wG8rXycoWl2h0o05
+XQ55Rl9qEviW3GPOW52SGAD47+2j3eU6lFEs+xz85E/jxIneYkuweMJ5Vk1iTdEH
+7yfjQqVozR3QeyaYll9W1ax50LUtrMx5vTMdy82L0yzg0NQctqEa+I3HRQjgxVFB
+7gqTLxqG8bpyPA==
+=+Kud
+-END PGP SIGNATURE-

Added: dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.sha512
==
--- dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.sha512 (added)
+++ dev/spark/v3.4.0-rc1-bin/SparkR_3.4.0.tar.gz.sha512 Tue Feb 21 11:57:55 2023
@@ -0,0 +1 @@
+21574f5fb95f397640c896678002559a10b6e264b3887115128bde380682065e8a3883dd94136c318c78f3047a7cd4a2763b617863686329b47532983f171240
  SparkR_3.4.0.tar.gz

Added: dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.asc
==
--- dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.asc (added)
+++ dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.asc Tue Feb 21 11:57:55 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJHBAABCgAxFiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmP0sVUTHHhpbnJvbmdA
+YXBhY2hlLm9yZwAKCRCn5XkIx6ThsWbPD/9dWcxjrRR54QccE8zwX5oaiboVFXuI
+0BLahV54IQi4HZjVgRHzbEWD/qaemW5Brcos003nsaGnXT0m0oi656X2967ZuJTk
+zYanrIafACwplVo7uxcq2VBp6IKcDkWEUL42fAcV5GN1/1NpNHqzZqZMGe5ufKLB
+05Np0ac8L6XXMpIG0to6H1LEmAW7/4PBARpzt6/TgZjoEI7a7YHMUlL0OjmHmP/m
+3Ck8slg+Osk2opYJL4AXycFh36Ns43OG3TnhfLYyDG0jtiXpWBZ4Yt2bin55j0f/
+yrDe1lDlRJ14pXay2f/s5eFrz16qHfRluWZzxcEyJjZva1AD5V1XMh/zsRGDfvUZ
+BkEM2GHYn3gZH9uuGfYbqL+pcZgrmVjZMgcZfhjyxLrRW8WBFr9g5lCIQF+4lpU8
+JwM4W3eOLyaC3wpVTfPU8rJfGExeBLhJ7zAyw65+yUx27KMUWatzGuQSA63iE1bg
+FIruQABSDsenFARnLybB8l41t0PTGlWU9+g5E4BlU/+GbnxaQEuOTSnZOenhPOGe
+n2g4Yfr81aYqVX8VKL0wzYXeB39SaXrtGhUaWVjFookNb42SNB1IPG2xQ+qQtcMw
+jv1m+1BIMWXDLZcLlrIViEzoyNhIy83CipDujJpoh4tlXb3OHOJqYuIZjMPhgVcB
+vtJFP8xIOdwRIg==
+=058e
+-END PGP SIGNATURE-

Added: dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.sha512
==
--- dev/spark/v3.4.0-rc1-bin/pyspark-3.4.0.tar.gz.sha512 (added)
+++ dev/spark/v

[spark] branch master updated (0c20263dcd0 -> 088ebdeea67)

2023-02-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 0c20263dcd0 [SPARK-42507][SQL][TESTS] Simplify ORC schema merging 
conflict error check
 add 088ebdeea67 [MINOR][TESTS] Avoid NPE in an anonym SparkListener in 
DataFrameReaderWriterSuite

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala   | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)





[spark] branch branch-3.4 updated: [MINOR][TESTS] Avoid NPE in an anonym SparkListener in DataFrameReaderWriterSuite

2023-02-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 6119ff51b7e [MINOR][TESTS] Avoid NPE in an anonym SparkListener in 
DataFrameReaderWriterSuite
6119ff51b7e is described below

commit 6119ff51b7ebc5e5fe2f1282223d59ec32bb8cf1
Author: Kent Yao 
AuthorDate: Tue Feb 21 21:07:52 2023 +0900

[MINOR][TESTS] Avoid NPE in an anonym SparkListener in 
DataFrameReaderWriterSuite

### What changes were proposed in this pull request?

Avoid the following NPE in an anonym SparkListener in 
DataFrameReaderWriterSuite, as job desc may be absent

```
java.lang.NullPointerException
at 
java.util.concurrent.ConcurrentLinkedQueue.checkNotNull(ConcurrentLinkedQueue.java:920)
at 
java.util.concurrent.ConcurrentLinkedQueue.offer(ConcurrentLinkedQueue.java:327)
at 
java.util.concurrent.ConcurrentLinkedQueue.add(ConcurrentLinkedQueue.java:297)
at 
org.apache.spark.sql.test.DataFrameReaderWriterSuite$$anon$2.onJobStart(DataFrameReaderWriterSuite.scala:1151)
at 
org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:37)
at 
org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
at 
org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at 
org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at 
org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
at 
org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
at 
org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
at 
org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
at 
scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at 
org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
at 
org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
at 
org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1462)
at 
org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)

```
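
For context, a minimal Scala sketch of the failure mode (illustrative only, not part of the patch): `ConcurrentLinkedQueue` rejects `null` elements, so adding an absent job description throws inside the listener callback.

```
import java.util.concurrent.ConcurrentLinkedQueue

val jobDescriptions = new ConcurrentLinkedQueue[String]()
val desc: String = null                      // SPARK_JOB_DESCRIPTION may be unset
// jobDescriptions.add(desc)                 // would throw java.lang.NullPointerException
if (desc != null) jobDescriptions.add(desc)  // guarded form used by the fix below
```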

### Why are the changes needed?

Test Improvement

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes #40102 from yaooqinn/test-minor.

Authored-by: Kent Yao 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 088ebdeea67dd509048a7559f1c92a3636e18ce6)
Signed-off-by: Hyukjin Kwon 
---
 .../scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala   | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
index a9836d281f0..bf53ffba222 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
@@ -1148,7 +1148,8 @@ class DataFrameReaderWriterSuite extends QueryTest with 
SharedSparkSession with
 val jobDescriptions = new ConcurrentLinkedQueue[String]()
 val jobListener = new SparkListener {
   override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
-
jobDescriptions.add(jobStart.properties.getProperty(SparkContext.SPARK_JOB_DESCRIPTION))
+val desc = 
jobStart.properties.getProperty(SparkContext.SPARK_JOB_DESCRIPTION)
+if (desc != null) jobDescriptions.add(desc)
   }
 }
 sparkContext.addSparkListener(jobListener)





[spark] branch branch-3.3 updated: [MINOR][TESTS] Avoid NPE in an anonym SparkListener in DataFrameReaderWriterSuite

2023-02-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 9b0d49b02e9 [MINOR][TESTS] Avoid NPE in an anonym SparkListener in 
DataFrameReaderWriterSuite
9b0d49b02e9 is described below

commit 9b0d49b02e959cfa36dc3b0870f6381a730fed2e
Author: Kent Yao 
AuthorDate: Tue Feb 21 21:07:52 2023 +0900

[MINOR][TESTS] Avoid NPE in an anonym SparkListener in 
DataFrameReaderWriterSuite

### What changes were proposed in this pull request?

Avoid the following NPE in an anonym SparkListener in 
DataFrameReaderWriterSuite, as job desc may be absent

```
java.lang.NullPointerException
at 
java.util.concurrent.ConcurrentLinkedQueue.checkNotNull(ConcurrentLinkedQueue.java:920)
at 
java.util.concurrent.ConcurrentLinkedQueue.offer(ConcurrentLinkedQueue.java:327)
at 
java.util.concurrent.ConcurrentLinkedQueue.add(ConcurrentLinkedQueue.java:297)
at 
org.apache.spark.sql.test.DataFrameReaderWriterSuite$$anon$2.onJobStart(DataFrameReaderWriterSuite.scala:1151)
at 
org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:37)
at 
org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
at 
org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at 
org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at 
org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
at 
org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
at 
org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
at 
org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
at 
scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at 
org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
at 
org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
at 
org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1462)
at 
org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)

```

### Why are the changes needed?

Test Improvement

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes #40102 from yaooqinn/test-minor.

Authored-by: Kent Yao 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 088ebdeea67dd509048a7559f1c92a3636e18ce6)
Signed-off-by: Hyukjin Kwon 
---
 .../scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala   | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
index dabd9c001eb..c933ab50d21 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
@@ -1132,7 +1132,8 @@ class DataFrameReaderWriterSuite extends QueryTest with 
SharedSparkSession with
 val jobDescriptions = new ConcurrentLinkedQueue[String]()
 val jobListener = new SparkListener {
   override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
-
jobDescriptions.add(jobStart.properties.getProperty(SparkContext.SPARK_JOB_DESCRIPTION))
+val desc = 
jobStart.properties.getProperty(SparkContext.SPARK_JOB_DESCRIPTION)
+if (desc != null) jobDescriptions.add(desc)
   }
 }
 sparkContext.addSparkListener(jobListener)





[spark] branch branch-3.2 updated: [MINOR][TESTS] Avoid NPE in an anonym SparkListener in DataFrameReaderWriterSuite

2023-02-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 358ee499749 [MINOR][TESTS] Avoid NPE in an anonym SparkListener in 
DataFrameReaderWriterSuite
358ee499749 is described below

commit 358ee499749712a15776507a20cfb9b619b7d5d0
Author: Kent Yao 
AuthorDate: Tue Feb 21 21:07:52 2023 +0900

[MINOR][TESTS] Avoid NPE in an anonym SparkListener in 
DataFrameReaderWriterSuite

### What changes were proposed in this pull request?

Avoid the following NPE in an anonym SparkListener in 
DataFrameReaderWriterSuite, as job desc may be absent

```
java.lang.NullPointerException
at 
java.util.concurrent.ConcurrentLinkedQueue.checkNotNull(ConcurrentLinkedQueue.java:920)
at 
java.util.concurrent.ConcurrentLinkedQueue.offer(ConcurrentLinkedQueue.java:327)
at 
java.util.concurrent.ConcurrentLinkedQueue.add(ConcurrentLinkedQueue.java:297)
at 
org.apache.spark.sql.test.DataFrameReaderWriterSuite$$anon$2.onJobStart(DataFrameReaderWriterSuite.scala:1151)
at 
org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:37)
at 
org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
at 
org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at 
org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at 
org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
at 
org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
at 
org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
at 
org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
at 
scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at 
org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
at 
org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
at 
org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1462)
at 
org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)

```

### Why are the changes needed?

Test Improvement

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes #40102 from yaooqinn/test-minor.

Authored-by: Kent Yao 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 088ebdeea67dd509048a7559f1c92a3636e18ce6)
Signed-off-by: Hyukjin Kwon 
---
 .../scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala   | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
index ea007c149dd..ccdba809292 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
@@ -1132,7 +1132,8 @@ class DataFrameReaderWriterSuite extends QueryTest with 
SharedSparkSession with
 val jobDescriptions = new ConcurrentLinkedQueue[String]()
 val jobListener = new SparkListener {
   override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
-
jobDescriptions.add(jobStart.properties.getProperty(SparkContext.SPARK_JOB_DESCRIPTION))
+val desc = 
jobStart.properties.getProperty(SparkContext.SPARK_JOB_DESCRIPTION)
+if (desc != null) jobDescriptions.add(desc)
   }
 }
 sparkContext.addSparkListener(jobListener)





svn commit: r60241 - in /dev/spark/v3.4.0-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.2.2/ _site/api/R/deps/jquery-3.6.0/ _site/api

2023-02-21 Thread xinrong
Author: xinrong
Date: Tue Feb 21 13:34:14 2023
New Revision: 60241

Log:
Apache Spark v3.4.0-rc1 docs


[This commit notification would consist of 2806 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]




[spark] branch master updated: [SPARK-41765][SQL] Pull out v1 write metrics to WriteFiles

2023-02-21 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a111a02de1a [SPARK-41765][SQL] Pull out v1 write metrics to WriteFiles
a111a02de1a is described below

commit a111a02de1a814c5f335e0bcac4cffb0515557dc
Author: ulysses-you 
AuthorDate: Tue Feb 21 21:48:00 2023 +0800

[SPARK-41765][SQL] Pull out v1 write metrics to WriteFiles

### What changes were proposed in this pull request?

Pull out the metrics that currently live in `InsertIntoHiveTable` and
`InsertIntoHadoopFsRelationCommand` into `WriteFiles`.

### Why are the changes needed?

Move metrics to the right place.

### Does this PR introduce _any_ user-facing change?

Yes, the SQL UI metrics moved from `V1WriteCommand` to `WriteFiles`,

with `spark.sql.optimizer.plannedWrite.enabled` disabled and enabled:

// disable
https://user-images.githubusercontent.com/12025282/220267296-62d2deef-f8d8-4e71-adc6-23e416f0777c.png

// enable
https://user-images.githubusercontent.com/12025282/220267151-dd1e7ed9-eb92-44a5-abdc-75f204ecf97e.png
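
A hedged sketch of how to observe the difference (the config key is quoted from above; the output path is made up):

```
// Illustrative only: flip the flag, run a v1 write, then inspect the SQL UI.
spark.conf.set("spark.sql.optimizer.plannedWrite.enabled", "true")
spark.range(100).write.mode("overwrite").parquet("/tmp/planned-write-demo")
// With the flag enabled, the write metrics are reported on the WriteFiles node;
// with it disabled, they remain on the V1 write command node.
```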

### How was this patch tested?

fix and improve test

Closes #39428 from ulysses-you/SPARK-41765.

Authored-by: ulysses-you 
Signed-off-by: Wenchen Fan 
---
 .../sql/execution/command/DataWritingCommand.scala | 47 +++
 .../datasources/BasicWriteStatsTracker.scala   | 17 +-
 .../execution/datasources/FileFormatWriter.scala   | 69 +-
 .../sql/execution/datasources/WriteFiles.scala |  3 +
 .../BasicWriteJobStatsTrackerMetricSuite.scala | 19 +++---
 .../sql/execution/metric/SQLMetricsSuite.scala | 28 +
 .../sql/execution/metric/SQLMetricsTestUtils.scala | 21 ++-
 .../hive/execution/InsertIntoHiveDirCommand.scala  |  7 +++
 .../spark/sql/hive/execution/SQLMetricsSuite.scala | 42 -
 9 files changed, 159 insertions(+), 94 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
index 338ce8cac42..58c3fca4ad7 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
@@ -21,14 +21,13 @@ import java.net.URI
 
 import org.apache.hadoop.conf.Configuration
 
-import org.apache.spark.SparkContext
 import org.apache.spark.sql.{Row, SaveMode, SparkSession}
 import org.apache.spark.sql.catalyst.expressions.Attribute
 import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryCommand}
 import org.apache.spark.sql.errors.QueryCompilationErrors
-import org.apache.spark.sql.execution.{SparkPlan, SQLExecution}
+import org.apache.spark.sql.execution.SparkPlan
 import org.apache.spark.sql.execution.datasources.BasicWriteJobStatsTracker
-import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}
+import org.apache.spark.sql.execution.metric.SQLMetric
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.util.SerializableConfiguration
 
@@ -52,11 +51,19 @@ trait DataWritingCommand extends UnaryCommand {
   def outputColumns: Seq[Attribute] =
 DataWritingCommand.logicalPlanOutputWithNames(query, outputColumnNames)
 
-  lazy val metrics: Map[String, SQLMetric] = BasicWriteJobStatsTracker.metrics
+  lazy val metrics: Map[String, SQLMetric] = {
+// If planned write is enable, we have pulled out write files metrics from 
`V1WriteCommand`
+// to `WriteFiles`. `DataWritingCommand` should only holds the task commit 
metric and driver
+// commit metric.
+if (conf.getConf(SQLConf.PLANNED_WRITE_ENABLED)) {
+  BasicWriteJobStatsTracker.writeCommitMetrics
+} else {
+  BasicWriteJobStatsTracker.metrics
+}
+  }
 
   def basicWriteJobStatsTracker(hadoopConf: Configuration): 
BasicWriteJobStatsTracker = {
-val serializableHadoopConf = new SerializableConfiguration(hadoopConf)
-new BasicWriteJobStatsTracker(serializableHadoopConf, metrics)
+DataWritingCommand.basicWriteJobStatsTracker(metrics, hadoopConf)
   }
 
   def run(sparkSession: SparkSession, child: SparkPlan): Seq[Row]
@@ -79,27 +86,6 @@ object DataWritingCommand {
 }
   }
 
-  /**
-   * When execute CTAS operators, Spark will use 
[[InsertIntoHadoopFsRelationCommand]]
-   * or [[InsertIntoHiveTable]] command to write data, they both inherit 
metrics from
-   * [[DataWritingCommand]], but after running 
[[InsertIntoHadoopFsRelationCommand]]
-   * or [[InsertIntoHiveTable]], we only update metrics in these two command 
through
-   * [[BasicWriteJobStatsTracker]], we also need to propogate metrics to the 
command
-   * that actually calls [[InsertIntoHadoopFsRel

[spark] branch branch-3.4 updated: [SPARK-41765][SQL] Pull out v1 write metrics to WriteFiles

2023-02-21 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 6fda64a27c3 [SPARK-41765][SQL] Pull out v1 write metrics to WriteFiles
6fda64a27c3 is described below

commit 6fda64a27c3dd65b191a973aaf24ca96a65f97b9
Author: ulysses-you 
AuthorDate: Tue Feb 21 21:48:00 2023 +0800

[SPARK-41765][SQL] Pull out v1 write metrics to WriteFiles

### What changes were proposed in this pull request?

Pull out the metrics that currently live in `InsertIntoHiveTable` and
`InsertIntoHadoopFsRelationCommand` into `WriteFiles`.

### Why are the changes needed?

Move metrics to the right place.

### Does this PR introduce _any_ user-facing change?

Yes, the SQL UI metrics moved from `V1WriteCommand` to `WriteFiles`,

with `spark.sql.optimizer.plannedWrite.enabled` disabled and enabled:

// disable
https://user-images.githubusercontent.com/12025282/220267296-62d2deef-f8d8-4e71-adc6-23e416f0777c.png

// enable
https://user-images.githubusercontent.com/12025282/220267151-dd1e7ed9-eb92-44a5-abdc-75f204ecf97e.png

### How was this patch tested?

fix and improve test

Closes #39428 from ulysses-you/SPARK-41765.

Authored-by: ulysses-you 
Signed-off-by: Wenchen Fan 
(cherry picked from commit a111a02de1a814c5f335e0bcac4cffb0515557dc)
Signed-off-by: Wenchen Fan 
---
 .../sql/execution/command/DataWritingCommand.scala | 47 +++
 .../datasources/BasicWriteStatsTracker.scala   | 17 +-
 .../execution/datasources/FileFormatWriter.scala   | 69 +-
 .../sql/execution/datasources/WriteFiles.scala |  3 +
 .../BasicWriteJobStatsTrackerMetricSuite.scala | 19 +++---
 .../sql/execution/metric/SQLMetricsSuite.scala | 28 +
 .../sql/execution/metric/SQLMetricsTestUtils.scala | 21 ++-
 .../hive/execution/InsertIntoHiveDirCommand.scala  |  7 +++
 .../spark/sql/hive/execution/SQLMetricsSuite.scala | 42 -
 9 files changed, 159 insertions(+), 94 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
index 338ce8cac42..58c3fca4ad7 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
@@ -21,14 +21,13 @@ import java.net.URI
 
 import org.apache.hadoop.conf.Configuration
 
-import org.apache.spark.SparkContext
 import org.apache.spark.sql.{Row, SaveMode, SparkSession}
 import org.apache.spark.sql.catalyst.expressions.Attribute
 import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryCommand}
 import org.apache.spark.sql.errors.QueryCompilationErrors
-import org.apache.spark.sql.execution.{SparkPlan, SQLExecution}
+import org.apache.spark.sql.execution.SparkPlan
 import org.apache.spark.sql.execution.datasources.BasicWriteJobStatsTracker
-import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}
+import org.apache.spark.sql.execution.metric.SQLMetric
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.util.SerializableConfiguration
 
@@ -52,11 +51,19 @@ trait DataWritingCommand extends UnaryCommand {
   def outputColumns: Seq[Attribute] =
 DataWritingCommand.logicalPlanOutputWithNames(query, outputColumnNames)
 
-  lazy val metrics: Map[String, SQLMetric] = BasicWriteJobStatsTracker.metrics
+  lazy val metrics: Map[String, SQLMetric] = {
+// If planned write is enable, we have pulled out write files metrics from 
`V1WriteCommand`
+// to `WriteFiles`. `DataWritingCommand` should only holds the task commit 
metric and driver
+// commit metric.
+if (conf.getConf(SQLConf.PLANNED_WRITE_ENABLED)) {
+  BasicWriteJobStatsTracker.writeCommitMetrics
+} else {
+  BasicWriteJobStatsTracker.metrics
+}
+  }
 
   def basicWriteJobStatsTracker(hadoopConf: Configuration): 
BasicWriteJobStatsTracker = {
-val serializableHadoopConf = new SerializableConfiguration(hadoopConf)
-new BasicWriteJobStatsTracker(serializableHadoopConf, metrics)
+DataWritingCommand.basicWriteJobStatsTracker(metrics, hadoopConf)
   }
 
   def run(sparkSession: SparkSession, child: SparkPlan): Seq[Row]
@@ -79,27 +86,6 @@ object DataWritingCommand {
 }
   }
 
-  /**
-   * When execute CTAS operators, Spark will use 
[[InsertIntoHadoopFsRelationCommand]]
-   * or [[InsertIntoHiveTable]] command to write data, they both inherit 
metrics from
-   * [[DataWritingCommand]], but after running 
[[InsertIntoHadoopFsRelationCommand]]
-   * or [[InsertIntoHiveTable]], we only update metrics in these two command 
through
-   * [[BasicWriteJobSt

[spark] branch master updated (b36d1484c1a -> 5097c669ffa)

2023-02-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from b36d1484c1a [SPARK-42495][CONNECT] Scala Client add Misc, String, and 
Date/Time functions
 add 5097c669ffa [SPARK-42002][CONNECT][FOLLOW-UP] Add Required/Optional 
notions to writer v2 proto

No new revisions were added by this update.

Summary of changes:
 .../connect/common/src/main/protobuf/spark/connect/commands.proto| 5 +++--
 python/pyspark/sql/connect/proto/commands_pb2.pyi| 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)





[spark] branch branch-3.4 updated: [SPARK-42002][CONNECT][FOLLOW-UP] Add Required/Optional notions to writer v2 proto

2023-02-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 9a7881a0192 [SPARK-42002][CONNECT][FOLLOW-UP] Add Required/Optional 
notions to writer v2 proto
9a7881a0192 is described below

commit 9a7881a019225564439adfdcf556b0fab9928709
Author: Rui Wang 
AuthorDate: Wed Feb 22 08:39:51 2023 +0900

[SPARK-42002][CONNECT][FOLLOW-UP] Add Required/Optional notions to writer 
v2 proto

### What changes were proposed in this pull request?

Following the existing proto style guide, we should always add the `Required/Optional`
notions to the proto documentation.

### Why are the changes needed?

Improve documentation.

### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

N/A

Closes #40106 from amaliujia/rw-fix-proto.

Authored-by: Rui Wang 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 5097c669ffae23997db00b8f2eec89abb4f33cfc)
Signed-off-by: Hyukjin Kwon 
---
 .../connect/common/src/main/protobuf/spark/connect/commands.proto| 5 +++--
 python/pyspark/sql/connect/proto/commands_pb2.pyi| 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git 
a/connector/connect/common/src/main/protobuf/spark/connect/commands.proto 
b/connector/connect/common/src/main/protobuf/spark/connect/commands.proto
index 88d7e81beec..7567b0e3d7c 100644
--- a/connector/connect/common/src/main/protobuf/spark/connect/commands.proto
+++ b/connector/connect/common/src/main/protobuf/spark/connect/commands.proto
@@ -123,10 +123,10 @@ message WriteOperationV2 {
   // (Required) The output of the `input` relation will be persisted according 
to the options.
   Relation input = 1;
 
-  // The destination of the write operation must be either a path or a table.
+  // (Required) The destination of the write operation must be either a path 
or a table.
   string table_name = 2;
 
-  // A provider for the underlying output data source. Spark's default catalog 
supports
+  // (Optional) A provider for the underlying output data source. Spark's 
default catalog supports
   // "parquet", "json", etc.
   string provider = 3;
 
@@ -140,6 +140,7 @@ message WriteOperationV2 {
   // (Optional) A list of table properties.
   map table_properties = 6;
 
+  // (Required) Write mode.
   Mode mode = 7;
 
   enum Mode {
diff --git a/python/pyspark/sql/connect/proto/commands_pb2.pyi 
b/python/pyspark/sql/connect/proto/commands_pb2.pyi
index 46d1921efc2..c102624ca44 100644
--- a/python/pyspark/sql/connect/proto/commands_pb2.pyi
+++ b/python/pyspark/sql/connect/proto/commands_pb2.pyi
@@ -474,9 +474,9 @@ class WriteOperationV2(google.protobuf.message.Message):
 def input(self) -> pyspark.sql.connect.proto.relations_pb2.Relation:
 """(Required) The output of the `input` relation will be persisted 
according to the options."""
 table_name: builtins.str
-"""The destination of the write operation must be either a path or a 
table."""
+"""(Required) The destination of the write operation must be either a path 
or a table."""
 provider: builtins.str
-"""A provider for the underlying output data source. Spark's default 
catalog supports
+"""(Optional) A provider for the underlying output data source. Spark's 
default catalog supports
 "parquet", "json", etc.
 """
 @property
@@ -497,6 +497,7 @@ class WriteOperationV2(google.protobuf.message.Message):
 ) -> google.protobuf.internal.containers.ScalarMap[builtins.str, 
builtins.str]:
 """(Optional) A list of table properties."""
 mode: global___WriteOperationV2.Mode.ValueType
+"""(Required) Write mode."""
 @property
 def overwrite_condition(self) -> 
pyspark.sql.connect.proto.expressions_pb2.Expression:
 """(Optional) A condition for overwrite saving mode"""





[spark] branch master updated: [SPARK-42514][CONNECT] Scala Client add partition transforms functions

2023-02-21 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4f3afdcd756 [SPARK-42514][CONNECT] Scala Client add partition 
transforms functions
4f3afdcd756 is described below

commit 4f3afdcd7561db38f3c0427d31db4f27fa94a83c
Author: yangjie01 
AuthorDate: Tue Feb 21 20:47:42 2023 -0400

[SPARK-42514][CONNECT] Scala Client add partition transforms functions

### What changes were proposed in this pull request?
This PR aims to add the partition transform functions to the Scala Spark Connect client.

### Why are the changes needed?
Provide the same APIs in the Scala Spark Connect client as in the original Dataset API.

### Does this PR introduce _any_ user-facing change?
Yes, it adds new functions to the Spark Connect Scala client.

### How was this patch tested?

- Add new test
- Manually checked `connect-client-jvm` and `connect` with Scala 2.13
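
As a usage illustration (the table and column names are hypothetical, and `df` stands for an existing DataFrame), these transforms are intended for the `DataFrameWriterV2.partitionedBy` clause:

```
import org.apache.spark.sql.functions.{bucket, col, years}

// Hypothetical table and columns, shown only to illustrate what the new
// client-side functions are used for.
df.writeTo("catalog.db.events")
  .partitionedBy(years(col("event_time")), bucket(16, col("user_id")))
  .createOrReplace()
```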

Closes #40105 from LuciferYang/partition-transforms-functions.

Authored-by: yangjie01 
Signed-off-by: Herman van Hovell 
---
 .../scala/org/apache/spark/sql/functions.scala | 58 ++
 .../org/apache/spark/sql/FunctionTestSuite.scala   |  1 +
 .../apache/spark/sql/PlanGenerationTestSuite.scala | 20 
 .../explain-results/function_bucket.explain|  2 +
 .../explain-results/function_days.explain  |  2 +
 .../explain-results/function_hours.explain |  2 +
 .../explain-results/function_months.explain|  2 +
 .../explain-results/function_years.explain |  2 +
 .../query-tests/queries/function_bucket.json   | 23 +
 .../query-tests/queries/function_bucket.proto.bin  |  5 ++
 .../query-tests/queries/function_days.json | 19 +++
 .../query-tests/queries/function_days.proto.bin|  4 ++
 .../query-tests/queries/function_hours.json| 19 +++
 .../query-tests/queries/function_hours.proto.bin   |  4 ++
 .../query-tests/queries/function_months.json   | 19 +++
 .../query-tests/queries/function_months.proto.bin  |  4 ++
 .../query-tests/queries/function_years.json| 19 +++
 .../query-tests/queries/function_years.proto.bin   |  4 ++
 18 files changed, 209 insertions(+)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
index 6dffa8d3ea1..4996b5033e3 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
@@ -3591,6 +3591,64 @@ object functions {
*/
   def timestamp_seconds(e: Column): Column = Column.fn("timestamp_seconds", e)
 
+  
//
+  // Partition Transforms functions
+  
//
+
+  /**
+   * A transform for timestamps and dates to partition data into years.
+   *
+   * @group partition_transforms
+   * @since 3.4.0
+   */
+  def years(e: Column): Column =
+Column.fn("years", e)
+
+  /**
+   * A transform for timestamps and dates to partition data into months.
+   *
+   * @group partition_transforms
+   * @since 3.4.0
+   */
+  def months(e: Column): Column =
+Column.fn("months", e)
+
+  /**
+   * A transform for timestamps and dates to partition data into days.
+   *
+   * @group partition_transforms
+   * @since 3.4.0
+   */
+  def days(e: Column): Column =
+Column.fn("days", e)
+
+  /**
+   * A transform for timestamps to partition data into hours.
+   *
+   * @group partition_transforms
+   * @since 3.4.0
+   */
+  def hours(e: Column): Column =
+Column.fn("hours", e)
+
+  /**
+   * A transform for any type that partitions by a hash of the input column.
+   *
+   * @group partition_transforms
+   * @since 3.4.0
+   */
+  def bucket(numBuckets: Column, e: Column): Column =
+Column.fn("bucket", numBuckets, e)
+
+  /**
+   * A transform for any type that partitions by a hash of the input column.
+   *
+   * @group partition_transforms
+   * @since 3.4.0
+   */
+  def bucket(numBuckets: Int, e: Column): Column =
+Column.fn("bucket", lit(numBuckets), e)
+
   
//
   // Scala UDF functions
   
//
diff --git 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/FunctionTestSuite.scala
 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/FunctionTestSuite.scala
index 1e4960ef9b2..d600ac432a2 100644
--- 
a/connector/connect/client/jvm

[spark] branch branch-3.4 updated: [SPARK-42514][CONNECT] Scala Client add partition transforms functions

2023-02-21 Thread hvanhovell
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 5c8ddb448b3 [SPARK-42514][CONNECT] Scala Client add partition 
transforms functions
5c8ddb448b3 is described below

commit 5c8ddb448b3a9ec42a80a3f23a1516cc3d177b23
Author: yangjie01 
AuthorDate: Tue Feb 21 20:47:42 2023 -0400

[SPARK-42514][CONNECT] Scala Client add partition transforms functions

### What changes were proposed in this pull request?
This PR aims to add the partition transform functions to the Scala Spark Connect client.

### Why are the changes needed?
Provide the same APIs in the Scala Spark Connect client as in the original Dataset API.

### Does this PR introduce _any_ user-facing change?
Yes, it adds new functions to the Spark Connect Scala client.

### How was this patch tested?

- Add new test
- Manually checked `connect-client-jvm` and `connect` with Scala 2.13

Closes #40105 from LuciferYang/partition-transforms-functions.

Authored-by: yangjie01 
Signed-off-by: Herman van Hovell 
(cherry picked from commit 4f3afdcd7561db38f3c0427d31db4f27fa94a83c)
Signed-off-by: Herman van Hovell 
---
 .../scala/org/apache/spark/sql/functions.scala | 58 ++
 .../org/apache/spark/sql/FunctionTestSuite.scala   |  1 +
 .../apache/spark/sql/PlanGenerationTestSuite.scala | 20 
 .../explain-results/function_bucket.explain|  2 +
 .../explain-results/function_days.explain  |  2 +
 .../explain-results/function_hours.explain |  2 +
 .../explain-results/function_months.explain|  2 +
 .../explain-results/function_years.explain |  2 +
 .../query-tests/queries/function_bucket.json   | 23 +
 .../query-tests/queries/function_bucket.proto.bin  |  5 ++
 .../query-tests/queries/function_days.json | 19 +++
 .../query-tests/queries/function_days.proto.bin|  4 ++
 .../query-tests/queries/function_hours.json| 19 +++
 .../query-tests/queries/function_hours.proto.bin   |  4 ++
 .../query-tests/queries/function_months.json   | 19 +++
 .../query-tests/queries/function_months.proto.bin  |  4 ++
 .../query-tests/queries/function_years.json| 19 +++
 .../query-tests/queries/function_years.proto.bin   |  4 ++
 18 files changed, 209 insertions(+)
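
As a quick orientation for readers of this digest, the sketch below shows how these transform functions are typically consumed: they are passed to `DataFrameWriterV2.partitionedBy` when creating a v2 table. The catalog/table name and the sample data are illustrative assumptions, not part of this commit.

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{bucket, col, years}

object PartitionTransformsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("partition-transforms").getOrCreate()
    import spark.implicits._

    // Hypothetical events dataset; any DataFrame with a timestamp and an id works.
    val events = Seq(
      (1L, java.sql.Timestamp.valueOf("2023-02-21 10:00:00")),
      (2L, java.sql.Timestamp.valueOf("2023-02-22 11:30:00"))
    ).toDF("id", "ts")

    // Partition the v2 table by year of `ts` and by a 16-bucket hash of `id`.
    // Requires a catalog that supports transform partitioning; names here are made up.
    events.writeTo("demo_catalog.db.events")
      .partitionedBy(years(col("ts")), bucket(16, col("id")))
      .createOrReplace()

    spark.stop()
  }
}
```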

diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
index 6dffa8d3ea1..4996b5033e3 100644
--- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
@@ -3591,6 +3591,64 @@ object functions {
*/
   def timestamp_seconds(e: Column): Column = Column.fn("timestamp_seconds", e)
 
+  
//
+  // Partition Transforms functions
+  
//
+
+  /**
+   * A transform for timestamps and dates to partition data into years.
+   *
+   * @group partition_transforms
+   * @since 3.4.0
+   */
+  def years(e: Column): Column =
+Column.fn("years", e)
+
+  /**
+   * A transform for timestamps and dates to partition data into months.
+   *
+   * @group partition_transforms
+   * @since 3.4.0
+   */
+  def months(e: Column): Column =
+Column.fn("months", e)
+
+  /**
+   * A transform for timestamps and dates to partition data into days.
+   *
+   * @group partition_transforms
+   * @since 3.4.0
+   */
+  def days(e: Column): Column =
+Column.fn("days", e)
+
+  /**
+   * A transform for timestamps to partition data into hours.
+   *
+   * @group partition_transforms
+   * @since 3.4.0
+   */
+  def hours(e: Column): Column =
+Column.fn("hours", e)
+
+  /**
+   * A transform for any type that partitions by a hash of the input column.
+   *
+   * @group partition_transforms
+   * @since 3.4.0
+   */
+  def bucket(numBuckets: Column, e: Column): Column =
+Column.fn("bucket", numBuckets, e)
+
+  /**
+   * A transform for any type that partitions by a hash of the input column.
+   *
+   * @group partition_transforms
+   * @since 3.4.0
+   */
+  def bucket(numBuckets: Int, e: Column): Column =
+Column.fn("bucket", lit(numBuckets), e)
+
   
//
   // Scala UDF functions
   
//
diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/FunctionTestSuite.scala b/connector/connect/client/jvm/src/test/scala

[spark] branch master updated: [SPARK-42406][SQL] Fix check for missing required fields of to_protobuf

2023-02-21 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new fb5647732fa [SPARK-42406][SQL] Fix check for missing required fields of to_protobuf
fb5647732fa is described below

commit fb5647732fa2f49838f803f67ea11b20fc14282b
Author: Raghu Angadi 
AuthorDate: Tue Feb 21 17:34:16 2023 -0800

[SPARK-42406][SQL] Fix check for missing required fields of to_protobuf

### What changes were proposed in this pull request?

The Protobuf serializer (used in `to_protobuf()`) should error if non-nullable fields (i.e. protobuf `required` fields) are missing from the schema of the catalyst record being converted to a protobuf.

But the `isNullable()` method used for this check returns the opposite (see PR comment in the diff). As a result, the serializer incorrectly requires fields that are optional. This PR fixes this check (see PR comment in the diff).

This also requires a corresponding fix for a couple of unit tests. In order to use a Protobuf message with a `required` field, a Protobuf version 2 file, `proto2_messages.proto`, is added.
Two tests are updated to verify that missing required fields result in an error.

### Why are the changes needed?

This is needed to fix a bug where we were incorrectly enforcing a schema check on optional fields rather than on required fields.

### Does this PR introduce _any_ user-facing change?

It fixes a bug, and gives more flexibility for user queries.

### How was this patch tested?
 - Updated unit tests

Closes #40080 from rangadi/fix-required-field-check.

Authored-by: Raghu Angadi 
Signed-off-by: Gengliang Wang 
---
 .../spark/sql/protobuf/utils/ProtobufUtils.scala   |  7 +--
 .../test/resources/protobuf/proto2_messages.desc   |  8 +++
 .../protobuf/proto2_messages.proto}| 28 +--
 .../src/test/resources/protobuf/serde_suite.proto  | 11 
 .../spark/sql/protobuf/ProtobufSerdeSuite.scala| 58 --
 .../spark/sql/protobuf/ProtobufTestBase.scala  |  4 ++
 6 files changed, 69 insertions(+), 47 deletions(-)
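
To make the fixed behavior concrete, here is a rough, hedged sketch of how a missing `required` field would surface through `to_protobuf`. The descriptor path and message name mirror the test resources added in this PR, but treating `bar` as the required field, the exact schema, and the calling pattern are assumptions for illustration only.

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct
import org.apache.spark.sql.protobuf.functions.to_protobuf

object RequiredFieldCheckExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("to-protobuf-required").getOrCreate()
    import spark.implicits._

    // Assumed location of the descriptor set compiled from proto2_messages.proto.
    val descFile = "connector/protobuf/src/test/resources/protobuf/proto2_messages.desc"

    // Catalyst record that only carries `foo`; `bar` (assumed to be the proto2
    // `required` field) is deliberately left out of the schema.
    val df = Seq("some value").toDF("foo").select(struct($"foo").as("event"))

    // With the corrected check, serializing should now fail because a required
    // proto field is missing from the catalyst schema; optional fields may be omitted.
    val serialized = df.select(
      to_protobuf($"event", "FoobarWithRequiredFieldBar", descFile).as("proto_bytes"))

    serialized.collect() // expected to raise an error about the missing required field
    spark.stop()
  }
}
```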

diff --git a/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala b/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala
index 49313a3ce91..bf207d6068f 100644
--- a/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala
+++ b/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala
@@ -100,7 +100,7 @@ private[sql] object ProtobufUtils extends Logging {
  */
 def validateNoExtraRequiredProtoFields(): Unit = {
   val extraFields = protoFieldArray.toSet -- matchedFields.map(_.fieldDescriptor)
-  extraFields.filterNot(isNullable).foreach { extraField =>
+  extraFields.filter(_.isRequired).foreach { extraField =>
 throw QueryCompilationErrors.cannotFindProtobufFieldInCatalystError(
   toFieldStr(protoPath :+ extraField.getName()))
   }
@@ -283,9 +283,4 @@ private[sql] object ProtobufUtils extends Logging {
 case Seq() => "top-level record"
 case n => s"field '${n.mkString(".")}'"
   }
-
-  /** Return true if `fieldDescriptor` is optional. */
-  private[protobuf] def isNullable(fieldDescriptor: FieldDescriptor): Boolean =
-!fieldDescriptor.isOptional
-
 }
diff --git 
a/connector/protobuf/src/test/resources/protobuf/proto2_messages.desc 
b/connector/protobuf/src/test/resources/protobuf/proto2_messages.desc
new file mode 100644
index 000..a9e4099a7f2
--- /dev/null
+++ b/connector/protobuf/src/test/resources/protobuf/proto2_messages.desc
@@ -0,0 +1,8 @@
+
+�
+proto2_messages.proto$org.apache.spark.sql.protobuf.protos"@
+FoobarWithRequiredFieldBar
+foo (  Rfoo
+bar (Rbar"�
+ NestedFoobarWithRequiredFieldBare
+nested_foobar (
2...@.org.apache.spark.sql.protobuf.protos.FoobarWithRequiredFieldBarR
nestedFoobarBBProto2Messages
\ No newline at end of file
diff --git 
a/connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufTestBase.scala
 b/connector/protobuf/src/test/resources/protobuf/proto2_messages.proto
similarity index 57%
copy from 
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufTestBase.scala
copy to connector/protobuf/src/test/resources/protobuf/proto2_messages.proto
index 831b4a26c06..a5d09df8514 100644
--- 
a/connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufTestBase.scala
+++ b/connector/protobuf/src/test/resources/protobuf/proto2_messages.proto
@@ -15,23 +15,19 @@
  * limitations under the License.
  */
 
-package org.apache.spark.sql.protobuf
+syntax = "proto2";
 
-import org.apache.spark.sql.test.SQLTestUtils
+package org.apache.spark.sql.protobuf.protos;
+op

[spark] branch branch-3.4 updated: [SPARK-42406][SQL] Fix check for missing required fields of to_protobuf

2023-02-21 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 655992841e6 [SPARK-42406][SQL] Fix check for missing required fields of to_protobuf
655992841e6 is described below

commit 655992841e67f91e30bf59bd6cc48c9d4698a0c7
Author: Raghu Angadi 
AuthorDate: Tue Feb 21 17:34:16 2023 -0800

[SPARK-42406][SQL] Fix check for missing required fields of to_protobuf

### What changes were proposed in this pull request?

The Protobuf serializer (used in `to_protobuf()`) should error if non-nullable fields (i.e. protobuf `required` fields) are missing from the schema of the catalyst record being converted to a protobuf.

But the `isNullable()` method used for this check returns the opposite (see PR comment in the diff). As a result, the serializer incorrectly requires fields that are optional. This PR fixes this check (see PR comment in the diff).

This also requires a corresponding fix for a couple of unit tests. In order to use a Protobuf message with a `required` field, a Protobuf version 2 file, `proto2_messages.proto`, is added.
Two tests are updated to verify that missing required fields result in an error.

### Why are the changes needed?

This is needed to fix a bug where we were incorrectly enforcing a schema check on optional fields rather than on required fields.

### Does this PR introduce _any_ user-facing change?

It fixes a bug, and gives more flexibility for user queries.

### How was this patch tested?
 - Updated unit tests

Closes #40080 from rangadi/fix-required-field-check.

Authored-by: Raghu Angadi 
Signed-off-by: Gengliang Wang 
(cherry picked from commit fb5647732fa2f49838f803f67ea11b20fc14282b)
Signed-off-by: Gengliang Wang 
---
 .../spark/sql/protobuf/utils/ProtobufUtils.scala   |  7 +--
 .../test/resources/protobuf/proto2_messages.desc   |  8 +++
 .../protobuf/proto2_messages.proto}| 28 +--
 .../src/test/resources/protobuf/serde_suite.proto  | 11 
 .../spark/sql/protobuf/ProtobufSerdeSuite.scala| 58 --
 .../spark/sql/protobuf/ProtobufTestBase.scala  |  4 ++
 6 files changed, 69 insertions(+), 47 deletions(-)

diff --git 
a/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala
 
b/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala
index 49313a3ce91..bf207d6068f 100644
--- 
a/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala
+++ 
b/connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala
@@ -100,7 +100,7 @@ private[sql] object ProtobufUtils extends Logging {
  */
 def validateNoExtraRequiredProtoFields(): Unit = {
   val extraFields = protoFieldArray.toSet -- 
matchedFields.map(_.fieldDescriptor)
-  extraFields.filterNot(isNullable).foreach { extraField =>
+  extraFields.filter(_.isRequired).foreach { extraField =>
 throw QueryCompilationErrors.cannotFindProtobufFieldInCatalystError(
   toFieldStr(protoPath :+ extraField.getName()))
   }
@@ -283,9 +283,4 @@ private[sql] object ProtobufUtils extends Logging {
 case Seq() => "top-level record"
 case n => s"field '${n.mkString(".")}'"
   }
-
-  /** Return true if `fieldDescriptor` is optional. */
-  private[protobuf] def isNullable(fieldDescriptor: FieldDescriptor): Boolean =
-!fieldDescriptor.isOptional
-
 }
diff --git 
a/connector/protobuf/src/test/resources/protobuf/proto2_messages.desc 
b/connector/protobuf/src/test/resources/protobuf/proto2_messages.desc
new file mode 100644
index 000..a9e4099a7f2
--- /dev/null
+++ b/connector/protobuf/src/test/resources/protobuf/proto2_messages.desc
@@ -0,0 +1,8 @@
+
+�
+proto2_messages.proto$org.apache.spark.sql.protobuf.protos"@
+FoobarWithRequiredFieldBar
+foo (  Rfoo
+bar (Rbar"�
+ NestedFoobarWithRequiredFieldBare
+nested_foobar (
2...@.org.apache.spark.sql.protobuf.protos.FoobarWithRequiredFieldBarR
nestedFoobarBBProto2Messages
\ No newline at end of file
diff --git 
a/connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufTestBase.scala
 b/connector/protobuf/src/test/resources/protobuf/proto2_messages.proto
similarity index 57%
copy from 
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufTestBase.scala
copy to connector/protobuf/src/test/resources/protobuf/proto2_messages.proto
index 831b4a26c06..a5d09df8514 100644
--- 
a/connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufTestBase.scala
+++ b/connector/protobuf/src/test/resources/protobuf/proto2_messages.proto
@@ -15,23 +15,19 @@
  * limitations under the License.
  */
 
-package org.apache.spark.sql.protobuf
+sy

[spark] branch master updated: [SPARK-41775][PYTHON][FOLLOW-UP] Updating docs for readability

2023-02-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ba24dcec42b [SPARK-41775][PYTHON][FOLLOW-UP] Updating docs for readability
ba24dcec42b is described below

commit ba24dcec42bcd45caee5a4866137bc352cba02ef
Author: Rithwik Ediga Lakhamsani 
AuthorDate: Wed Feb 22 11:30:43 2023 +0900

[SPARK-41775][PYTHON][FOLLOW-UP] Updating docs for readability

### What changes were proposed in this pull request?

Added minor UI fixes.

https://user-images.githubusercontent.com/81988348/220488925-eda62d80-d54d-41e9-a9ec-53d02b6fb94d.png
https://user-images.githubusercontent.com/81988348/220488948-929b1c35-4da7-4317-9883-078c2a57896a.png
https://user-images.githubusercontent.com/81988348/220488975-fdc34ae5-a539-4557-993c-d740232b29b5.png

### Why are the changes needed?

To make the documentation easier to read.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

Closes #40110 from rithwik-db/docs-update-2.

Authored-by: Rithwik Ediga Lakhamsani 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/ml/torch/distributor.py | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/ml/torch/distributor.py b/python/pyspark/ml/torch/distributor.py
index b062d743646..a0a9c5aa932 100644
--- a/python/pyspark/ml/torch/distributor.py
+++ b/python/pyspark/ml/torch/distributor.py
@@ -263,9 +263,23 @@ class TorchDistributor(Distributor):
 
 .. versionadded:: 3.4.0
 
+Parameters
+--
+num_processes : int, optional
+An integer that determines how many different concurrent
+tasks are allowed. We expect spark.task.gpus = 1 for GPU-enabled training. Default
+should be 1; we don't want to invoke multiple cores/gpus without explicit mention.
+local_mode : bool, optional
+A boolean that determines whether we are using the driver
+node for training. Default should be false; we don't want to invoke executors without
+explicit mention.
+use_gpu : bool, optional
+A boolean that indicates whether or not we are doing training
+on the GPU. Note that there are differences in how GPU-enabled code looks like and
+how CPU-specific code looks like.
+
 Examples
 
-
 Run PyTorch Training locally on GPU (using a PyTorch native function)
 
 >>> def train(learning_rate):





[spark] branch branch-3.4 updated: [SPARK-41775][PYTHON][FOLLOW-UP] Updating docs for readability

2023-02-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new df800d01679 [SPARK-41775][PYTHON][FOLLOW-UP] Updating docs for readability
df800d01679 is described below

commit df800d016794f51ee18ccb6bfbab4ee5e8cae796
Author: Rithwik Ediga Lakhamsani 
AuthorDate: Wed Feb 22 11:30:43 2023 +0900

[SPARK-41775][PYTHON][FOLLOW-UP] Updating docs for readability

### What changes were proposed in this pull request?

Added minor UI fixes.

https://user-images.githubusercontent.com/81988348/220488925-eda62d80-d54d-41e9-a9ec-53d02b6fb94d.png
https://user-images.githubusercontent.com/81988348/220488948-929b1c35-4da7-4317-9883-078c2a57896a.png
https://user-images.githubusercontent.com/81988348/220488975-fdc34ae5-a539-4557-993c-d740232b29b5.png

### Why are the changes needed?

To make the documentation easier to read.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

Closes #40110 from rithwik-db/docs-update-2.

Authored-by: Rithwik Ediga Lakhamsani 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit ba24dcec42bcd45caee5a4866137bc352cba02ef)
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/ml/torch/distributor.py | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/ml/torch/distributor.py 
b/python/pyspark/ml/torch/distributor.py
index b062d743646..a0a9c5aa932 100644
--- a/python/pyspark/ml/torch/distributor.py
+++ b/python/pyspark/ml/torch/distributor.py
@@ -263,9 +263,23 @@ class TorchDistributor(Distributor):
 
 .. versionadded:: 3.4.0
 
+Parameters
+--
+num_processes : int, optional
+An integer that determines how many different concurrent
+tasks are allowed. We expect spark.task.gpus = 1 for GPU-enabled 
training. Default
+should be 1; we don't want to invoke multiple cores/gpus without 
explicit mention.
+local_mode : bool, optional
+A boolean that determines whether we are using the driver
+node for training. Default should be false; we don't want to invoke 
executors without
+explicit mention.
+use_gpu : bool, optional
+A boolean that indicates whether or not we are doing training
+on the GPU. Note that there are differences in how GPU-enabled code 
looks like and
+how CPU-specific code looks like.
+
 Examples
 
-
 Run PyTorch Training locally on GPU (using a PyTorch native function)
 
 >>> def train(learning_rate):





[spark] branch master updated (ba24dcec42b -> d09742b9557)

2023-02-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from ba24dcec42b [SPARK-41775][PYTHON][FOLLOW-UP] Updating docs for readability
 add d09742b9557 [SPARK-42524][BUILD] Upgrade numpy and pandas in the release Dockerfile

No new revisions were added by this update.

Summary of changes:
 dev/create-release/spark-rm/Dockerfile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)





[spark] branch branch-3.4 updated: [SPARK-42524][BUILD] Upgrade numpy and pandas in the release Dockerfile

2023-02-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new d89481a710f [SPARK-42524][BUILD] Upgrade numpy and pandas in the release Dockerfile
d89481a710f is described below

commit d89481a710fbbbd1b2c846e67d0f3b1a31530601
Author: Xinrong Meng 
AuthorDate: Wed Feb 22 11:38:23 2023 +0900

[SPARK-42524][BUILD] Upgrade numpy and pandas in the release Dockerfile

### What changes were proposed in this pull request?
Upgrade pandas from 1.1.5 to 1.5.3, numpy from 1.19.4 to 1.20.3 in the Dockerfile used for Spark releases.

They are also what we use to cut `v3.4.0-rc1`.

### Why are the changes needed?
Otherwise, errors are raised as shown below when building release docs.
```
ImportError: Warning: Latest version of pandas (1.5.3) is required to generate the documentation; however, your version was 1.1.5

ImportError: this version of pandas is incompatible with numpy < 1.20.3
your numpy version is 1.19.4.
Please upgrade numpy to >= 1.20.3 to use this pandas version
```

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual tests.

Closes #40111 from xinrong-meng/docker_lib.

Authored-by: Xinrong Meng 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit d09742b955782fc9717aaa0a76f067ccdf241010)
Signed-off-by: Hyukjin Kwon 
---
 dev/create-release/spark-rm/Dockerfile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile
index 38c64601882..6995928beae 100644
--- a/dev/create-release/spark-rm/Dockerfile
+++ b/dev/create-release/spark-rm/Dockerfile
@@ -42,8 +42,8 @@ ARG APT_INSTALL="apt-get install --no-install-recommends -y"
 #   We should use the latest Sphinx version once this is fixed.
 # TODO(SPARK-35375): Jinja2 3.0.0+ causes error when building with Sphinx.
 #   See also https://issues.apache.org/jira/browse/SPARK-35375.
-ARG PIP_PKGS="sphinx==3.0.4 mkdocs==1.1.2 numpy==1.19.4 pydata_sphinx_theme==0.4.1 ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0 jinja2==2.11.3 twine==3.4.1 sphinx-plotly-directive==0.1.3 pandas==1.1.5 pyarrow==3.0.0 plotly==5.4.0 markupsafe==2.0.1 docutils<0.17 grpcio==1.48.1 protobuf==4.21.6 grpcio-status==1.48.1 googleapis-common-protos==1.56.4"
-ARG GEM_PKGS="bundler:2.2.9"
+ARG PIP_PKGS="sphinx==3.0.4 mkdocs==1.1.2 numpy==1.20.3 pydata_sphinx_theme==0.4.1 ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0 jinja2==2.11.3 twine==3.4.1 sphinx-plotly-directive==0.1.3 pandas==1.5.3 pyarrow==3.0.0 plotly==5.4.0 markupsafe==2.0.1 docutils<0.17 grpcio==1.48.1 protobuf==4.21.6 grpcio-status==1.48.1 googleapis-common-protos==1.56.4"
+ARG GEM_PKGS="bundler:2.3.8"
 
 # Install extra needed repos and refresh.
 # - CRAN repo





svn commit: r60249 - /dev/spark/KEYS

2023-02-21 Thread xinrong
Author: xinrong
Date: Wed Feb 22 03:51:47 2023
New Revision: 60249

Log:
Update KEYS

Modified:
dev/spark/KEYS

Modified: dev/spark/KEYS
==
--- dev/spark/KEYS (original)
+++ dev/spark/KEYS Wed Feb 22 03:51:47 2023
@@ -1848,4 +1848,61 @@ P+3d/bY7eHLaFnkIuQR2dzaJti/nf2b/7VQHLm6H
 Y2wH1LgDJJsoBLPFNxhgTLjMlErwsZlacmXyogrmOS+ZvgQz/LZ1mIryTAkd1Gym
 JznYPjY83fSKkeCh
 =3Ggj
--END PGP PUBLIC KEY BLOCK-
\ No newline at end of file
+-END PGP PUBLIC KEY BLOCK-
+
+pub   rsa4096 2022-08-16 [SC]
+  0C33D35E1A9296B32CF31005ACD84F20930B47E8
+uid   [ultimate] Xinrong Meng (CODE SIGNING KEY) 
+sub   rsa4096 2022-08-16 [E]
+-BEGIN PGP PUBLIC KEY BLOCK-
+
+mQINBGL64s8BEADCeefEm9XB63o/xIGpnwurEL24h5LsZdA7k7juZ5C1Fu6m5amT
+0A1n49YncYv6jDQD8xh+eiZ11+mYEAzkmGD+aVEMQA0/Zrp0rMe22Ymq5fQHfRCO
+88sQl4PvmqaElcAswFz7RP+55GWSIfEbZIJhZQdukaVCZuC+Xpb68TAj2OSXZ+Mt
+m8RdJXIJpmD0P6R7bvY4LPZL8tY7wtnxUj1I9wRnXc0AnbPfI6gGyF+b0x54b4Ey
+2+sZ6tNH501I9hgdEOWj+nqQFZTTzZQPI1r3nPIA28T9VDOKi5dmoI6iXFjCWZ2N
+dmsw8GN+45V1udOgylE2Mop7URzOQYlqaFnJvXzO/nZhAqbetrMmZ6jmlbqLEq/D
+C8cgYFuMwER3oAC0OwpSz2HLCya95xHDdPqX+Iag0h0bbFBxSNpgzQiUk1mvSYXa
++7HGQ3rIfy7+87hA1BIHaN0L1oOw37UWk2IGDvS29JlGJ3SJDX5Ir5uBvW6k9So6
+xG9vT+l+R878rLcjJLJT4Me4pk4z8O4Uo+IY0uptiTYnvYRXBOw9wk9KpSckbr+s
+I2keVwa+0fui4c1ESwNHR8HviALho9skvwaCAP3TUZ43SHeDU840M9LwDWc6VNc1
+x30YbgYeKtyU1deh7pcBhykUJPrZ457OllG8SbnhAncwmf8TaJjUkQARAQAB
+tDRYaW5yb25nIE1lbmcgKENPREUgU0lHTklORyBLRVkpIDx4aW5yb25nQGFwYWNo
+ZS5vcmc+iQJOBBMBCAA4FiEEDDPTXhqSlrMs8xAFrNhPIJMLR+gFAmL64s8CGwMF
+CwkIBwIGFQoJCAsCBBYCAwECHgECF4AACgkQrNhPIJMLR+gNSRAAkhNM7vAFRwaX
+MachhS97+L2ZklerzeZuCP0zeYZ9gZloGUx+eM3MWOglUcKH0f6DjPitMMCr1Qbo
+OsENANTS5ZOp4r4rhbbNhYbA8Wbx8H+ZABmCuUNJMjmeVh3qL1WmHclApegqxiSH
+uc9xXB1RZOJH2pS2v7UXW2c/Y745oT/YxWX9hBeJUPWmg6M6jn1/osnqmUngXSvB
+HNzxzHT1gJJNEcRU3r5bKAJlLWBZzLO4pIgtFqIfpS79ieG54OwedrW3oqOheFKa
+LTYInFAdscmZwIo8jHakqf+UMu3H5dzABBRATDvcci7nBPi+J8F7qLvklzb1zd0L
+Ir/QnAy3zFUYUbwwRXDy0Gi0HsU5xP9QYT3pmtW3I+Xlwpso417XoE+1DYtizjbx
+FuJaSNs7K7VPaELezdvtFL0SGYNkpxz7EiVcW6TxmLsLBoNAeaKhHYtwhblQKznv
+6mEbjmiAo3oB68ghI+3xW2mZ+T+t3sgl5aNWiZ6RQx5v4liYc4vShmewcKGWvN7T
+RC5Ert0GxMJGsx7fIRAgWDOI1aMj5bx9H23d3RKxJWrRCXhSlg1lyzVj+GCrhYAy
+16/JH5ph0m+FCVwAP0GhHsZCQV1AT+YL7lgEZvmGq0ucDShc69lLh7qsxMg7zckk
+l66F14Imuz0EasVCdI3IwkuTFch9Quu5Ag0EYvrizwEQANpINEPd+Vio1D0opPBO
+Sa4keWk5IvvGETt6jUBemQten1gOB89Zba3E8ZgJpPobaThFrpsQJ9wNM8+KBHGm
+U+DTP+JC+65J9Eq6KA8qcH2jn3xKBWipWUACKUCvpFSNq63f3+RVbAyTYdykRhEU
+Ih+7eFtl3X0Q6v92TMZL26euXqt73UoOsoulKEmfSyhiQBQX7WNCtq3JR/mZ4+OA
+/N3J7qw+emvKG3t8h3/5CtpZWEMaJwaGyyENScsw5KEOYjl9o11mMeYRYfZ0n0h7
+DA8BmBl/k71+UvdopdzuwjRib02uZfdCC15tltLpoVeL/pa0GRmTRuCJARwjDD95
+xbrrYYqw2wD6l3Mtv/EooIBdzGpP15VnD4DFC5W9vxnxuEfSnX0DxCObsd6MCzZw
+GOiF4HudfFzB2SiE/OXNaAxdpSD9C8n0Y3ac74dk6uamzCkSnCjzzAOytFZY18fi
+N5ihDA9+2TeEOL0RVrQw0Mdc4X80A1dlCJ6Gh1Py4WOtDxB5UmSY2olvV6p5pRRD
+1HEnM9bivPdEErYpUI72K4L5feXFxt/obQ0rZMmmnYMldAcPcqsTMVgPWZICK/z0
+X/SrOR0YEa28XA+V69o4TwPR77oUK6t3SiFzAi3VmQtAP6NkqL+FNMa0V1ZiEPse
+lZhKVziNh5Jb8bnkQA6+9Md3ABEBAAGJAjYEGAEIACAWIQQMM9NeGpKWsyzzEAWs
+2E8gkwtH6AUCYvrizwIbDAAKCRCs2E8gkwtH6OYIEACtPjMCg+x+vxVU8KhqwxpA
+UyDOuNbzB2TSMmETgGqHDqk/F4eSlMvZTukGlo5yPDYXhd7vUT45mrlRq8ljzBLr
+NkX2mkGgocdjAjSF2rgugMb+APpKNFxZtUPKosyyOPS9z4+4tjxfCpj2u2hZy8PD
+C3/6dz9Yga0kgWu2GWFZFFZiGxPyUCkjnUBWz53dT/1JwWt3W81bihVfhLX9CVgO
+KPEoZ96BaEucAHY0r/yq0zAq/+DCTYRrDLkeuZaDTB1RThWOrW+GCoPcIxbLi4/j
+/YkIGQCaYvpVsuacklwqhSxhucqctRklGHLrjLdxrqcS1pIfraCsRJazUoO1Uu7n
+DQ/aF9fczzX9nKv7t341lGn+Ujv5EEuaA/y38XSffsHxCmpEcvjGAH0NZsjHbYd/
+abeFTAnMV1r2r9/UcyuosEsaRyjW4Ljd51wWyGVv4Ky40HJYRmtefJX+1QDAntPJ
+lVPHQCa2B/YIDrFeokXFxDqONkA+fFm+lDb83lhAAhjxCwfbytZqJFTvYh7TQTLx
+3+ZA1BoFhxIHnR2mrFK+yqny9w6YAeZ8YMG5edH1EKoNVfic7OwwId1eQL6FCKCv
+F3sNZiCC3i7P6THg9hZSF1eNbfiuZuMxUbw3OZgYhyXLB023vEZ1mUQCAcbfsQxU
+sw6Rs2zVSxvPcg5CN8APig==
+=fujW
+-END PGP PUBLIC KEY BLOCK-






[spark] branch master updated: [SPARK-36124][SQL] Support subqueries with correlation through INTERSECT/EXCEPT

2023-02-21 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ffc8ac935e2 [SPARK-36124][SQL] Support subqueries with correlation through INTERSECT/EXCEPT
ffc8ac935e2 is described below

commit ffc8ac935e24c7d15a700034bd556c2f4b8271ee
Author: Jack Chen 
AuthorDate: Wed Feb 22 13:22:59 2023 +0800

[SPARK-36124][SQL] Support subqueries with correlation through INTERSECT/EXCEPT

### What changes were proposed in this pull request?

Adds support for subquery decorrelation with INTERSECT and EXCEPT operators on the correlation paths. For example:
```
SELECT t1a, (
  SELECT avg(b) FROM (
SELECT t2b as b FROM t2 WHERE t2a = t1a
INTERSECT
SELECT t3b as b FROM t3 WHERE t3a = t1a
))
FROM t1
```

This uses the same logic as for UNION decorrelation added in https://github.com/apache/spark/pull/39375. The only real change is logic added to handle INTERSECT/EXCEPT DISTINCT, which are rewritten to semi/anti join and require extra logic in rewriteDomainJoins.

[This doc](https://docs.google.com/document/d/11b9ClCF2jYGU7vU2suOT7LRswYkg6tZ8_6xJbvxfh2I/edit#) describes how the decorrelation rewrite works for set operations and the code changes for it - see the INTERSECT/EXCEPT section in particular.

In this PR, we always add DomainJoins for correlation through INTERSECT/EXCEPT, and never do direct substitution of the outer refs. That can also be added as an optimization in a follow-up - it only affects performance, not surface area coverage.

### Why are the changes needed?
To improve subquery support in Spark.

### Does this PR introduce _any_ user-facing change?
Before this change, queries like this would return an error like: 
`Decorrelate inner query through Intersect is not supported.`

After this PR, this query can run successfully.

### How was this patch tested?
Unit tests and SQL query tests.

Moved the UNION decorrelation SQL tests to file scalar-subquery-set-op.sql and duplicated them to test each of [UNION/INTERSECT/EXCEPT] [ALL/DISTINCT]

Factors tested included:
- Subquery type:
  - Eligible for DecorrelateInnerQuery: Scalar, lateral join
  - Not supported: IN, EXISTS
- UNION inside and outside subquery
- Correlation in where, project, group by, aggregates, or no correlation
- Project, Aggregate, Window under the Union
- COUNT bug

Closes #39759 from jchen5/subq-intersect.

Authored-by: Jack Chen 
Signed-off-by: Wenchen Fan 
---
 .../sql/catalyst/analysis/CheckAnalysis.scala  |2 +-
 .../catalyst/optimizer/DecorrelateInnerQuery.scala |  143 ++-
 .../optimizer/DecorrelateInnerQuerySuite.scala |  112 ++-
 .../resources/sql-tests/inputs/join-lateral.sql|  150 ++-
 .../exists-subquery/exists-joins-and-set-ops.sql   |   86 +-
 .../subquery/in-subquery/in-set-operations.sql |  109 ++
 .../scalar-subquery/scalar-subquery-select.sql |  105 --
 .../scalar-subquery/scalar-subquery-set-op.sql |  621 
 .../sql-tests/results/join-lateral.sql.out |  249 -
 .../exists-joins-and-set-ops.sql.out   |  216 +++-
 .../subquery/in-subquery/in-set-operations.sql.out |  348 +++
 .../scalar-subquery/scalar-subquery-select.sql.out |  195 
 .../scalar-subquery/scalar-subquery-set-op.sql.out | 1043 
 .../scala/org/apache/spark/sql/SubquerySuite.scala |   78 +-
 14 files changed, 3085 insertions(+), 372 deletions(-)
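
For readers who want to try the new capability, here is a minimal, self-contained sketch of a scalar subquery correlated through INTERSECT, of the shape that previously failed with `Decorrelate inner query through Intersect is not supported`. Table names and rows are made up for illustration.

```
import org.apache.spark.sql.SparkSession

object IntersectCorrelationExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("intersect-decorrelation").getOrCreate()
    import spark.implicits._

    // Toy tables mirroring the t1/t2/t3 shape used in the commit message.
    Seq((1, 10), (2, 20)).toDF("t1a", "t1b").createOrReplaceTempView("t1")
    Seq((1, 100), (1, 200), (2, 300)).toDF("t2a", "t2b").createOrReplaceTempView("t2")
    Seq((1, 100), (2, 300), (2, 400)).toDF("t3a", "t3b").createOrReplaceTempView("t3")

    // Scalar subquery correlated through INTERSECT; with this change it is
    // decorrelated (via DomainJoin) instead of failing analysis.
    val result = spark.sql(
      """SELECT t1a, (
        |  SELECT avg(b) FROM (
        |    SELECT t2b AS b FROM t2 WHERE t2a = t1a
        |    INTERSECT
        |    SELECT t3b AS b FROM t3 WHERE t3a = t1a
        |  )
        |) AS avg_b
        |FROM t1""".stripMargin)

    result.show()
    spark.stop()
  }
}
```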

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
index 77948735dbe..fafdd679aa5 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -1231,7 +1231,7 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB
 case p @ (_: ResolvedHint | _: LeafNode | _: Repartition | _: SubqueryAlias) =>
   p.children.foreach(child => checkPlan(child, aggregated, canContainOuter))
 
-case p @ (_ : Union) =>
+case p @ (_ : Union | _: SetOperation) =>
   // Set operations (e.g. UNION) containing correlated values are only supported
   // with DecorrelateInnerQuery framework.
   val childCanContainOuter = (canContainOuter
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala
index 069279a7a04..01029fe1af0 100644
--- 
a/sql/catalyst/src/main/scal

[spark] branch master updated (ffc8ac935e2 -> 58efc4b469c)

2023-02-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from ffc8ac935e2 [SPARK-36124][SQL] Support subqueries with correlation through INTERSECT/EXCEPT
 add 58efc4b469c [SPARK-41933][FOLLOWUP][CONNECT] Correct an error message

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/session.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)





[spark] branch branch-3.4 updated: [SPARK-41933][FOLLOWUP][CONNECT] Correct an error message

2023-02-21 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 6e6993a7e3c [SPARK-41933][FOLLOWUP][CONNECT] Correct an error message
6e6993a7e3c is described below

commit 6e6993a7e3c9ecdfe89718934b59baed5a2efead
Author: itholic 
AuthorDate: Wed Feb 22 15:35:54 2023 +0900

[SPARK-41933][FOLLOWUP][CONNECT] Correct an error message

### What changes were proposed in this pull request?

This PR follows up https://github.com/apache/spark/pull/39441 to fix a wrong error message.

### Why are the changes needed?

Error message correction.

### Does this PR introduce _any_ user-facing change?

No; it only changes an error message.

### How was this patch tested?

The existing CI should pass

Closes #40112 from itholic/SPARK-41933-followup.

Lead-authored-by: itholic 
Co-authored-by: Haejoon Lee <44108233+itho...@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 58efc4b469c229cd649fe28e5f201824cc3cfc07)
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/session.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/pyspark/sql/session.py b/python/pyspark/sql/session.py
index de2eb4970f7..2ba1361cca0 100644
--- a/python/pyspark/sql/session.py
+++ b/python/pyspark/sql/session.py
@@ -458,7 +458,7 @@ class SparkSession(SparkConversionMixin):
 else:
 raise RuntimeError(
 "Cannot start a remote Spark session because 
there "
-"is a regular Spark Connect already running."
+"is a regular Spark session already running."
 )
 
 session = SparkSession._instantiatedSession





svn commit: r60251 - /dev/spark/KEYS

2023-02-21 Thread xinrong
Author: xinrong
Date: Wed Feb 22 07:01:27 2023
New Revision: 60251

Log:
Update KEYS

Modified:
dev/spark/KEYS

Modified: dev/spark/KEYS
==
--- dev/spark/KEYS (original)
+++ dev/spark/KEYS Wed Feb 22 07:01:27 2023
@@ -1850,59 +1850,59 @@ JznYPjY83fSKkeCh
 =3Ggj
 -END PGP PUBLIC KEY BLOCK-
 
-pub   rsa4096 2022-08-16 [SC]
-  0C33D35E1A9296B32CF31005ACD84F20930B47E8
-uid   [ultimate] Xinrong Meng (CODE SIGNING KEY) 
-sub   rsa4096 2022-08-16 [E]
+pub   rsa4096 2023-02-21 [SC]
+  CC68B3D16FE33A766705160BA7E57908C7A4E1B1
+uid   [ultimate] Xinrong Meng (RELEASE SIGNING KEY) 

+sub   rsa4096 2023-02-21 [E]
 -BEGIN PGP PUBLIC KEY BLOCK-
 
-mQINBGL64s8BEADCeefEm9XB63o/xIGpnwurEL24h5LsZdA7k7juZ5C1Fu6m5amT
-0A1n49YncYv6jDQD8xh+eiZ11+mYEAzkmGD+aVEMQA0/Zrp0rMe22Ymq5fQHfRCO
-88sQl4PvmqaElcAswFz7RP+55GWSIfEbZIJhZQdukaVCZuC+Xpb68TAj2OSXZ+Mt
-m8RdJXIJpmD0P6R7bvY4LPZL8tY7wtnxUj1I9wRnXc0AnbPfI6gGyF+b0x54b4Ey
-2+sZ6tNH501I9hgdEOWj+nqQFZTTzZQPI1r3nPIA28T9VDOKi5dmoI6iXFjCWZ2N
-dmsw8GN+45V1udOgylE2Mop7URzOQYlqaFnJvXzO/nZhAqbetrMmZ6jmlbqLEq/D
-C8cgYFuMwER3oAC0OwpSz2HLCya95xHDdPqX+Iag0h0bbFBxSNpgzQiUk1mvSYXa
-+7HGQ3rIfy7+87hA1BIHaN0L1oOw37UWk2IGDvS29JlGJ3SJDX5Ir5uBvW6k9So6
-xG9vT+l+R878rLcjJLJT4Me4pk4z8O4Uo+IY0uptiTYnvYRXBOw9wk9KpSckbr+s
-I2keVwa+0fui4c1ESwNHR8HviALho9skvwaCAP3TUZ43SHeDU840M9LwDWc6VNc1
-x30YbgYeKtyU1deh7pcBhykUJPrZ457OllG8SbnhAncwmf8TaJjUkQARAQAB
-tDRYaW5yb25nIE1lbmcgKENPREUgU0lHTklORyBLRVkpIDx4aW5yb25nQGFwYWNo
-ZS5vcmc+iQJOBBMBCAA4FiEEDDPTXhqSlrMs8xAFrNhPIJMLR+gFAmL64s8CGwMF
-CwkIBwIGFQoJCAsCBBYCAwECHgECF4AACgkQrNhPIJMLR+gNSRAAkhNM7vAFRwaX
-MachhS97+L2ZklerzeZuCP0zeYZ9gZloGUx+eM3MWOglUcKH0f6DjPitMMCr1Qbo
-OsENANTS5ZOp4r4rhbbNhYbA8Wbx8H+ZABmCuUNJMjmeVh3qL1WmHclApegqxiSH
-uc9xXB1RZOJH2pS2v7UXW2c/Y745oT/YxWX9hBeJUPWmg6M6jn1/osnqmUngXSvB
-HNzxzHT1gJJNEcRU3r5bKAJlLWBZzLO4pIgtFqIfpS79ieG54OwedrW3oqOheFKa
-LTYInFAdscmZwIo8jHakqf+UMu3H5dzABBRATDvcci7nBPi+J8F7qLvklzb1zd0L
-Ir/QnAy3zFUYUbwwRXDy0Gi0HsU5xP9QYT3pmtW3I+Xlwpso417XoE+1DYtizjbx
-FuJaSNs7K7VPaELezdvtFL0SGYNkpxz7EiVcW6TxmLsLBoNAeaKhHYtwhblQKznv
-6mEbjmiAo3oB68ghI+3xW2mZ+T+t3sgl5aNWiZ6RQx5v4liYc4vShmewcKGWvN7T
-RC5Ert0GxMJGsx7fIRAgWDOI1aMj5bx9H23d3RKxJWrRCXhSlg1lyzVj+GCrhYAy
-16/JH5ph0m+FCVwAP0GhHsZCQV1AT+YL7lgEZvmGq0ucDShc69lLh7qsxMg7zckk
-l66F14Imuz0EasVCdI3IwkuTFch9Quu5Ag0EYvrizwEQANpINEPd+Vio1D0opPBO
-Sa4keWk5IvvGETt6jUBemQten1gOB89Zba3E8ZgJpPobaThFrpsQJ9wNM8+KBHGm
-U+DTP+JC+65J9Eq6KA8qcH2jn3xKBWipWUACKUCvpFSNq63f3+RVbAyTYdykRhEU
-Ih+7eFtl3X0Q6v92TMZL26euXqt73UoOsoulKEmfSyhiQBQX7WNCtq3JR/mZ4+OA
-/N3J7qw+emvKG3t8h3/5CtpZWEMaJwaGyyENScsw5KEOYjl9o11mMeYRYfZ0n0h7
-DA8BmBl/k71+UvdopdzuwjRib02uZfdCC15tltLpoVeL/pa0GRmTRuCJARwjDD95
-xbrrYYqw2wD6l3Mtv/EooIBdzGpP15VnD4DFC5W9vxnxuEfSnX0DxCObsd6MCzZw
-GOiF4HudfFzB2SiE/OXNaAxdpSD9C8n0Y3ac74dk6uamzCkSnCjzzAOytFZY18fi
-N5ihDA9+2TeEOL0RVrQw0Mdc4X80A1dlCJ6Gh1Py4WOtDxB5UmSY2olvV6p5pRRD
-1HEnM9bivPdEErYpUI72K4L5feXFxt/obQ0rZMmmnYMldAcPcqsTMVgPWZICK/z0
-X/SrOR0YEa28XA+V69o4TwPR77oUK6t3SiFzAi3VmQtAP6NkqL+FNMa0V1ZiEPse
-lZhKVziNh5Jb8bnkQA6+9Md3ABEBAAGJAjYEGAEIACAWIQQMM9NeGpKWsyzzEAWs
-2E8gkwtH6AUCYvrizwIbDAAKCRCs2E8gkwtH6OYIEACtPjMCg+x+vxVU8KhqwxpA
-UyDOuNbzB2TSMmETgGqHDqk/F4eSlMvZTukGlo5yPDYXhd7vUT45mrlRq8ljzBLr
-NkX2mkGgocdjAjSF2rgugMb+APpKNFxZtUPKosyyOPS9z4+4tjxfCpj2u2hZy8PD
-C3/6dz9Yga0kgWu2GWFZFFZiGxPyUCkjnUBWz53dT/1JwWt3W81bihVfhLX9CVgO
-KPEoZ96BaEucAHY0r/yq0zAq/+DCTYRrDLkeuZaDTB1RThWOrW+GCoPcIxbLi4/j
-/YkIGQCaYvpVsuacklwqhSxhucqctRklGHLrjLdxrqcS1pIfraCsRJazUoO1Uu7n
-DQ/aF9fczzX9nKv7t341lGn+Ujv5EEuaA/y38XSffsHxCmpEcvjGAH0NZsjHbYd/
-abeFTAnMV1r2r9/UcyuosEsaRyjW4Ljd51wWyGVv4Ky40HJYRmtefJX+1QDAntPJ
-lVPHQCa2B/YIDrFeokXFxDqONkA+fFm+lDb83lhAAhjxCwfbytZqJFTvYh7TQTLx
-3+ZA1BoFhxIHnR2mrFK+yqny9w6YAeZ8YMG5edH1EKoNVfic7OwwId1eQL6FCKCv
-F3sNZiCC3i7P6THg9hZSF1eNbfiuZuMxUbw3OZgYhyXLB023vEZ1mUQCAcbfsQxU
-sw6Rs2zVSxvPcg5CN8APig==
-=fujW
+mQINBGP0Hf0BEACyHWHb/DyfpkIC64sJQKR7GGLBicFOxsVNYrxxcZJvdnfjFnHC
+ajib6m6dIQ5g+YgH23U/jIpHhZbXLWrQkyuYW4JbaG8uobK5S7crAqpYjtwRJHRe
+R4f8DO6nWUNxZGHYFU46zvt7GuBjN005u+X2Oxq9xau+CVgkS1r/vbykxDwGOcYM
+/vmgITo+Zk2zs2Krea+ul0aVZRvhGB8ZHHSdz83NTDm0DwlzALFodLWIRvSblqtZ
+SPVKntzmN6OYjVjPMK6HgLlVlH2WqOIexuZnbadioM6+Hg/eihXQVLU7wpBBliFA
+KTUnCNRRxEF8M7zPKEpyQbV2KJqMLdGLpE+ZEfzOKUxbCBmzF1MQ5Pxm4mm8RlvA
+DDoOI/I3IstoizsxI6hV7U3w22R4c++qmFtX/lzgDnCKfISBTQaofiVlvMg7fx+f
+7bA1oJxlMJMpjNO9s3qudMAxtrSzHUnIt2ThsxcsL+wfu/HxvR1+PfX6eCCXaVjN
+/ii0EkWbHBq6Jb1IDzKuU02oX0TWQisDqn+IHq8/Q46PH3H2nF6hfg8zJXMkTusc
+T8AmCoQCeVEPMbnVTWW9sVJC2gQPrCQJHEUbu5OHb9REtJ3GqtRw+mogTrpO5ads
+PO61a94fJQcTDgR59hShrXiXxUK07C/rXqexcVnXEZyfn/5ZnqmgdVNt2wARAQAB
+tDdYaW5yb25nIE1lbmcgKFJFTEVBU0UgU0lHTklORyBLRVkpIDx4aW5yb25nQGFw
+YWNoZS5vcmc+iQJOBBMBCgA4FiEEzGiz0W/jOnZnBRYLp+V5CMek4bEFAmP0Hf0C
+GwMFCwkIBwIGFQoJCAsCBBYCAwECHgECF4AACgkQp+V5CMek4bFlWg//YIN9HNQ2
+yj3gW9lXVTWtSzJvlnwZr5V9JBGevpWMNF3U38Dk0nlQUiSvHdpfQjIyITOYR9Iv
+GxuZCp5szVaRc00pfQWFy684zLvwqrjKekLzCpkqTOGXHO2RxeJH2ZBqcI9OSpR5
+B2J94dlQItM/bKsXhMNOwmVtS6kSW36aN/0Nd9ZQF/TWKIb

[spark] branch master updated: [SPARK-42520][CONNECT] Support basic Window API in Scala client

2023-02-21 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 054522b6762 [SPARK-42520][CONNECT] Support basic Window API in Scala client
054522b6762 is described below

commit 054522b67626aa1515b8f3f164ba7c063c38e5b8
Author: Rui Wang 
AuthorDate: Wed Feb 22 15:19:00 2023 +0800

[SPARK-42520][CONNECT] Support basic Window API in Scala client

### What changes were proposed in this pull request?

Support Window `orderBy`, `partitionBy`, and `rowsBetween`/`rangeBetween`.

### Why are the changes needed?

API coverage

### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

UT

Closes #40107 from amaliujia/rw-window-2.

Authored-by: Rui Wang 
Signed-off-by: Wenchen Fan 
---
 .../main/scala/org/apache/spark/sql/Column.scala   |  33 +++
 .../org/apache/spark/sql/expressions/Window.scala  | 241 +
 .../apache/spark/sql/expressions/WindowSpec.scala  | 240 
 .../apache/spark/sql/PlanGenerationTestSuite.scala |  11 +
 .../query-tests/explain-results/window.explain |   8 +
 .../test/resources/query-tests/queries/window.json | 205 ++
 .../resources/query-tests/queries/window.proto.bin |  43 
 7 files changed, 781 insertions(+)
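
A short usage sketch tying the new pieces together: the `Window`/`WindowSpec` builders combined with `Column.over`, mirroring the examples in the scaladoc below. The data and column names are illustrative, not from this commit.

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{avg, sum}

object WindowApiExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("window-api").getOrCreate()
    import spark.implicits._

    // Illustrative price data, partitioned per name and ordered by id.
    val df = Seq(("a", 1, 10.0), ("a", 2, 20.0), ("b", 1, 5.0)).toDF("name", "id", "price")

    val w = Window.partitionBy("name").orderBy("id")

    df.select(
      $"name",
      $"id",
      // Range frame: from the start of the partition up to id + 2.
      sum($"price").over(w.rangeBetween(Window.unboundedPreceding, 2)).as("running_sum"),
      // Row frame: the current row plus the next 4 rows.
      avg($"price").over(w.rowsBetween(Window.currentRow, 4)).as("lookahead_avg"),
      // Empty OVER clause: aggregate over the whole result set.
      sum($"price").over().as("total")
    ).show()

    spark.stop()
  }
}
```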

diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala
index c3e1113aa45..fde17963bfd 100644
--- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala
+++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala
@@ -24,6 +24,7 @@ import org.apache.spark.connect.proto.Expression.SortOrder.SortDirection
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
 import org.apache.spark.sql.connect.common.DataTypeProtoConverter
+import org.apache.spark.sql.expressions.Window
 import org.apache.spark.sql.functions.lit
 import org.apache.spark.sql.types.{DataType, Metadata}
 
@@ -1233,6 +1234,38 @@ class Column private[sql] (private[sql] val expr: proto.Expression) extends Logg
* @since 3.4.0
*/
   def bitwiseXOR(other: Any): Column = fn("^", other)
+
+  /**
+   * Defines a windowing column.
+   *
+   * {{{
+   *   val w = Window.partitionBy("name").orderBy("id")
+   *   df.select(
+   * sum("price").over(w.rangeBetween(Window.unboundedPreceding, 2)),
+   * avg("price").over(w.rowsBetween(Window.currentRow, 4))
+   *   )
+   * }}}
+   *
+   * @group expr_ops
+   * @since 3.4.0
+   */
+  def over(window: expressions.WindowSpec): Column = window.withAggregate(this)
+
+  /**
+   * Defines an empty analytic clause. In this case the analytic function is applied and presented
+   * for all rows in the result set.
+   *
+   * {{{
+   *   df.select(
+   * sum("price").over(),
+   * avg("price").over()
+   *   )
+   * }}}
+   *
+   * @group expr_ops
+   * @since 3.4.0
+   */
+  def over(): Column = over(Window.spec)
 }
 
 private[sql] object Column {
diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/Window.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/Window.scala
new file mode 100644
index 000..c85e7bc9c5c
--- /dev/null
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/Window.scala
@@ -0,0 +1,241 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.expressions
+
+import org.apache.spark.annotation.Stable
+import org.apache.spark.sql.Column
+
+/**
+ * Utility functions for defining window in DataFrames.
+ *
+ * {{{
+ *   // PARTITION BY country ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
+ *   Window.partitionBy("country").orderBy("date")
+ * .rowsBetween(Window.unboundedPreceding, Window.currentRow)
+ *
+ *   // PARTITION BY country ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 
FOLLOW

[spark] branch branch-3.4 updated: [SPARK-42520][CONNECT] Support basic Window API in Scala client

2023-02-21 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new c123c85aa88 [SPARK-42520][CONNECT] Support basic Window API in Scala client
c123c85aa88 is described below

commit c123c85aa88ad10e04ab9b53f6bab20e510175ac
Author: Rui Wang 
AuthorDate: Wed Feb 22 15:19:00 2023 +0800

[SPARK-42520][CONNECT] Support basic Window API in Scala client

### What changes were proposed in this pull request?

Support Window `orderBy`, `partitionBy`, and `rowsBetween`/`rangeBetween`.

### Why are the changes needed?

API coverage

### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

UT

Closes #40107 from amaliujia/rw-window-2.

Authored-by: Rui Wang 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 054522b67626aa1515b8f3f164ba7c063c38e5b8)
Signed-off-by: Wenchen Fan 
---
 .../main/scala/org/apache/spark/sql/Column.scala   |  33 +++
 .../org/apache/spark/sql/expressions/Window.scala  | 241 +
 .../apache/spark/sql/expressions/WindowSpec.scala  | 240 
 .../apache/spark/sql/PlanGenerationTestSuite.scala |  11 +
 .../query-tests/explain-results/window.explain |   8 +
 .../test/resources/query-tests/queries/window.json | 205 ++
 .../resources/query-tests/queries/window.proto.bin |  43 
 7 files changed, 781 insertions(+)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala
index c3e1113aa45..fde17963bfd 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala
@@ -24,6 +24,7 @@ import 
org.apache.spark.connect.proto.Expression.SortOrder.SortDirection
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
 import org.apache.spark.sql.connect.common.DataTypeProtoConverter
+import org.apache.spark.sql.expressions.Window
 import org.apache.spark.sql.functions.lit
 import org.apache.spark.sql.types.{DataType, Metadata}
 
@@ -1233,6 +1234,38 @@ class Column private[sql] (private[sql] val expr: 
proto.Expression) extends Logg
* @since 3.4.0
*/
   def bitwiseXOR(other: Any): Column = fn("^", other)
+
+  /**
+   * Defines a windowing column.
+   *
+   * {{{
+   *   val w = Window.partitionBy("name").orderBy("id")
+   *   df.select(
+   * sum("price").over(w.rangeBetween(Window.unboundedPreceding, 2)),
+   * avg("price").over(w.rowsBetween(Window.currentRow, 4))
+   *   )
+   * }}}
+   *
+   * @group expr_ops
+   * @since 3.4.0
+   */
+  def over(window: expressions.WindowSpec): Column = window.withAggregate(this)
+
+  /**
+   * Defines an empty analytic clause. In this case the analytic function is 
applied and presented
+   * for all rows in the result set.
+   *
+   * {{{
+   *   df.select(
+   * sum("price").over(),
+   * avg("price").over()
+   *   )
+   * }}}
+   *
+   * @group expr_ops
+   * @since 3.4.0
+   */
+  def over(): Column = over(Window.spec)
 }
 
 private[sql] object Column {
diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/Window.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/Window.scala
new file mode 100644
index 000..c85e7bc9c5c
--- /dev/null
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/Window.scala
@@ -0,0 +1,241 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.expressions
+
+import org.apache.spark.annotation.Stable
+import org.apache.spark.sql.Column
+
+/**
+ * Utility functions for defining window in DataFrames.
+ *
+ * {{{
+ *   // PARTITION BY country ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING 
AND CURRENT ROW
+ *   Window.partitionBy("country").orderBy("date")
+ * .rowsBetween(Window.unboundedPre