[spark] branch master updated: [SPARK-43944][SQL][CONNECT][PYTHON][FOLLOW-UP] Make `startswith` & `endswith` support binary type

2023-06-19 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6b36a9368d6 [SPARK-43944][SQL][CONNECT][PYTHON][FOLLOW-UP] Make 
`startswith` & `endswith` support binary type
6b36a9368d6 is described below

commit 6b36a9368d6e97f7f1f94c4ca7f6ee76dcd0015f
Author: Ruifeng Zheng 
AuthorDate: Tue Jun 20 14:08:56 2023 +0800

[SPARK-43944][SQL][CONNECT][PYTHON][FOLLOW-UP] Make `startswith` & 
`endswith` support binary type

### What changes were proposed in this pull request?
Make `startswith`, `endswith` support binary type:
1. in the Connect API, `startswith` & `endswith` already support binary type;
2. in the vanilla API, support binary type via `call_udf` (a short PySpark sketch follows).
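A minimal PySpark sketch of the new behavior, mirroring the doctests added in the diff; it assumes a Spark 3.5+ session with these changes applied (`spark` is an active SparkSession, the data is illustrative):

```python
from pyspark.sql.functions import to_binary, startswith, endswith

# Hex strings cast to BINARY; "414243" is the bytes of "ABC", "4142" of "AB".
df = spark.createDataFrame([("414243", "4142")], ["s", "p"])
df = df.select(to_binary("s").alias("s"), to_binary("p").alias("p"))

# Both functions now accept BINARY as well as STRING inputs.
df.select(startswith("s", "p"), endswith("s", "p")).show()
```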

### Why are the changes needed?
for parity

### Does this PR introduce _any_ user-facing change?
yes

### How was this patch tested?
added ut

Closes #41659 from zhengruifeng/sql_func_sw.

Lead-authored-by: Ruifeng Zheng 
Co-authored-by: Ruifeng Zheng 
Signed-off-by: Ruifeng Zheng 
---
 .../scala/org/apache/spark/sql/functions.scala | 14 +++--
 python/pyspark/sql/functions.py| 36 --
 .../scala/org/apache/spark/sql/functions.scala | 24 ++-
 .../apache/spark/sql/StringFunctionsSuite.scala| 14 +++--
 4 files changed, 52 insertions(+), 36 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
index 93cf8f521b2..2ac20bd5911 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
@@ -3945,11 +3945,8 @@ object functions {
 
   /**
* Returns a boolean. The value is True if str ends with suffix. Returns 
NULL if either input
-   * expression is NULL. Otherwise, returns False. Both str or suffix must be 
of STRING type.
-   *
-   * @note
-   *   Only STRING type is supported in this function, while `endswith` in SQL 
supports both
-   *   STRING and BINARY.
+   * expression is NULL. Otherwise, returns False. Both str or suffix must be 
of STRING or BINARY
+   * type.
*
* @group string_funcs
* @since 3.5.0
@@ -3959,11 +3956,8 @@ object functions {
 
   /**
* Returns a boolean. The value is True if str starts with prefix. Returns 
NULL if either input
-   * expression is NULL. Otherwise, returns False. Both str or prefix must be 
of STRING type.
-   *
-   * @note
-   *   Only STRING type is supported in this function, while `startswith` in 
SQL supports both
-   *   STRING and BINARY.
+   * expression is NULL. Otherwise, returns False. Both str or prefix must be 
of STRING or BINARY
+   * type.
*
* @group string_funcs
* @since 3.5.0
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 3eaccdc1ea1..0cfc19615be 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -9660,11 +9660,6 @@ def endswith(str: "ColumnOrName", suffix: 
"ColumnOrName") -> Column:
 
 .. versionadded:: 3.5.0
 
-Notes
--
-Only STRING type is supported in this function,
-while `startswith` in SQL supports both STRING and BINARY.
-
 Parameters
 --
 str : :class:`~pyspark.sql.Column` or str
@@ -9677,6 +9672,19 @@ def endswith(str: "ColumnOrName", suffix: 
"ColumnOrName") -> Column:
 >>> df = spark.createDataFrame([("Spark SQL", "Spark",)], ["a", "b"])
 >>> df.select(endswith(df.a, df.b).alias('r')).collect()
 [Row(r=False)]
+
+>>> df = spark.createDataFrame([("414243", "4243",)], ["e", "f"])
+>>> df = df.select(to_binary("e").alias("e"), to_binary("f").alias("f"))
+>>> df.printSchema()
+root
+ |-- e: binary (nullable = true)
+ |-- f: binary (nullable = true)
+>>> df.select(endswith("e", "f"), endswith("f", "e")).show()
++--+--+
+|endswith(e, f)|endswith(f, e)|
++--+--+
+|  true| false|
++--+--+
 """
 return _invoke_function_over_columns("endswith", str, suffix)
 
@@ -9690,11 +9698,6 @@ def startswith(str: "ColumnOrName", prefix: 
"ColumnOrName") -> Column:
 
 .. versionadded:: 3.5.0
 
-Notes
--
-Only STRING type is supported in this function,
-while `startswith` in SQL supports both STRING and BINARY.
-
 Parameters
 --
 str : :class:`~pyspark.sql.Column` or str
@@ -9707,6 +9710,19 @@ def startswith(str: "ColumnOrName", prefix: 
"ColumnOrName") -> Column:
 >>> df = spark.createDataFrame([("Spark SQL", "Spark",)], ["a", "b"])
 >>> df.select(startswith(df.a, df.b).

[spark] branch master updated: [SPARK-44073][SQL][PYTHON][CONNECT] Add date time functions to Scala, Python and Connect - part 2

2023-06-19 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7a8f2ec53fd [SPARK-44073][SQL][PYTHON][CONNECT] Add date time 
functions to Scala, Python and Connect - part 2
7a8f2ec53fd is described below

commit 7a8f2ec53fdeee4119c62ed8e1143987ff751482
Author: Jiaan Geng 
AuthorDate: Tue Jun 20 13:32:02 2023 +0800

[SPARK-44073][SQL][PYTHON][CONNECT] Add date time functions to Scala, 
Python and Connect - part 2

### What changes were proposed in this pull request?
This PR adds some date time functions to the Scala, Python and Connect APIs. These functions are shown below.

- weekday
- convert_timezone
- now
- timestamp_micros
- timestamp_millis

The original plan also included the function `extract`. This PR excludes it, since we can't get the data type for unresolved expressions. Please refer to
https://github.com/apache/spark/blob/b97ce8b9a99c570fc57dec967e7e9db3d115c1db/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L2835
and
https://github.com/apache/spark/blob/b97ce8b9a99c570fc57dec967e7e9db3d115c1db/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L2922
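As a rough illustration (not taken from the PR), the new functions can be exercised from PySpark roughly like this, assuming a Spark 3.5+ session where they are exposed under `pyspark.sql.functions` as the diffstat suggests:

```python
from pyspark.sql.functions import weekday, timestamp_millis, timestamp_micros, now

df = spark.range(1).select(
    weekday(now()).alias("weekday"),         # 0 = Monday, ..., 6 = Sunday
    timestamp_millis("id").alias("ts_ms"),   # milliseconds since the UTC epoch
    timestamp_micros("id").alias("ts_us"),   # microseconds since the UTC epoch
)
df.show(truncate=False)
```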

### Why are the changes needed?
Add date time functions to Scala, Python and Connect API.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
New test cases.

Closes #41651 from beliefer/SPARK-44073.

Authored-by: Jiaan Geng 
Signed-off-by: Ruifeng Zheng 
---
 .../scala/org/apache/spark/sql/functions.scala |  74 -
 .../apache/spark/sql/PlanGenerationTestSuite.scala |  24 +++
 ..._convert_timezone_with_source_time_zone.explain |   2 +
 ...nvert_timezone_without_source_time_zone.explain |   2 +
 .../explain-results/function_now.explain   |   2 +
 .../function_timestamp_micros.explain  |   2 +
 .../function_timestamp_millis.explain  |   2 +
 .../explain-results/function_weekday.explain   |   2 +
 ...ion_convert_timezone_with_source_time_zone.json |  33 
 ...onvert_timezone_with_source_time_zone.proto.bin | Bin 0 -> 170 bytes
 ..._convert_timezone_without_source_time_zone.json |  29 
 ...ert_timezone_without_source_time_zone.proto.bin | Bin 0 -> 150 bytes
 .../query-tests/queries/function_now.json  |  20 +++
 .../query-tests/queries/function_now.proto.bin | Bin 0 -> 110 bytes
 .../queries/function_timestamp_micros.json |  25 +++
 .../queries/function_timestamp_micros.proto.bin| Bin 0 -> 130 bytes
 .../queries/function_timestamp_millis.json |  25 +++
 .../queries/function_timestamp_millis.proto.bin| Bin 0 -> 130 bytes
 .../query-tests/queries/function_weekday.json  |  25 +++
 .../query-tests/queries/function_weekday.proto.bin | Bin 0 -> 121 bytes
 .../source/reference/pyspark.sql/functions.rst |   5 +
 python/pyspark/sql/connect/functions.py|  40 +
 python/pyspark/sql/functions.py| 176 +
 .../scala/org/apache/spark/sql/functions.scala |  64 
 .../org/apache/spark/sql/DateFunctionsSuite.scala  |  53 +++
 25 files changed, 597 insertions(+), 8 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
index ccd46c2d267..93cf8f521b2 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
@@ -4198,6 +4198,14 @@ object functions {
*/
   def current_timestamp(): Column = Column.fn("current_timestamp")
 
+  /**
+   * Returns the current timestamp at the start of query evaluation.
+   *
+   * @group datetime_funcs
+   * @since 3.5.0
+   */
+  def now(): Column = Column.fn("now")
+
   /**
* Returns the current timestamp without time zone at the start of query 
evaluation as a
* timestamp without time zone column. All calls of localtimestamp within 
the same query return
@@ -4459,6 +4467,14 @@ object functions {
*/
   def minute(e: Column): Column = Column.fn("minute", e)
 
+  /**
+   * Returns the day of the week for date/timestamp (0 = Monday, 1 = Tuesday, 
..., 6 = Sunday).
+   *
+   * @group datetime_funcs
+   * @since 3.5.0
+   */
+  def weekday(e: Column): Column = Column.fn("weekday", e)
+
   /**
* @return
*   A date created from year, month and day fields.
@@ -5072,6 +5088,22 @@ object functions {
*/
   def timestamp_seconds(e: Column): Column = Column.fn("timestamp_seconds", e)
 
+  /**
+   * Creates timestamp from the number of milliseconds since UTC epoch.

[spark] branch master updated: [SPARK-44074][CORE][SQL][TESTS] Fix loglevel restore behavior of `SparkFunSuite#withLogAppender` and re-enable UT `Logging plan changes for execution`

2023-06-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0df4266ff61 [SPARK-44074][CORE][SQL][TESTS] Fix loglevel restore 
behavior of `SparkFunSuite#withLogAppender` and re-enable UT `Logging plan 
changes for execution`
0df4266ff61 is described below

commit 0df4266ff6153d249add266b60eea5d5391001cc
Author: yangjie01 
AuthorDate: Mon Jun 19 21:40:27 2023 -0700

[SPARK-44074][CORE][SQL][TESTS] Fix loglevel restore behavior of 
`SparkFunSuite#withLogAppender` and re-enable UT `Logging plan changes for 
execution`

### What changes were proposed in this pull request?
The main change of this PR is to add a call to `LoggerContext#updateLoggers` after restoring the log level when the `level` argument of `withLogAppender` is not `None`; with this change, the UT `Logging plan changes for execution` disabled in 
https://github.com/apache/spark/pull/41638 can be re-enabled.

### Why are the changes needed?
- Fix bug of `SparkFunSuite#withLogAppender` when 'level' is not None
- Re-enable UT `Logging plan changes for execution`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- Manual test

```
build/sbt "sql/testOnly org.apache.spark.sql.JoinHintSuite 
org.apache.spark.sql.execution.QueryExecutionSuite"
```

**Before**

```
[info] - Logging plan changes for execution *** FAILED *** (36 milliseconds)
[info]   testAppender.loggingEvents.exists(((x$10: 
org.apache.logging.log4j.core.LogEvent) => 
x$10.getMessage().getFormattedMessage().contains(expectedMsg))) was false 
(QueryExecutionSuite.scala:232)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at 
org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
[info]   at 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
[info]   at 
org.apache.spark.sql.execution.QueryExecutionSuite.$anonfun$new$34(QueryExecutionSuite.scala:232)
[info]   at scala.collection.immutable.List.foreach(List.scala:431)
[info]   at 
org.apache.spark.sql.execution.QueryExecutionSuite.$anonfun$new$31(QueryExecutionSuite.scala:231)
[info]   at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
...

```

The failure is caused by `withLogAppender(hintAppender, level = Some(Level.WARN))` being used in `JoinHintSuite` while `SparkFunSuite#withLogAppender` did not restore the log level correctly.

The test passed before SPARK-44034 because `AdaptiveQueryExecSuite` ran between `JoinHintSuite` and `QueryExecutionSuite` and called `withLogAppender(hintAppender, level = Some(Level.DEBUG))`, but `AdaptiveQueryExecSuite` was moved to the `slow sql` test group by SPARK-44034.

**After**

```
[info] Run completed in 7 seconds, 485 milliseconds.
[info] Total number of tests run: 32
[info] Suites: completed 2, aborted 0
[info] Tests: succeeded 32, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

Closes #41663 from LuciferYang/SPARK-44074.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 core/src/test/scala/org/apache/spark/SparkFunSuite.scala   | 1 +
 .../scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala | 3 +--
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/core/src/test/scala/org/apache/spark/SparkFunSuite.scala 
b/core/src/test/scala/org/apache/spark/SparkFunSuite.scala
index 692e3215aef..f5819b95087 100644
--- a/core/src/test/scala/org/apache/spark/SparkFunSuite.scala
+++ b/core/src/test/scala/org/apache/spark/SparkFunSuite.scala
@@ -297,6 +297,7 @@ abstract class SparkFunSuite
   logger.asInstanceOf[Logger].setLevel(restoreLevels(i))
   logger.asInstanceOf[Logger].get().setLevel(restoreLevels(i))
 }
+
LogManager.getContext(false).asInstanceOf[LoggerContext].updateLoggers()
   }
 }
   }
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala
index 0a22efcb34d..d2a101b2395 100644
--- 
a/sql/core/src/test/scala/org/apache/

[spark] branch master updated: [SPARK-43942][SQL][CONNECT][PYTHON][FOLLOW-UP] Make contains support binary type

2023-06-19 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d6380ad4d02 [SPARK-43942][SQL][CONNECT][PYTHON][FOLLOW-UP] Make 
contains support binary type
d6380ad4d02 is described below

commit d6380ad4d02cb3b04ccd83c40f3d32e063627735
Author: panbingkun 
AuthorDate: Tue Jun 20 11:44:55 2023 +0800

[SPARK-43942][SQL][CONNECT][PYTHON][FOLLOW-UP] Make contains support binary 
type

### What changes were proposed in this pull request?
Make `contains` support binary type:
- in the Connect API, `contains` already supports binary type;
- in the vanilla API, support binary type via `call_udf` (a short PySpark sketch follows).
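A minimal PySpark sketch of the new behavior, mirroring the doctest added in the diff (assumes Spark 3.5+ with these changes and an active `spark` session; the data is illustrative):

```python
from pyspark.sql.functions import to_binary, contains

# "414243" is the hex for "ABC", "4243" for "BC".
df = spark.createDataFrame([("414243", "4243")], ["c", "d"])
df = df.select(to_binary("c").alias("c"), to_binary("d").alias("d"))

# `contains` now accepts BINARY as well as STRING inputs.
df.select(contains("c", "d"), contains("d", "c")).show()
```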

### Why are the changes needed?
for parity

### Does this PR introduce any user-facing change?
yes

### How was this patch tested?
added ut

Closes #41665 from panbingkun/SPARK-43942_FOLLOWUP.

Authored-by: panbingkun 
Signed-off-by: Ruifeng Zheng 
---
 .../main/scala/org/apache/spark/sql/functions.scala  |  8 ++--
 python/pyspark/sql/functions.py  | 20 ++--
 .../main/scala/org/apache/spark/sql/functions.scala  | 12 +---
 .../org/apache/spark/sql/StringFunctionsSuite.scala  |  8 +++-
 4 files changed, 28 insertions(+), 20 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
index c12bb23f850..ccd46c2d267 100644
--- 
a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
+++ 
b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala
@@ -4025,12 +4025,8 @@ object functions {
 
   /**
* Returns a boolean. The value is True if right is found inside left. 
Returns NULL if either
-   * input expression is NULL. Otherwise, returns False. Both left or right 
must be of STRING
-   * type.
-   *
-   * @note
-   *   Only STRING type is supported in this function, while `contains` in SQL 
supports both
-   *   STRING and BINARY.
+   * input expression is NULL. Otherwise, returns False. Both left or right 
must be of STRING or
+   * BINARY type.
*
* @group string_funcs
* @since 3.5.0
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index adef14de454..1a5633e3c5e 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -9711,15 +9711,10 @@ def contains(left: "ColumnOrName", right: 
"ColumnOrName") -> Column:
 """
 Returns a boolean. The value is True if right is found inside left.
 Returns NULL if either input expression is NULL. Otherwise, returns False.
-Both left or right must be of STRING.
+Both left or right must be of STRING or BINARY type.
 
 .. versionadded:: 3.5.0
 
-Notes
--
-Only STRING type is supported in this function,
-while `contains` in SQL supports both STRING and BINARY.
-
 Parameters
 --
 left : :class:`~pyspark.sql.Column` or str
@@ -9732,6 +9727,19 @@ def contains(left: "ColumnOrName", right: 
"ColumnOrName") -> Column:
 >>> df = spark.createDataFrame([("Spark SQL", "Spark")], ['a', 'b'])
 >>> df.select(contains(df.a, df.b).alias('r')).collect()
 [Row(r=True)]
+
+>>> df = spark.createDataFrame([("414243", "4243",)], ["c", "d"])
+>>> df = df.select(to_binary("c").alias("c"), to_binary("d").alias("d"))
+>>> df.printSchema()
+root
+ |-- c: binary (nullable = true)
+ |-- d: binary (nullable = true)
+>>> df.select(contains("c", "d"), contains("d", "c")).show()
++--+--+
+|contains(c, d)|contains(d, c)|
++--+--+
+|  true| false|
++--+--+
 """
 return _invoke_function_over_columns("contains", left, right)
 
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index e7e14e30477..582e3b9e363 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -4144,17 +4144,15 @@ object functions {
   /**
* Returns a boolean. The value is True if right is found inside left.
* Returns NULL if either input expression is NULL. Otherwise, returns False.
-   * Both left or right must be of STRING type.
-   *
-   * @note
-   *   Only STRING type is supported in this function, while `contains` in SQL 
supports both
-   *   STRING and BINARY.
+   * Both left or right must be of STRING or BINARY type.
*
* @group string_funcs
* @since 3.5.0
*/
-  def contains(left: Column, right: Column): Column = withExpr {
-Contains(left.expr, right.expr)
+  def contains(left: Column, right: 

[spark] branch master updated: [SPARK-44099][INFRA] Support both java-8-openjdk-amd64 and java-8-openjdk-arm64 in `spark-rm` Dockerfile

2023-06-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 469197279dc [SPARK-44099][INFRA] Support both java-8-openjdk-amd64 and 
java-8-openjdk-arm64 in `spark-rm` Dockerfile
469197279dc is described below

commit 469197279dc2d9aa18560cb7c1eff04b6122e584
Author: Dongjoon Hyun 
AuthorDate: Mon Jun 19 18:49:12 2023 -0700

[SPARK-44099][INFRA] Support both java-8-openjdk-amd64 and 
java-8-openjdk-arm64 in `spark-rm` Dockerfile

### What changes were proposed in this pull request?

This PR aims to support both amd64 and arm64 Java 8 installations in the `spark-rm` image.

### Why are the changes needed?

The `spark-rm` image uses a hard-coded 
`/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java` currently.

If we extend it to accept a different arch installation like the following, it helps the release manager's test steps.
```
root@34fa8aae9142:/# ls -al /usr/lib/jvm/
total 20
drwxr-xr-x 3 root root 4096 Jun 19 20:47 .
drwxr-xr-x 1 root root 4096 Jun 19 20:46 ..
-rw-r--r-- 1 root root 2670 May  6 10:53 .java-1.8.0-openjdk-arm64.jinfo
lrwxrwxrwx 1 root root   20 May  6 10:53 java-1.8.0-openjdk-arm64 -> 
java-8-openjdk-arm64
drwxr-xr-x 7 root root 4096 Jun 19 20:47 java-8-openjdk-arm64
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Closes #41664 from dongjoon-hyun/SPARK-44099.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 dev/create-release/spark-rm/Dockerfile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dev/create-release/spark-rm/Dockerfile 
b/dev/create-release/spark-rm/Dockerfile
index 6995928beae..8f198a420bc 100644
--- a/dev/create-release/spark-rm/Dockerfile
+++ b/dev/create-release/spark-rm/Dockerfile
@@ -63,7 +63,7 @@ RUN apt-get clean && apt-get update && $APT_INSTALL gnupg 
ca-certificates && \
   apt-get update && \
   # Install openjdk 8.
   $APT_INSTALL openjdk-8-jdk && \
-  update-alternatives --set java 
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java && \
+  update-alternatives --set java $(ls 
/usr/lib/jvm/java-8-openjdk-*/jre/bin/java) && \
   # Install build / source control tools
   $APT_INSTALL curl wget git maven ivy subversion make gcc lsof libffi-dev \
 pandoc pandoc-citeproc libssl-dev libcurl4-openssl-dev libxml2-dev && \





svn commit: r62501 - in /dev/spark/v3.4.1-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.2.2/ _site/api/R/deps/jquery-3.6.0/ _site/api

2023-06-19 Thread dongjoon
Author: dongjoon
Date: Tue Jun 20 00:42:59 2023
New Revision: 62501

Log:
Add v3.4.1-rc1-docs


[This commit notification would consist of 2789 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]




[spark] branch master updated: [SPARK-44036][FOLLOWUP][CONNECT][TESTS] Consolidate remaining tickets

2023-06-19 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4dc0e98367a [SPARK-44036][FOLLOWUP][CONNECT][TESTS] Consolidate 
remaining tickets
4dc0e98367a is described below

commit 4dc0e98367a105b55f1ca716d7785a022e430dbf
Author: itholic 
AuthorDate: Tue Jun 20 08:16:48 2023 +0800

[SPARK-44036][FOLLOWUP][CONNECT][TESTS] Consolidate remaining tickets

### What changes were proposed in this pull request?

This is follow-up for https://github.com/apache/spark/pull/41566.

### Why are the changes needed?

To simplify tasks by consolidating tickets that have the same cause.

### Does this PR introduce _any_ user-facing change?

No, this is a change intended to facilitate the task.

### How was this patch tested?

The existing tests should pass.

Closes #41661 from itholic/44036-followup.

Authored-by: itholic 
Signed-off-by: Ruifeng Zheng 
---
 .../pandas/tests/connect/data_type_ops/test_parity_categorical_ops.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git 
a/python/pyspark/pandas/tests/connect/data_type_ops/test_parity_categorical_ops.py
 
b/python/pyspark/pandas/tests/connect/data_type_ops/test_parity_categorical_ops.py
index 44243192d50..b680e5b3d79 100644
--- 
a/python/pyspark/pandas/tests/connect/data_type_ops/test_parity_categorical_ops.py
+++ 
b/python/pyspark/pandas/tests/connect/data_type_ops/test_parity_categorical_ops.py
@@ -34,11 +34,11 @@ class CategoricalOpsParityTests(
 def test_astype(self):
 super().test_astype()
 
-@unittest.skip("TODO(SPARK-43670): Enable CategoricalOps.eq to work with 
Spark Connect.")
+@unittest.skip("TODO(SPARK-43620): Support `Column` for 
SparkConnectColumn.__getitem__.")
 def test_eq(self):
 super().test_eq()
 
-@unittest.skip("TODO(SPARK-43675): Enable CategoricalOps.ne to work with 
Spark Connect.")
+@unittest.skip("TODO(SPARK-43620): Support `Column` for 
SparkConnectColumn.__getitem__.")
 def test_ne(self):
 super().test_ne()
 





[spark] branch master updated: [SPARK-43624][PS][CONNECT] Add `EWM` to SparkConnectPlanner

2023-06-19 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b4f4c372317 [SPARK-43624][PS][CONNECT] Add `EWM` to 
SparkConnectPlanner
b4f4c372317 is described below

commit b4f4c37231752d2eb6688b05e21410b3e823b427
Author: itholic 
AuthorDate: Tue Jun 20 08:15:24 2023 +0800

[SPARK-43624][PS][CONNECT] Add `EWM` to SparkConnectPlanner

### What changes were proposed in this pull request?

This PR proposes to add `EWM` for SparkConnectPlanner.

### Why are the changes needed?

To increase pandas API coverage

### Does this PR introduce _any_ user-facing change?

No. We added `EWM` to SparkConnectPlanner, but there are still unresolved `AnalysisException` issues (SPARK-43611) that need to be addressed in follow-up work.

### How was this patch tested?

Manually checked the plan was created with EWM properly.
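For illustration, a hedged sketch of how `EWM` is exercised end-to-end through the pandas-on-Spark API over Spark Connect; the data and parameters are arbitrary, and it assumes the SPARK-43611 issue above is resolved:

```python
import pyspark.pandas as ps

psdf = ps.DataFrame({"a": [1.0, 2.0, None, 4.0]})
# ewm(...).mean() is what ultimately reaches the new "ewm" case in SparkConnectPlanner.
print(psdf.ewm(alpha=0.5, ignore_na=True).mean())
```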

Closes #41660 from itholic/EWM.

Authored-by: itholic 
Signed-off-by: Ruifeng Zheng 
---
 .../sql/connect/planner/SparkConnectPlanner.scala| 11 +++
 .../pyspark/pandas/tests/connect/test_parity_ewm.py  |  8 ++--
 python/pyspark/pandas/window.py  | 20 ++--
 3 files changed, 35 insertions(+), 4 deletions(-)

diff --git 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
index b02b49d00dc..dc819fb4020 100644
--- 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
+++ 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
@@ -1682,6 +1682,12 @@ class SparkConnectPlanner(val sessionHolder: 
SessionHolder) extends Logging {
 val ignoreNA = extractBoolean(children(1), "ignoreNA")
 Some(aggregate.PandasMode(children(0), 
ignoreNA).toAggregateExpression(false))
 
+  case "ewm" if fun.getArgumentsCount == 3 =>
+val children = fun.getArgumentsList.asScala.map(transformExpression)
+val alpha = extractDouble(children(1), "alpha")
+val ignoreNA = extractBoolean(children(2), "ignoreNA")
+Some(EWM(children(0), alpha, ignoreNA))
+
   // ML-specific functions
   case "vector_to_array" if fun.getArgumentsCount == 2 =>
 val expr = transformExpression(fun.getArguments(0))
@@ -1742,6 +1748,11 @@ class SparkConnectPlanner(val sessionHolder: 
SessionHolder) extends Logging {
 case other => throw InvalidPlanInput(s"$field should be a literal boolean, 
but got $other")
   }
 
+  private def extractDouble(expr: Expression, field: String): Double = expr 
match {
+case Literal(double: Double, DoubleType) => double
+case other => throw InvalidPlanInput(s"$field should be a literal double, 
but got $other")
+  }
+
   private def extractInteger(expr: Expression, field: String): Int = expr 
match {
 case Literal(int: Int, IntegerType) => int
 case other => throw InvalidPlanInput(s"$field should be a literal integer, 
but got $other")
diff --git a/python/pyspark/pandas/tests/connect/test_parity_ewm.py 
b/python/pyspark/pandas/tests/connect/test_parity_ewm.py
index 0e13306fd79..e079f847296 100644
--- a/python/pyspark/pandas/tests/connect/test_parity_ewm.py
+++ b/python/pyspark/pandas/tests/connect/test_parity_ewm.py
@@ -22,11 +22,15 @@ from pyspark.testing.pandasutils import 
PandasOnSparkTestUtils, TestUtils
 
 
 class EWMParityTests(EWMTestsMixin, PandasOnSparkTestUtils, 
ReusedConnectTestCase, TestUtils):
-@unittest.skip("TODO(SPARK-43624): Enable ExponentialMovingLike.mean with 
Spark Connect.")
+@unittest.skip(
+"TODO(SPARK-43611): Fix unexpected `AnalysisException` from Spark 
Connect client."
+)
 def test_ewm_mean(self):
 super().test_ewm_mean()
 
-@unittest.skip("TODO(SPARK-43624): Enable ExponentialMovingLike.mean with 
Spark Connect.")
+@unittest.skip(
+"TODO(SPARK-43611): Fix unexpected `AnalysisException` from Spark 
Connect client."
+)
 def test_groupby_ewm_func(self):
 super().test_groupby_ewm_func()
 
diff --git a/python/pyspark/pandas/window.py b/python/pyspark/pandas/window.py
index 316a4af92dd..8d09dd132ca 100644
--- a/python/pyspark/pandas/window.py
+++ b/python/pyspark/pandas/window.py
@@ -44,6 +44,7 @@ from pyspark.sql.types import (
 DoubleType,
 )
 from pyspark.sql.window import WindowSpec
+from pyspark.sql.utils import is_remote
 
 
 class RollingAndExpanding(Generic[FrameLike], metaclass=ABCMeta):
@@ -2448,11 +2449,26 @@ class ExponentialMovingLike(Generic[FrameLike], 
metaclass=ABCMeta):
 unified_alpha = self._compute_unified_alpha()
 
 def mean(scol: Column

svn commit: r62499 - /dev/spark/v3.4.1-rc1-bin/

2023-06-19 Thread dongjoon
Author: dongjoon
Date: Mon Jun 19 23:25:34 2023
New Revision: 62499

Log:
Apache Spark v3.4.1-rc1

Added:
dev/spark/v3.4.1-rc1-bin/
dev/spark/v3.4.1-rc1-bin/SparkR_3.4.1.tar.gz   (with props)
dev/spark/v3.4.1-rc1-bin/SparkR_3.4.1.tar.gz.asc
dev/spark/v3.4.1-rc1-bin/SparkR_3.4.1.tar.gz.sha512
dev/spark/v3.4.1-rc1-bin/pyspark-3.4.1.tar.gz   (with props)
dev/spark/v3.4.1-rc1-bin/pyspark-3.4.1.tar.gz.asc
dev/spark/v3.4.1-rc1-bin/pyspark-3.4.1.tar.gz.sha512
dev/spark/v3.4.1-rc1-bin/spark-3.4.1-bin-hadoop3-scala2.13.tgz   (with 
props)
dev/spark/v3.4.1-rc1-bin/spark-3.4.1-bin-hadoop3-scala2.13.tgz.asc
dev/spark/v3.4.1-rc1-bin/spark-3.4.1-bin-hadoop3-scala2.13.tgz.sha512
dev/spark/v3.4.1-rc1-bin/spark-3.4.1-bin-hadoop3.tgz   (with props)
dev/spark/v3.4.1-rc1-bin/spark-3.4.1-bin-hadoop3.tgz.asc
dev/spark/v3.4.1-rc1-bin/spark-3.4.1-bin-hadoop3.tgz.sha512
dev/spark/v3.4.1-rc1-bin/spark-3.4.1-bin-without-hadoop.tgz   (with props)
dev/spark/v3.4.1-rc1-bin/spark-3.4.1-bin-without-hadoop.tgz.asc
dev/spark/v3.4.1-rc1-bin/spark-3.4.1-bin-without-hadoop.tgz.sha512
dev/spark/v3.4.1-rc1-bin/spark-3.4.1.tgz   (with props)
dev/spark/v3.4.1-rc1-bin/spark-3.4.1.tgz.asc
dev/spark/v3.4.1-rc1-bin/spark-3.4.1.tgz.sha512

Added: dev/spark/v3.4.1-rc1-bin/SparkR_3.4.1.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.4.1-rc1-bin/SparkR_3.4.1.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.1-rc1-bin/SparkR_3.4.1.tar.gz.asc
==
--- dev/spark/v3.4.1-rc1-bin/SparkR_3.4.1.tar.gz.asc (added)
+++ dev/spark/v3.4.1-rc1-bin/SparkR_3.4.1.tar.gz.asc Mon Jun 19 23:25:34 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJIBAABCgAyFiEE8oycklwYjDXjRWFN7aAM6DTw/FwFAmSQ44oUHGRvbmdqb29u
+QGFwYWNoZS5vcmcACgkQ7aAM6DTw/Fzq5BAAwP8SC9oE+j4plKbDkS1AUFLNbWxP
+xxno29cy0FSOpcI2potXuBV3QuqocT8A1Q1lZFvet7XeOitMIY51anb5x7UMm/Jn
+deQsG36MK2vS+ESgh+ZM635sQ0BMQUQ1TqC3vRZQGDE6I7YSIAOzV4ikrR5TMXpf
+eCjlgStbA8wjO+jBzFJNgJ3Qa7qxnHl2z44b5c3eCpplqFnm2sb0Js6BbvNDM1H+
+QHsAR7mwLG+eTVyP9HpCEkCPs/vVCaCgRyHcs5a1N0+H39YxKW4y7RVQUfVDeFZO
+02NN7AHKuUokrCBdAdbOOzU6RkW/2w5Hzw0sNwFFKrXinrVWorDnNqar2eI3nDcG
+7Z1UgaGZueLRCGK/j667F0lNeWfIJwJof0wsbTrDBRwaCyENkSqMOwmCDETGwWEv
+beSUfsahuSfX8phm1ioZ37syig4qbPYNF6rd88UqeAFYa73tsbE8NbrtNwS57FKz
+/tY0N/U+pVf4GwIrlRjmCmzvlpmsKv45qhxGUmdp1vXVQXJXCqEDiDSyvwrZTgnc
+mAg9qk6jvykJQuVU0rYyFR9xEBZx1G3kOHwJDyUpFjLNgzZkwsnqr6y6btDEoGI1
+rpCpS/HNYEghCzI9m7/8hJtne9ZVHVrFfm1GglqdI6TBNcO39E1fjOhhm+4299Ns
+Kp8GNdI1JR4ZkcU=
+=oMnc
+-END PGP SIGNATURE-

Added: dev/spark/v3.4.1-rc1-bin/SparkR_3.4.1.tar.gz.sha512
==
--- dev/spark/v3.4.1-rc1-bin/SparkR_3.4.1.tar.gz.sha512 (added)
+++ dev/spark/v3.4.1-rc1-bin/SparkR_3.4.1.tar.gz.sha512 Mon Jun 19 23:25:34 2023
@@ -0,0 +1 @@
+4c90eebd88721353cec96f5c14c3589b6477fdebd8c988176122b43572b90d9138f10b990fd21a7cedb07503764a1a803aab0bd22fa0def8a84bcf9eec2e038a
  SparkR_3.4.1.tar.gz

Added: dev/spark/v3.4.1-rc1-bin/pyspark-3.4.1.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.4.1-rc1-bin/pyspark-3.4.1.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.4.1-rc1-bin/pyspark-3.4.1.tar.gz.asc
==
--- dev/spark/v3.4.1-rc1-bin/pyspark-3.4.1.tar.gz.asc (added)
+++ dev/spark/v3.4.1-rc1-bin/pyspark-3.4.1.tar.gz.asc Mon Jun 19 23:25:34 2023
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJIBAABCgAyFiEE8oycklwYjDXjRWFN7aAM6DTw/FwFAmSQ44sUHGRvbmdqb29u
+QGFwYWNoZS5vcmcACgkQ7aAM6DTw/FyNCA/7BmPPTsXE+Ufcrlblrgjh7fqrplNc
+n3ynKPmF7Ho6DuZUNDiVzafNFhqnHYoLHBHghEfQRjaTtrxkaRRiaTlTj9Kte38/
+00RWWRf224a/A8VCc2fnQ3a+Y97KloCCVuFafVbrgG3O1iEWUsJ+yJcKygarJ2Av
+o1x1jXhV56YmxphILT5aHFkWJXiorjDuawdwqCZ9gvgxpDNDWvDd3v55dikmjGTC
+pm7eZ80fCMIiOE5fR7njxLQrQuU/0XewRtp2oP3fnugIfqZbbZ4RnohGmVKo2T8h
+DUOlJgtzPwFUym+zGVTpniiWjp9pSOT7p67NJI2SJq8pcqZKcnGHhfVE0IWI9OA/
+kAY+L1hm1xRvGq6emoNUI7D4v31fwS/jSzBoSVVZ8R2uCQHHdA+I8osoMXsnAGcG
+bMKgNTsRumqvO91SWUCmIuhH/K4O+8uAtS2SL4+6egsuIc9BAqrNAtMS0EOZa9AO
+gtNzdEy59ZHYyGWYuUPMIknitiWEBQSqbgYbmEorgJAjvCAb3egZ68TYDFtX0PkV
+fc2Sw1h5AcusbPeVywFmnvkfKivnxMtTBZkLKq1bPbvd6XDBBa9p9cKTeZFLB1so
+fhlNV16nVCx8cg2FksT+9/lCXYFTtuFlLCvmu3QINQ808r4QaTRy0q5QE1/ZL0wz
+NkDO5YN9DcGr7Ys=
+=hB/+
+-END PGP SIGNATURE-

Added: dev/spark/v3.4.1-rc1-bin/pyspark-3.4.1.tar.gz.sha512
==
--- dev/spark/v3.4.1-rc1-bin/pyspark-3.4.1.tar.gz.sha512 (added)
+++ dev/spark/

[spark] branch master updated: [SPARK-43969][SQL] Refactor & Assign names to the error class _LEGACY_ERROR_TEMP_1170

2023-06-19 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f3db20c17df [SPARK-43969][SQL] Refactor & Assign names to the error 
class _LEGACY_ERROR_TEMP_1170
f3db20c17df is described below

commit f3db20c17dfdc1cb5daa42c154afa732e5e3800b
Author: panbingkun 
AuthorDate: Tue Jun 20 01:43:32 2023 +0300

[SPARK-43969][SQL] Refactor & Assign names to the error class 
_LEGACY_ERROR_TEMP_1170

### What changes were proposed in this pull request?
The PR aims to:
- Refactor `PreWriteCheck` to use the error framework.
- Make `INSERT_COLUMN_ARITY_MISMATCH` more generic & avoid embedding error text in source code (a repro sketch follows this list).
- Assign a name to _LEGACY_ERROR_TEMP_1170.
- In the `INSERT_PARTITION_COLUMN_ARITY_MISMATCH` error message, replace '' with `toSQLId` for the table column name.
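As a hypothetical repro sketch (not part of the PR), the new `INSERT_COLUMN_ARITY_MISMATCH.NOT_ENOUGH_DATA_COLUMNS` sub-class would surface roughly like this; the table and column names are made up:

```python
spark.sql("CREATE TABLE t (c1 INT, c2 INT) USING parquet")
try:
    # Only one data column supplied for a two-column table.
    spark.sql("INSERT INTO t VALUES (1)")
except Exception as e:
    # Expected error class: INSERT_COLUMN_ARITY_MISMATCH.NOT_ENOUGH_DATA_COLUMNS
    print(e)
```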

### Why are the changes needed?
The changes improve the error framework.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manual test.
- Pass GA.

Closes #41458 from panbingkun/refactor_PreWriteCheck.

Lead-authored-by: panbingkun 
Co-authored-by: panbingkun <84731...@qq.com>
Signed-off-by: Max Gekk 
---
 core/src/main/resources/error/error-classes.json   |  62 ---
 python/pyspark/sql/tests/test_readwriter.py|   4 +-
 .../spark/sql/catalyst/analysis/Analyzer.scala |   2 +-
 .../catalyst/analysis/ResolveInsertionBase.scala   |  13 ++-
 .../catalyst/analysis/TableOutputResolver.scala|   4 +-
 .../spark/sql/errors/QueryCompilationErrors.scala  |  40 +++
 .../catalyst/analysis/V2WriteAnalysisSuite.scala   |  48 +---
 .../spark/sql/execution/datasources/rules.scala|  32 --
 .../analyzer-results/postgreSQL/numeric.sql.out|   7 +-
 .../sql-tests/results/postgreSQL/numeric.sql.out   |   7 +-
 .../org/apache/spark/sql/DataFrameSuite.scala  |  33 --
 .../org/apache/spark/sql/SQLInsertTestSuite.scala  |  31 --
 .../spark/sql/connector/InsertIntoTests.scala  |  34 --
 .../apache/spark/sql/execution/SQLViewSuite.scala  |  11 +-
 .../spark/sql/execution/command/DDLSuite.scala |  54 +
 .../org/apache/spark/sql/sources/InsertSuite.scala | 122 +
 .../spark/sql/hive/thriftserver/CliSuite.scala |   2 +-
 .../org/apache/spark/sql/hive/InsertSuite.scala|  11 +-
 18 files changed, 324 insertions(+), 193 deletions(-)

diff --git a/core/src/main/resources/error/error-classes.json 
b/core/src/main/resources/error/error-classes.json
index 54b920cc36f..d9e729effeb 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -888,10 +888,24 @@
   },
   "INSERT_COLUMN_ARITY_MISMATCH" : {
 "message" : [
-  "Cannot write to '', :",
-  "Table columns: .",
-  "Data columns: ."
+  "Cannot write to , the reason is"
 ],
+"subClass" : {
+  "NOT_ENOUGH_DATA_COLUMNS" : {
+"message" : [
+  "not enough data columns:",
+  "Table columns: .",
+  "Data columns: ."
+]
+  },
+  "TOO_MANY_DATA_COLUMNS" : {
+"message" : [
+  "too many data columns:",
+  "Table columns: .",
+  "Data columns: ."
+]
+  }
+},
 "sqlState" : "21S01"
   },
   "INSERT_PARTITION_COLUMN_ARITY_MISMATCH" : {
@@ -1715,6 +1729,11 @@
 ],
 "sqlState" : "46110"
   },
+  "NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT" : {
+"message" : [
+  " is not supported, if you want to enable it, please set 
\"spark.sql.catalogImplementation\" to \"hive\"."
+]
+  },
   "NOT_SUPPORTED_IN_JDBC_CATALOG" : {
 "message" : [
   "Not supported command in JDBC catalog:"
@@ -2464,6 +2483,33 @@
   "grouping()/grouping_id() can only be used with 
GroupingSets/Cube/Rollup."
 ]
   },
+  "UNSUPPORTED_INSERT" : {
+"message" : [
+  "Can't insert into the target."
+],
+"subClass" : {
+  "NOT_ALLOWED" : {
+"message" : [
+  "The target relation  does not allow insertion."
+]
+  },
+  "NOT_PARTITIONED" : {
+"message" : [
+  "The target relation  is not partitioned."
+]
+  },
+  "RDD_BASED" : {
+"message" : [
+  "An RDD-based table is not allowed."
+]
+  },
+  "READ_FROM" : {
+"message" : [
+  "The target relation  is also being read from."
+]
+  }
+}
+  },
   "UNSUPPORTED_OVERWRITE" : {
 "message" : [
   "Can't overwrite the target that is also being read from."
@@ -3005,11 +3051,6 @@
   "Window function  requires window to be ordered, please add ORDER BY 
clause. For example SELECT (value_expr) OVER (PARTITION BY window_partition 
ORDER BY window_ordering) from t

[spark] 01/01: Preparing development version 3.4.2-SNAPSHOT

2023-06-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

commit b24511f03062a2ab23610f91b65a2af2517df619
Author: Dongjoon Hyun 
AuthorDate: Mon Jun 19 22:17:32 2023 +

Preparing development version 3.4.2-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index fa7028630a8..77eb942e6d8 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.4.1
+Version: 3.4.2
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R:
diff --git a/assembly/pom.xml b/assembly/pom.xml
index b09ffdad3ff..1bbab03d56a 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1
+3.4.2-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index bb5467aa0e7..43851f160f3 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1
+3.4.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index aa8efeb8143..5d7d3a4d9a7 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1
+3.4.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index f497782888d..9ec6bca498f 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1
+3.4.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 72d60f69160..9b3bf5136f2 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1
+3.4.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index e9c11bb2d6b..265

[spark] branch branch-3.4 updated (864b9869949 -> b24511f0306)

2023-06-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


from 864b9869949 [SPARK-44018][SQL] Improve the hashCode and toString for 
some DS V2 Expression
 add 6b1ff22dde1 Preparing Spark release v3.4.1-rc1
 new b24511f0306 Preparing development version 3.4.2-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 43 files changed, 45 insertions(+), 45 deletions(-)





[spark] tag v3.4.1-rc1 created (now 6b1ff22dde1)

2023-06-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to tag v3.4.1-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git


  at 6b1ff22dde1 (commit)
This tag includes the following new commits:

 new 6b1ff22dde1 Preparing Spark release v3.4.1-rc1

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.






[spark] 01/01: Preparing Spark release v3.4.1-rc1

2023-06-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to tag v3.4.1-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 6b1ff22dde1ead51cbf370be6e48a802daae58b6
Author: Dongjoon Hyun 
AuthorDate: Mon Jun 19 22:17:28 2023 +

Preparing Spark release v3.4.1-rc1
---
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 connector/avro/pom.xml | 2 +-
 connector/connect/client/jvm/pom.xml   | 2 +-
 connector/connect/common/pom.xml   | 2 +-
 connector/connect/server/pom.xml   | 2 +-
 connector/docker-integration-tests/pom.xml | 2 +-
 connector/kafka-0-10-assembly/pom.xml  | 2 +-
 connector/kafka-0-10-sql/pom.xml   | 2 +-
 connector/kafka-0-10-token-provider/pom.xml| 2 +-
 connector/kafka-0-10/pom.xml   | 2 +-
 connector/kinesis-asl-assembly/pom.xml | 2 +-
 connector/kinesis-asl/pom.xml  | 2 +-
 connector/protobuf/pom.xml | 2 +-
 connector/spark-ganglia-lgpl/pom.xml   | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 2 +-
 examples/pom.xml   | 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 42 files changed, 42 insertions(+), 42 deletions(-)

diff --git a/assembly/pom.xml b/assembly/pom.xml
index b86fee4bceb..b09ffdad3ff 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.1
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index f9ecfb3d692..bb5467aa0e7 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.1
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 22ee65b7d25..aa8efeb8143 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.1
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 2c67da81ca4..f497782888d 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.1
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 219682e047d..72d60f69160 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.1
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 22ce78c6fd2..e9c11bb2d6b 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.4.1
 ../../pom.xml
   
 
diff --git a/common/tags/pom.xml b/common/tags/pom.xml
index 7b1a8527837..f4212d871b0 100644
--- a/common/tags/pom.xml
+++ b/common/tags/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.4.1-SNAPSHOT
+3.

[spark] branch master updated: [SPARK-44054][CORE][TESTS] Make test cases inherit `SparkFunSuite` have a default timeout

2023-06-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 74185cf34a2 [SPARK-44054][CORE][TESTS] Make test cases inherit 
`SparkFunSuite` have a default timeout
74185cf34a2 is described below

commit 74185cf34a20f7a3ac07ffe06dd056d265cd5f74
Author: yangjie01 
AuthorDate: Mon Jun 19 13:05:47 2023 -0700

[SPARK-44054][CORE][TESTS] Make test cases inherit `SparkFunSuite` have a 
default timeout

### What changes were proposed in this pull request?
This PR uses `failAfter` to wrap the `testBody` of `SparkFunSuite#test` to control the test timeout, and adds an undocumented config `spark.test.timeout` with a default value of 20 minutes; tests inheriting `SparkFunSuite` will fail with `TestFailedDueToTimeoutException` when they time out.

### Why are the changes needed?
Avoid GA tasks timing out due to blocked test cases.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass Github Actions
- Manually checked.

Closes #41590 from LuciferYang/add-failAfter.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 .../test/scala/org/apache/spark/sql/kafka010/KafkaSinkSuite.scala | 1 -
 core/src/test/scala/org/apache/spark/SparkFunSuite.scala  | 8 +++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git 
a/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSinkSuite.scala
 
b/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSinkSuite.scala
index f54eff90a5e..3a400c657ba 100644
--- 
a/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSinkSuite.scala
+++ 
b/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSinkSuite.scala
@@ -27,7 +27,6 @@ import org.apache.kafka.clients.producer.ProducerConfig
 import org.apache.kafka.clients.producer.internals.DefaultPartitioner
 import org.apache.kafka.common.Cluster
 import org.apache.kafka.common.serialization.ByteArraySerializer
-import org.scalatest.concurrent.TimeLimits.failAfter
 import org.scalatest.time.SpanSugar._
 
 import org.apache.spark.{SparkConf, SparkContext, SparkException, TestUtils}
diff --git a/core/src/test/scala/org/apache/spark/SparkFunSuite.scala 
b/core/src/test/scala/org/apache/spark/SparkFunSuite.scala
index ff12f643497..692e3215aef 100644
--- a/core/src/test/scala/org/apache/spark/SparkFunSuite.scala
+++ b/core/src/test/scala/org/apache/spark/SparkFunSuite.scala
@@ -33,7 +33,9 @@ import org.apache.logging.log4j.core.appender.AbstractAppender
 import org.apache.logging.log4j.core.config.Property
 import org.scalactic.source.Position
 import org.scalatest.{BeforeAndAfter, BeforeAndAfterAll, BeforeAndAfterEach, 
Failed, Outcome, Tag}
+import org.scalatest.concurrent.TimeLimits
 import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
+import org.scalatest.time._ // scalastyle:ignore
 
 import org.apache.spark.deploy.LocalSparkCluster
 import org.apache.spark.internal.Logging
@@ -69,6 +71,7 @@ abstract class SparkFunSuite
   with BeforeAndAfterAll
   with BeforeAndAfterEach
   with ThreadAudit
+  with TimeLimits
   with Logging {
 // scalastyle:on
 
@@ -147,7 +150,10 @@ abstract class SparkFunSuite
 if (excluded.contains(testName)) {
   ignore(s"$testName (excluded)")(testBody)
 } else {
-  super.test(testName, testTags: _*)(testBody)
+  val timeout = sys.props.getOrElse("spark.test.timeout", "20").toLong
+  super.test(testName, testTags: _*)(
+failAfter(Span(timeout, Minutes))(testBody)
+  )
 }
   }
 





[spark] branch master updated: [SPARK-43982][ML][PYTHON][CONNECT] Implement pipeline estimator for ML on spark connect

2023-06-19 Thread weichenxu123
This is an automated email from the ASF dual-hosted git repository.

weichenxu123 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6c0c226d901 [SPARK-43982][ML][PYTHON][CONNECT] Implement pipeline 
estimator for ML on spark connect
6c0c226d901 is described below

commit 6c0c226d90192e54a4965b6d69905936619e20d6
Author: Weichen Xu 
AuthorDate: Mon Jun 19 21:36:21 2023 +0800

[SPARK-43982][ML][PYTHON][CONNECT] Implement pipeline estimator for ML on 
spark connect

### What changes were proposed in this pull request?

Implement pipeline estimator for ML on spark connect

### Why are the changes needed?

See Distributed ML <> spark connect project design doc:

https://docs.google.com/document/d/1LHzwCjm2SluHkta_08cM3jxFSgfF-niaCZbtIThG-H8/edit#heading=h.x8uc4xogrzbk

### Does this PR introduce _any_ user-facing change?

Yes. New estimator `pyspark.mlv2.pipeline.Pipeline` is added.
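A hedged usage sketch of the new estimator; the stage classes and module paths below come from the diff, but their exact constructor signatures and the input DataFrame are assumptions:

```python
from pyspark.mlv2 import Pipeline
from pyspark.mlv2.feature import StandardScaler
from pyspark.mlv2.classification import LogisticRegression

# Hypothetical two-stage pipeline: scale features, then fit a classifier.
scaler = StandardScaler(inputCol="features", outputCol="scaled_features")
lr = LogisticRegression(featuresCol="scaled_features", labelCol="label")

pipeline = Pipeline(stages=[scaler, lr])
model = pipeline.fit(train_df)          # train_df: an assumed Spark DataFrame with the columns above
predictions = model.transform(train_df)
```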

### How was this patch tested?

Unit tests.

Closes #41479 from WeichenXu123/mlv2-pipeline.

Authored-by: Weichen Xu 
Signed-off-by: Weichen Xu 
---
 python/pyspark/mlv2/__init__.py|   4 +
 python/pyspark/mlv2/classification.py  |   6 +-
 python/pyspark/mlv2/feature.py |   6 +-
 python/pyspark/mlv2/io_utils.py| 187 ++
 python/pyspark/mlv2/pipeline.py| 241 +
 python/pyspark/mlv2/tests/test_pipeline.py | 184 ++
 6 files changed, 561 insertions(+), 67 deletions(-)

diff --git a/python/pyspark/mlv2/__init__.py b/python/pyspark/mlv2/__init__.py
index 990b4fa9da8..352d24baabe 100644
--- a/python/pyspark/mlv2/__init__.py
+++ b/python/pyspark/mlv2/__init__.py
@@ -26,6 +26,8 @@ from pyspark.mlv2 import (
 evaluation,
 )
 
+from pyspark.mlv2.pipeline import Pipeline, PipelineModel
+
 __all__ = [
 "Estimator",
 "Transformer",
@@ -33,4 +35,6 @@ __all__ = [
 "Model",
 "feature",
 "evaluation",
+"Pipeline",
+"PipelineModel",
 ]
diff --git a/python/pyspark/mlv2/classification.py 
b/python/pyspark/mlv2/classification.py
index fe0d76837f9..522c54b5289 100644
--- a/python/pyspark/mlv2/classification.py
+++ b/python/pyspark/mlv2/classification.py
@@ -40,7 +40,7 @@ from pyspark.ml.param.shared import (
 HasMomentum,
 )
 from pyspark.mlv2.base import Predictor, PredictionModel
-from pyspark.mlv2.io_utils import ParamsReadWrite, ModelReadWrite
+from pyspark.mlv2.io_utils import ParamsReadWrite, CoreModelReadWrite
 from pyspark.sql.functions import lit, count, countDistinct
 
 import torch
@@ -253,7 +253,9 @@ class LogisticRegression(
 
 
 @inherit_doc
-class LogisticRegressionModel(PredictionModel, _LogisticRegressionParams, 
ModelReadWrite):
+class LogisticRegressionModel(
+PredictionModel, _LogisticRegressionParams, ParamsReadWrite, 
CoreModelReadWrite
+):
 """
 Model fitted by LogisticRegression.
 
diff --git a/python/pyspark/mlv2/feature.py b/python/pyspark/mlv2/feature.py
index 57c6213d2bb..a58f214711c 100644
--- a/python/pyspark/mlv2/feature.py
+++ b/python/pyspark/mlv2/feature.py
@@ -24,7 +24,7 @@ from pyspark import keyword_only
 from pyspark.sql import DataFrame
 from pyspark.ml.param.shared import HasInputCol, HasOutputCol
 from pyspark.mlv2.base import Estimator, Model
-from pyspark.mlv2.io_utils import ParamsReadWrite, ModelReadWrite
+from pyspark.mlv2.io_utils import ParamsReadWrite, CoreModelReadWrite
 from pyspark.mlv2.summarizer import summarize_dataframe
 
 
@@ -61,7 +61,7 @@ class MaxAbsScaler(Estimator, HasInputCol, HasOutputCol, 
ParamsReadWrite):
 return self._copyValues(model)
 
 
-class MaxAbsScalerModel(Model, HasInputCol, HasOutputCol, ModelReadWrite):
+class MaxAbsScalerModel(Model, HasInputCol, HasOutputCol, ParamsReadWrite, 
CoreModelReadWrite):
 def __init__(
 self, max_abs_values: Optional["np.ndarray"] = None, n_samples_seen: 
Optional[int] = None
 ) -> None:
@@ -143,7 +143,7 @@ class StandardScaler(Estimator, HasInputCol, HasOutputCol, 
ParamsReadWrite):
 return self._copyValues(model)
 
 
-class StandardScalerModel(Model, HasInputCol, HasOutputCol, ModelReadWrite):
+class StandardScalerModel(Model, HasInputCol, HasOutputCol, ParamsReadWrite, 
CoreModelReadWrite):
 def __init__(
 self,
 mean_values: Optional["np.ndarray"] = None,
diff --git a/python/pyspark/mlv2/io_utils.py b/python/pyspark/mlv2/io_utils.py
index 8f7263206a7..c701736712f 100644
--- a/python/pyspark/mlv2/io_utils.py
+++ b/python/pyspark/mlv2/io_utils.py
@@ -21,7 +21,8 @@ import os
 import tempfile
 import time
 from urllib.parse import urlparse
-from typing import Any, Dict, Optional
+from typing import Any, Dict, List
+from pyspark.ml.base import Params
 from pyspark.ml.util import _get_active_session
 from pyspark.sql.utils imp

[spark] branch master updated: [SPARK-43203][SQL] Move all Drop Table case to DataSource V2

2023-06-19 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 32a5db49e3e [SPARK-43203][SQL] Move all Drop Table case to DataSource 
V2
32a5db49e3e is described below

commit 32a5db49e3e62891736a8544dba440d5399a12c4
Author: Jia Fan 
AuthorDate: Mon Jun 19 20:43:02 2023 +0800

[SPARK-43203][SQL] Move all Drop Table case to DataSource V2



### What changes were proposed in this pull request?
This fixes the DROP TABLE behavior in the session catalog caused by #37879: we 
always invoke the V1 drop logic if the identifier looks like a V1 identifier, 
which is a big blocker for external data sources that provide custom 
session catalogs.
So this PR moves all DROP TABLE cases to DataSource V2 (using DROP TABLE to 
drop a view is not included). For more information, see 
https://github.com/apache/spark/pull/37879/files#r1170501180
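
A minimal sketch in Scala of the setup this change targets; the catalog class 
name below is a placeholder, not an actual implementation:

```scala
import org.apache.spark.sql.SparkSession

object DropTableV2Sketch extends App {
  // "com.example.MySessionCatalog" stands in for an external data source's
  // custom session catalog plugged in as the V2 session catalog.
  val spark = SparkSession.builder()
    .master("local[*]")
    .config("spark.sql.catalog.spark_catalog", "com.example.MySessionCatalog")
    .getOrCreate()

  spark.sql("CREATE TABLE demo_tbl (id INT) USING parquet")
  // With this change, the statement below is planned as a V2 DropTable and
  // delegated to the configured session catalog rather than the V1 command.
  spark.sql("DROP TABLE IF EXISTS demo_tbl")

  spark.stop()
}
```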



### Why are the changes needed?
Move the DROP TABLE cases to DataSource V2 to fix the bug and to prepare for 
removing the v1 DROP TABLE code path.


### Does this PR introduce _any_ user-facing change?
No


### How was this patch tested?
Tested by:
- V2 table catalog tests: 
`org.apache.spark.sql.execution.command.v2.DropTableSuite`
- V1 table catalog tests: 
`org.apache.spark.sql.execution.command.v1.DropTableSuiteBase`


Closes #41348 from Hisoka-X/SPARK-43203_drop_table_to_v2.

Authored-by: Jia Fan 
Signed-off-by: Wenchen Fan 
---
 .../catalyst/analysis/ResolveSessionCatalog.scala  |  2 +-
 .../apache/spark/sql/execution/CacheManager.scala  |  6 
 .../datasources/v2/V2SessionCatalog.scala  | 40 -
 .../ansi/conditional-functions.sql.out |  3 +-
 .../ansi/decimalArithmeticOperations.sql.out   |  3 +-
 .../analyzer-results/change-column.sql.out |  3 +-
 .../sql-tests/analyzer-results/charvarchar.sql.out | 12 ---
 .../columnresolution-negative.sql.out  |  6 ++--
 .../decimalArithmeticOperations.sql.out|  3 +-
 .../describe-part-after-analyze.sql.out|  3 +-
 .../analyzer-results/describe-query.sql.out|  6 ++--
 .../describe-table-after-alter-table.sql.out   |  6 ++--
 .../sql-tests/analyzer-results/describe.sql.out|  3 +-
 .../sql-tests/analyzer-results/explain-aqe.sql.out | 18 ++
 .../sql-tests/analyzer-results/explain-cbo.sql.out |  6 ++--
 .../sql-tests/analyzer-results/explain.sql.out | 18 ++
 .../analyzer-results/identifier-clause.sql.out | 15 +---
 .../analyzer-results/null-handling.sql.out |  3 +-
 .../order-by-nulls-ordering.sql.out|  6 ++--
 .../analyzer-results/postgreSQL/boolean.sql.out| 12 ---
 .../analyzer-results/postgreSQL/case.sql.out   |  6 ++--
 .../postgreSQL/create_view.sql.out |  9 +++--
 .../analyzer-results/postgreSQL/date.sql.out   |  3 +-
 .../analyzer-results/postgreSQL/float4.sql.out |  3 +-
 .../analyzer-results/postgreSQL/float8.sql.out |  3 +-
 .../postgreSQL/groupingsets.sql.out| 12 ---
 .../analyzer-results/postgreSQL/insert.sql.out |  3 +-
 .../analyzer-results/postgreSQL/int2.sql.out   |  3 +-
 .../analyzer-results/postgreSQL/int4.sql.out   |  3 +-
 .../analyzer-results/postgreSQL/int8.sql.out   |  3 +-
 .../analyzer-results/postgreSQL/join.sql.out   | 21 +++
 .../analyzer-results/postgreSQL/numeric.sql.out| 42 ++
 .../analyzer-results/postgreSQL/select.sql.out |  3 +-
 .../postgreSQL/select_having.sql.out   |  3 +-
 .../postgreSQL/select_implicit.sql.out |  3 +-
 .../analyzer-results/postgreSQL/strings.sql.out|  3 +-
 .../analyzer-results/postgreSQL/text.sql.out   |  3 +-
 .../analyzer-results/postgreSQL/timestamp.sql.out  |  3 +-
 .../postgreSQL/window_part2.sql.out|  6 ++--
 .../postgreSQL/window_part3.sql.out|  9 +++--
 .../analyzer-results/postgreSQL/with.sql.out   | 27 +-
 .../analyzer-results/show-create-table.sql.out | 33 +++--
 .../sql-tests/analyzer-results/show-tables.sql.out |  6 ++--
 .../analyzer-results/show-tblproperties.sql.out|  3 +-
 .../analyzer-results/show_columns.sql.out  |  6 ++--
 .../udf/postgreSQL/udf-case.sql.out|  6 ++--
 .../udf/postgreSQL/udf-join.sql.out| 27 +-
 .../udf/postgreSQL/udf-select_having.sql.out   |  3 +-
 .../udf/postgreSQL/udf-select_implicit.sql.out |  3 +-
 .../execution/command/PlanResolutionSuite.scala| 17 -
 .../hive/execution/command/DropTableSuite.scala|  2 +-
 51 files changed, 304 insertions(+), 147 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/R

[spark] branch master updated: [SPARK-43942][CONNECT][PYTHON] Add string functions to Scala and Python - part 1

2023-06-19 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 476c58ed26a [SPARK-43942][CONNECT][PYTHON] Add string functions to 
Scala and Python - part 1
476c58ed26a is described below

commit 476c58ed26a1155fabe5afe1eb502bb992f31954
Author: panbingkun 
AuthorDate: Mon Jun 19 20:36:18 2023 +0800

[SPARK-43942][CONNECT][PYTHON] Add string functions to Scala and Python - 
part 1

### What changes were proposed in this pull request?
Add following functions:

- char
- btrim
- char_length
- character_length
- chr
- contains
- elt
- find_in_set
- like
- ilike
- lcase
- ucase
- ~~len: Because it conflicts with the Python built-in `len`, and we already 
have `length`~~
- left
- right

to:

- Scala API
- Python API
- Spark Connect Scala Client
- Spark Connect Python Client

### Why are the changes needed?
for parity

### Does this PR introduce _any_ user-facing change?
Yes, new functions.
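
A short Scala sketch exercising a few of the functions listed above; it 
assumes a Spark build that includes this change (3.5.0+):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object NewStringFuncsSketch extends App {
  val spark = SparkSession.builder().master("local[*]").getOrCreate()
  import spark.implicits._

  val df = Seq(("Spark SQL", "SQL")).toDF("s", "sub")

  df.select(
    contains($"s", $"sub"),               // true if `s` contains `sub`
    find_in_set(lit("b"), lit("a,b,c")),  // 1-based position of "b" in the list
    btrim(lit("  hi  ")),                 // trims leading/trailing spaces
    ucase($"s")                           // upper-cases the string
  ).show()

  spark.stop()
}
```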

### How was this patch tested?
- Add New UT.

Closes #41561 from panbingkun/SPARK-43942.

Authored-by: panbingkun 
Signed-off-by: Ruifeng Zheng 
---
 .../scala/org/apache/spark/sql/functions.scala | 160 +
 .../apache/spark/sql/PlanGenerationTestSuite.scala |  68 
 .../explain-results/function_btrim.explain |   2 +
 ...nction_btrim_with_specified_trim_string.explain |   2 +
 .../explain-results/function_char.explain  |   2 +
 .../explain-results/function_char_length.explain   |   2 +
 .../function_character_length.explain  |   2 +
 .../explain-results/function_chr.explain   |   2 +
 .../explain-results/function_contains.explain  |   2 +
 .../explain-results/function_elt.explain   |   2 +
 .../explain-results/function_find_in_set.explain   |   2 +
 .../explain-results/function_ilike.explain |   2 +
 .../function_ilike_with_escape.explain |   2 +
 .../explain-results/function_lcase.explain |   2 +
 .../explain-results/function_left.explain  |   2 +
 .../explain-results/function_like.explain  |   2 +
 .../function_like_with_escape.explain  |   2 +
 .../explain-results/function_right.explain |   2 +
 .../explain-results/function_ucase.explain |   2 +
 .../query-tests/queries/function_btrim.json|  25 ++
 .../query-tests/queries/function_btrim.proto.bin   | Bin 0 -> 174 bytes
 .../function_btrim_with_specified_trim_string.json |  29 ++
 ...tion_btrim_with_specified_trim_string.proto.bin | Bin 0 -> 181 bytes
 .../query-tests/queries/function_char.json |  25 ++
 .../query-tests/queries/function_char.proto.bin| Bin 0 -> 173 bytes
 .../query-tests/queries/function_char_length.json  |  25 ++
 .../queries/function_char_length.proto.bin | Bin 0 -> 180 bytes
 .../queries/function_character_length.json |  25 ++
 .../queries/function_character_length.proto.bin| Bin 0 -> 185 bytes
 .../query-tests/queries/function_chr.json  |  25 ++
 .../query-tests/queries/function_chr.proto.bin | Bin 0 -> 172 bytes
 .../query-tests/queries/function_contains.json |  29 ++
 .../queries/function_contains.proto.bin| Bin 0 -> 184 bytes
 .../query-tests/queries/function_elt.json  |  33 ++
 .../query-tests/queries/function_elt.proto.bin | Bin 0 -> 186 bytes
 .../query-tests/queries/function_find_in_set.json  |  29 ++
 .../queries/function_find_in_set.proto.bin | Bin 0 -> 187 bytes
 .../query-tests/queries/function_ilike.json|  29 ++
 .../query-tests/queries/function_ilike.proto.bin   | Bin 0 -> 181 bytes
 .../queries/function_ilike_with_escape.json|  33 ++
 .../queries/function_ilike_with_escape.proto.bin   | Bin 0 -> 188 bytes
 .../query-tests/queries/function_lcase.json|  25 ++
 .../query-tests/queries/function_lcase.proto.bin   | Bin 0 -> 174 bytes
 .../query-tests/queries/function_left.json |  29 ++
 .../query-tests/queries/function_left.proto.bin| Bin 0 -> 180 bytes
 .../query-tests/queries/function_like.json |  29 ++
 .../query-tests/queries/function_like.proto.bin| Bin 0 -> 180 bytes
 .../queries/function_like_with_escape.json |  33 ++
 .../queries/function_like_with_escape.proto.bin| Bin 0 -> 187 bytes
 .../query-tests/queries/function_right.json|  29 ++
 .../query-tests/queries/function_right.proto.bin   | Bin 0 -> 181 bytes
 .../query-tests/queries/function_ucase.json|  25 ++
 .../query-tests/queries/function_ucase.proto.bin   | Bin 0 -> 174 bytes
 .../sql/connect/planner/SparkConnectPlanner.scala  |  12 +
 core/src/main/resources/error/error-classes.json   |   5 +
 .../source/reference/pyspark.sql/functions.rst |  14 +
 

[spark] branch master updated: [SPARK-44096][PYTHOM][DOCS] Make examples copy-pastable by adding a newline in all modules

2023-06-19 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0fc7eeb39aa [SPARK-44096][PYTHOM][DOCS] Make examples copy-pastable by 
adding a newline in all modules
0fc7eeb39aa is described below

commit 0fc7eeb39aad5997912c8a3f82aea089a4985898
Author: Hyukjin Kwon 
AuthorDate: Mon Jun 19 13:34:42 2023 +0300

[SPARK-44096][PYTHOM][DOCS] Make examples copy-pastable by adding a newline 
in all modules

### What changes were proposed in this pull request?

I found that there are many instances of the same issue as 
https://github.com/apache/spark/pull/41655. This PR addresses all such 
examples across all PySpark components.

### Why are the changes needed?

See https://github.com/apache/spark/pull/41655.

### Does this PR introduce _any_ user-facing change?

Yes, it changes the documentation and makes the examples copy-pastable; see 
also https://github.com/apache/spark/pull/41655.

### How was this patch tested?

CI in this PR should validate them. This is logically the same as 
https://github.com/apache/spark/pull/41655. I will also build the documentation 
locally and test.

Closes #41657 from HyukjinKwon/minor-newlines.

Authored-by: Hyukjin Kwon 
Signed-off-by: Max Gekk 
---
 python/pyspark/accumulators.py |  4 
 python/pyspark/context.py  |  4 
 python/pyspark/ml/functions.py | 21 +++--
 python/pyspark/ml/torch/distributor.py |  2 ++
 python/pyspark/mllib/clustering.py |  2 ++
 python/pyspark/rdd.py  |  9 +
 python/pyspark/sql/dataframe.py|  4 
 python/pyspark/sql/functions.py|  4 
 python/pyspark/sql/pandas/group_ops.py |  6 ++
 python/pyspark/sql/streaming/query.py  |  2 ++
 python/pyspark/sql/types.py|  1 +
 python/pyspark/sql/udtf.py |  1 +
 12 files changed, 46 insertions(+), 14 deletions(-)

diff --git a/python/pyspark/accumulators.py b/python/pyspark/accumulators.py
index dc8520a844d..a95bd9debfc 100644
--- a/python/pyspark/accumulators.py
+++ b/python/pyspark/accumulators.py
@@ -88,12 +88,14 @@ class Accumulator(Generic[T]):
 >>> def f(x):
 ... global a
 ... a += x
+...
 >>> rdd.foreach(f)
 >>> a.value
 13
 >>> b = sc.accumulator(0)
 >>> def g(x):
 ... b.add(x)
+...
 >>> rdd.foreach(g)
 >>> b.value
 6
@@ -106,6 +108,7 @@ class Accumulator(Generic[T]):
 >>> def h(x):
 ... global a
 ... a.value = 7
+...
 >>> rdd.foreach(h) # doctest: +IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
@@ -198,6 +201,7 @@ class AccumulatorParam(Generic[T]):
 >>> def g(x):
 ... global va
 ... va += [x] * 3
+...
 >>> rdd = sc.parallelize([1,2,3])
 >>> rdd.foreach(g)
 >>> va.value
diff --git a/python/pyspark/context.py b/python/pyspark/context.py
index 6f5094963be..51a4db67e8c 100644
--- a/python/pyspark/context.py
+++ b/python/pyspark/context.py
@@ -1802,6 +1802,7 @@ class SparkContext:
 >>> def f(x):
 ... global acc
 ... acc += 1
+...
 >>> rdd.foreach(f)
 >>> acc.value
 15
@@ -2140,6 +2141,7 @@ class SparkContext:
 >>> def map_func(x):
 ... sleep(100)
 ... raise RuntimeError("Task should have been cancelled")
+...
 >>> def start_job(x):
 ... global result
 ... try:
@@ -2148,9 +2150,11 @@ class SparkContext:
 ... except Exception as e:
 ... result = "Cancelled"
 ... lock.release()
+...
 >>> def stop_job():
 ... sleep(5)
 ... sc.cancelJobGroup("job_to_cancel")
+...
 >>> suppress = lock.acquire()
 >>> suppress = InheritableThread(target=start_job, args=(10,)).start()
 >>> suppress = InheritableThread(target=stop_job).start()
diff --git a/python/pyspark/ml/functions.py b/python/pyspark/ml/functions.py
index bce4101df1e..89b05b692ea 100644
--- a/python/pyspark/ml/functions.py
+++ b/python/pyspark/ml/functions.py
@@ -512,11 +512,10 @@ def predict_batch_udf(
 ... # outputs.shape = [batch_size]
 ... return inputs * 2
 ... return predict
->>>
+...
 >>> times_two_udf = predict_batch_udf(make_times_two_fn,
 ...   return_type=FloatType(),
 ...   batch_size=10)
->>>
 >>> df = spark.createDataFrame(pd.DataFrame(np.arange(100)))
 >>> df.withColumn("x2", times_two_udf("0")).show(5)
 +---+---+
@@ -561,12 +560,11 @@ def predict_batch_udf(
 ... # outp

[spark] branch master updated: [SPARK-40497][BUILD] Upgrade Scala to 2.13.11

2023-06-19 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f21303fb17f [SPARK-40497][BUILD] Upgrade Scala to 2.13.11
f21303fb17f is described below

commit f21303fb17f969661da22e21e3c3687d4fe4c69a
Author: yangjie01 
AuthorDate: Mon Jun 19 03:16:24 2023 -0700

[SPARK-40497][BUILD] Upgrade Scala to 2.13.11

### What changes were proposed in this pull request?
This PR aims to upgrade Scala to 2.13.11
- https://www.scala-lang.org/news/2.13.11

Additionally, this PR adds a new suppression rule for the warning message 
`Implicit definition should have explicit type`. This is a new compile check 
introduced by https://github.com/scala/scala/pull/10083; we must fix it when 
we upgrade to Scala 3.
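
A tiny Scala sketch (illustrative names only) of the kind of definition that 
now triggers the check, plus the explicitly typed form that avoids it:

```scala
object ImplicitTypeLintExample {
  // Under Scala 2.13.11 this emits a warning like
  // "Implicit definition should have explicit type (inferred String)".
  implicit val separator = ","

  // Writing out the type silences the warning without changing behavior:
  // implicit val separator: String = ","

  def join(parts: Seq[String])(implicit sep: String): String = parts.mkString(sep)
}
```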

### Why are the changes needed?
This release improves collections, adds support for JDK 20 and 21, adds 
support for JDK 17 `sealed`:
- https://github.com/scala/scala/pull/10363
- https://github.com/scala/scala/pull/10184
- https://github.com/scala/scala/pull/10397
- https://github.com/scala/scala/pull/10348
- https://github.com/scala/scala/pull/10105

There are 2 known issues in this version:

- https://github.com/scala/bug/issues/12800
- https://github.com/scala/bug/issues/12799

For the first one, there are no compilation warnings related to `match may 
not be exhaustive` in the Spark compile log. For the second one, there is no 
use of `method.isAnnotationPresent(Deprecated.class)` in the Spark code; there 
is just

https://github.com/apache/spark/blob/8c84d2c9349d7b607db949c2e114df781f23e438/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala#L130

in the Spark code, and I checked that `javax.annotation.Nonnull` does not have 
this issue.

So I think these two issues will not affect Spark itself, but that does not 
mean they won't affect code written by end users.

The full release notes are as follows:

- https://github.com/scala/scala/releases/tag/v2.13.11

### Does this PR introduce _any_ user-facing change?
Yes, this is a Scala version change.

### How was this patch tested?
- Existing Test
- Checked Java 8/17 + Scala 2.13.11 using GA, all test passed

Java 8 + Scala 2.13.11: 
https://github.com/LuciferYang/spark/runs/14337279564
Java 17 + Scala 2.13.11: 
https://github.com/LuciferYang/spark/runs/14343012195

Closes #41626 from LuciferYang/SPARK-40497.

Authored-by: yangjie01 
Signed-off-by: Dongjoon Hyun 
---
 pom.xml  | 6 +-
 project/SparkBuild.scala | 4 +++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/pom.xml b/pom.xml
index a5f6d5f9981..c322112fbc6 100644
--- a/pom.xml
+++ b/pom.xml
@@ -3569,7 +3569,7 @@
 
   scala-2.13
   
-2.13.8
+2.13.11
 2.13
   
   
@@ -3628,6 +3628,10 @@
   -->
   
-Wconf:cat=unused-imports&src=org\/apache\/spark\/graphx\/impl\/VertexPartitionBase.scala:s
   
-Wconf:cat=unused-imports&src=org\/apache\/spark\/graphx\/impl\/VertexPartitionBaseOps.scala:s
+  
+  -Wconf:msg=Implicit definition should have explicit 
type:s
 
 
 
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 65da2f7ba6b..607daa67138 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -286,7 +286,9 @@ object SparkBuild extends PomBuild {
   // TODO(SPARK-43850): Remove the following suppression rules and 
remove `import scala.language.higherKinds`
   // from the corresponding files when Scala 2.12 is no longer 
supported.
   
"-Wconf:cat=unused-imports&src=org\\/apache\\/spark\\/graphx\\/impl\\/VertexPartitionBase.scala:s",
-  
"-Wconf:cat=unused-imports&src=org\\/apache\\/spark\\/graphx\\/impl\\/VertexPartitionBaseOps.scala:s"
+  
"-Wconf:cat=unused-imports&src=org\\/apache\\/spark\\/graphx\\/impl\\/VertexPartitionBaseOps.scala:s",
+  // SPARK-40497 Upgrade Scala to 2.13.11 and suppress `Implicit 
definition should have explicit type`
+  "-Wconf:msg=Implicit definition should have explicit type:s"
 )
   }
 }





[spark] branch branch-3.4 updated: [SPARK-44018][SQL] Improve the hashCode and toString for some DS V2 Expression

2023-06-19 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 864b9869949 [SPARK-44018][SQL] Improve the hashCode and toString for 
some DS V2 Expression
864b9869949 is described below

commit 864b9869949dbf5dd538a2d5dc59f2894d72af1c
Author: Jiaan Geng 
AuthorDate: Mon Jun 19 15:55:06 2023 +0800

[SPARK-44018][SQL] Improve the hashCode and toString for some DS V2 
Expression

### What changes were proposed in this pull request?
The `hashCode()` of `UserDefinedScalarFunc` and `GeneralScalarExpression` is 
not good enough. For example, `GeneralScalarExpression` uses 
`Objects.hash(name, children)`, which combines the hash code of `name` with 
the hash code of the `children` array reference to form the expression's hash 
code. Instead, we should use the hash code of each element in `children`.

Because `UserDefinedAggregateFunc` and `GeneralAggregateFunc` are missing 
`hashCode()`, this PR also adds it to them.

This PR also improves the `toString` for `UserDefinedAggregateFunc` and 
`GeneralAggregateFunc`, using primitive boolean comparison instead of 
`Objects.equals`, because primitive boolean comparison performs better than 
`Objects.equals`.
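
A small Scala sketch of the problem and the fix described above (names are 
illustrative only):

```scala
import java.util.{Arrays, Objects}

object HashCodeSketch extends App {
  val a: Array[AnyRef] = Array("x", "y")
  val b: Array[AnyRef] = Array("x", "y")  // same contents, different reference

  // Objects.hash folds in the identity hash of the array reference, so two
  // expressions with equal children usually get different hash codes.
  println(Objects.hash("name", a) == Objects.hash("name", b))  // almost always false

  // Hashing the element values, as this patch does, makes equal contents hash equally.
  def fixed(name: String, children: Array[AnyRef]): Int =
    31 * name.hashCode + Arrays.hashCode(children)
  println(fixed("name", a) == fixed("name", b))                // true
}
```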

### Why are the changes needed?
Improve the hash code for some DS V2 Expression.

### Does this PR introduce _any_ user-facing change?
'Yes'.

### How was this patch tested?
N/A

Closes #41543 from beliefer/SPARK-44018.

Authored-by: Jiaan Geng 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 8c84d2c9349d7b607db949c2e114df781f23e438)
Signed-off-by: Wenchen Fan 
---
 .../expressions/GeneralScalarExpression.java   | 10 ++---
 .../expressions/UserDefinedScalarFunc.java | 13 
 .../aggregate/GeneralAggregateFunc.java| 22 
 .../aggregate/UserDefinedAggregateFunc.java| 24 ++
 4 files changed, 62 insertions(+), 7 deletions(-)

diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/GeneralScalarExpression.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/GeneralScalarExpression.java
index cb9bf6d69e2..85966060021 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/GeneralScalarExpression.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/GeneralScalarExpression.java
@@ -18,7 +18,6 @@
 package org.apache.spark.sql.connector.expressions;
 
 import java.util.Arrays;
-import java.util.Objects;
 
 import org.apache.spark.annotation.Evolving;
 import org.apache.spark.sql.connector.expressions.filter.Predicate;
@@ -441,12 +440,17 @@ public class GeneralScalarExpression extends 
ExpressionWithToString {
   public boolean equals(Object o) {
 if (this == o) return true;
 if (o == null || getClass() != o.getClass()) return false;
+
 GeneralScalarExpression that = (GeneralScalarExpression) o;
-return Objects.equals(name, that.name) && Arrays.equals(children, 
that.children);
+
+if (!name.equals(that.name)) return false;
+return Arrays.equals(children, that.children);
   }
 
   @Override
   public int hashCode() {
-return Objects.hash(name, children);
+int result = name.hashCode();
+result = 31 * result + Arrays.hashCode(children);
+return result;
   }
 }
diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/UserDefinedScalarFunc.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/UserDefinedScalarFunc.java
index b7f603cd431..cbf3941d77d 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/UserDefinedScalarFunc.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/UserDefinedScalarFunc.java
@@ -18,7 +18,6 @@
 package org.apache.spark.sql.connector.expressions;
 
 import java.util.Arrays;
-import java.util.Objects;
 
 import org.apache.spark.annotation.Evolving;
 import org.apache.spark.sql.internal.connector.ExpressionWithToString;
@@ -51,13 +50,19 @@ public class UserDefinedScalarFunc extends 
ExpressionWithToString {
   public boolean equals(Object o) {
 if (this == o) return true;
 if (o == null || getClass() != o.getClass()) return false;
+
 UserDefinedScalarFunc that = (UserDefinedScalarFunc) o;
-return Objects.equals(name, that.name) && Objects.equals(canonicalName, 
that.canonicalName) &&
-  Arrays.equals(children, that.children);
+
+if (!name.equals(that.name)) return false;
+if (!canonicalName.equals(that.canonicalName)) return false;
+return Arrays.equals(children, that.children);
   }
 
   @Override
   public int hashCode() {
-return Objects.hash(name, canonicalName, childre

[spark] branch master updated: [SPARK-44018][SQL] Improve the hashCode and toString for some DS V2 Expression

2023-06-19 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8c84d2c9349 [SPARK-44018][SQL] Improve the hashCode and toString for 
some DS V2 Expression
8c84d2c9349 is described below

commit 8c84d2c9349d7b607db949c2e114df781f23e438
Author: Jiaan Geng 
AuthorDate: Mon Jun 19 15:55:06 2023 +0800

[SPARK-44018][SQL] Improve the hashCode and toString for some DS V2 
Expression

### What changes were proposed in this pull request?
The `hashCode()` of `UserDefinedScalarFunc` and `GeneralScalarExpression` is 
not good enough. For example, `GeneralScalarExpression` uses 
`Objects.hash(name, children)`, which combines the hash code of `name` with 
the hash code of the `children` array reference to form the expression's hash 
code. Instead, we should use the hash code of each element in `children`.

Because `UserDefinedAggregateFunc` and `GeneralAggregateFunc` are missing 
`hashCode()`, this PR also adds it to them.

This PR also improves the `toString` for `UserDefinedAggregateFunc` and 
`GeneralAggregateFunc`, using primitive boolean comparison instead of 
`Objects.equals`, because primitive boolean comparison performs better than 
`Objects.equals`.

### Why are the changes needed?
Improve the hash code for some DS V2 Expression.

### Does this PR introduce _any_ user-facing change?
'Yes'.

### How was this patch tested?
N/A

Closes #41543 from beliefer/SPARK-44018.

Authored-by: Jiaan Geng 
Signed-off-by: Wenchen Fan 
---
 .../expressions/GeneralScalarExpression.java   | 10 ++---
 .../expressions/UserDefinedScalarFunc.java | 13 
 .../aggregate/GeneralAggregateFunc.java| 22 
 .../aggregate/UserDefinedAggregateFunc.java| 24 ++
 4 files changed, 62 insertions(+), 7 deletions(-)

diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/GeneralScalarExpression.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/GeneralScalarExpression.java
index cb9bf6d69e2..85966060021 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/GeneralScalarExpression.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/GeneralScalarExpression.java
@@ -18,7 +18,6 @@
 package org.apache.spark.sql.connector.expressions;
 
 import java.util.Arrays;
-import java.util.Objects;
 
 import org.apache.spark.annotation.Evolving;
 import org.apache.spark.sql.connector.expressions.filter.Predicate;
@@ -441,12 +440,17 @@ public class GeneralScalarExpression extends 
ExpressionWithToString {
   public boolean equals(Object o) {
 if (this == o) return true;
 if (o == null || getClass() != o.getClass()) return false;
+
 GeneralScalarExpression that = (GeneralScalarExpression) o;
-return Objects.equals(name, that.name) && Arrays.equals(children, 
that.children);
+
+if (!name.equals(that.name)) return false;
+return Arrays.equals(children, that.children);
   }
 
   @Override
   public int hashCode() {
-return Objects.hash(name, children);
+int result = name.hashCode();
+result = 31 * result + Arrays.hashCode(children);
+return result;
   }
 }
diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/UserDefinedScalarFunc.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/UserDefinedScalarFunc.java
index b7f603cd431..cbf3941d77d 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/UserDefinedScalarFunc.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/UserDefinedScalarFunc.java
@@ -18,7 +18,6 @@
 package org.apache.spark.sql.connector.expressions;
 
 import java.util.Arrays;
-import java.util.Objects;
 
 import org.apache.spark.annotation.Evolving;
 import org.apache.spark.sql.internal.connector.ExpressionWithToString;
@@ -51,13 +50,19 @@ public class UserDefinedScalarFunc extends 
ExpressionWithToString {
   public boolean equals(Object o) {
 if (this == o) return true;
 if (o == null || getClass() != o.getClass()) return false;
+
 UserDefinedScalarFunc that = (UserDefinedScalarFunc) o;
-return Objects.equals(name, that.name) && Objects.equals(canonicalName, 
that.canonicalName) &&
-  Arrays.equals(children, that.children);
+
+if (!name.equals(that.name)) return false;
+if (!canonicalName.equals(that.canonicalName)) return false;
+return Arrays.equals(children, that.children);
   }
 
   @Override
   public int hashCode() {
-return Objects.hash(name, canonicalName, children);
+int result = name.hashCode();
+result = 31 * result + canonicalName.hashCode();
+result = 31 * r

[spark] branch master updated (25a14c313bf -> 8ccacb84306)

2023-06-19 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 25a14c313bf [SPARK-43929][SQL][PYTHON][CONNECT] Add date time 
functions to Scala, Python and Connect API - part 1
 add 8ccacb84306 [MINOR][PYTHON][DOCS] Make mapInPandas example 
copy-pastable

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/pandas/map_ops.py | 1 +
 1 file changed, 1 insertion(+)

