[spark] branch branch-3.4 updated: [SPARK-44391][SQL][3.4] Check the number of argument types in `InvokeLike`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 53383fcd2be [SPARK-44391][SQL][3.4] Check the number of argument types in `InvokeLike`
53383fcd2be is described below

commit 53383fcd2be178f4f0d231334ee36f1c3d67f64d
Author: Max Gekk
AuthorDate: Fri Jul 14 08:37:29 2023 +0300

    [SPARK-44391][SQL][3.4] Check the number of argument types in `InvokeLike`

    ### What changes were proposed in this pull request?

    In this PR, I propose to check the number of argument types in the `InvokeLike` expressions. If the input types are provided, the number of types must be exactly the same as the number of argument expressions. This is a backport of https://github.com/apache/spark/pull/41954.

    ### Why are the changes needed?

    1. This PR explicitly checks the contract described in the comment at https://github.com/apache/spark/blob/d9248e83bbb3af49333608bebe7149b1aaeca738/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L247, which can prevent errors in expression implementations and improves code maintainability.
    2. It also fixes an issue in `UrlEncode` and `UrlDecode`.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    By running the related tests:
    ```
    $ build/sbt "test:testOnly *UrlFunctionsSuite"
    $ build/sbt "test:testOnly *DataSourceV2FunctionSuite"
    ```

    Authored-by: Max Gekk
    (cherry picked from commit 3e82ac6ea3d9f87c8ac09e481235beefaa1bf758)

Closes #41985 from MaxGekk/fix-url_decode-3.4.
Authored-by: Max Gekk Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 5 + .../sql-error-conditions-datatype-mismatch-error-class.md | 4 .../spark/sql/catalyst/analysis/CheckAnalysis.scala | 5 +++-- .../spark/sql/catalyst/expressions/objects/objects.scala | 15 +++ .../spark/sql/catalyst/expressions/urlExpressions.scala | 4 ++-- 5 files changed, 29 insertions(+), 4 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index febed9283d8..90dec2ee45e 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -468,6 +468,11 @@ "The must be between (current value = )." ] }, + "WRONG_NUM_ARG_TYPES" : { +"message" : [ + "The expression requires argument types but the actual number is ." +] + }, "WRONG_NUM_ENDPOINTS" : { "message" : [ "The number of endpoints must be >= 2 to construct intervals but the actual number is ." diff --git a/docs/sql-error-conditions-datatype-mismatch-error-class.md b/docs/sql-error-conditions-datatype-mismatch-error-class.md index 6ccd63e6ee9..2178deca4f2 100644 --- a/docs/sql-error-conditions-datatype-mismatch-error-class.md +++ b/docs/sql-error-conditions-datatype-mismatch-error-class.md @@ -231,6 +231,10 @@ The input of `` can't be `` type data. The `` must be between `` (current value = ``). +## WRONG_NUM_ARG_TYPES + +The expression requires `` argument types but the actual number is ``. + ## WRONG_NUM_ENDPOINTS The number of endpoints must be >= 2 to construct intervals but the actual number is ``. 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 223fdf12d6d..e717483ec94 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -288,8 +288,9 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB "srcType" -> c.child.dataType.catalogString, "targetType" -> c.dataType.catalogString)) case e: RuntimeReplaceable if !e.replacement.resolved => -throw new IllegalStateException("Illegal RuntimeReplaceable: " + e + - "\nReplacement is unresolved: " + e.replacement) +throw SparkException.internalError( + s"Cannot resolve the runtime replaceable expression ${toSQLExpr(e)}. " + + s"The replacement is unresolved: ${toSQLExpr(e.replacement)}.") case g: Grouping => g.failAnalysis(errorClass = "_LEGACY_ERROR_TEMP_2445", messageParameters = Map.empty) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
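The contract enforced by the commit above can be pictured with a short sketch. Spark's actual check lives in the Scala `InvokeLike` expressions; the Python below is only a hedged analogue of the rule: if argument types are declared at all, their count must equal the number of argument expressions, otherwise a `WRONG_NUM_ARG_TYPES`-style error is raised.

```python
def check_arg_types(arguments, input_types):
    """Reject a declared-type list whose length differs from the argument list.

    An empty type list means "no declared types" and is always accepted,
    mirroring the optional inputTypes of InvokeLike.
    """
    if input_types and len(input_types) != len(arguments):
        # Analogue of the WRONG_NUM_ARG_TYPES error condition added by the PR.
        raise ValueError(
            f"The expression requires {len(input_types)} argument types "
            f"but the actual number is {len(arguments)}."
        )

# Matching counts pass; no declared types also passes.
check_arg_types(["url", "charset"], ["StringType", "StringType"])
check_arg_types(["url"], [])
```

A mismatched call such as `check_arg_types(["url"], ["StringType", "StringType"])` raises the error, which is the bug class the PR surfaces for `UrlEncode`/`UrlDecode`.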
[spark] branch dependabot/pip/dev/grpcio-1.53.0 deleted (was 40975b59142)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch dependabot/pip/dev/grpcio-1.53.0 in repository https://gitbox.apache.org/repos/asf/spark.git was 40975b59142 Bump grpcio from 1.48.1 to 1.53.0 in /dev The revisions that were on this branch are still contained in other references; therefore, this change does not discard any commits from the repository. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-44222][BUILD][PYTHON] Upgrade grpc to 1.56.0 with lower/upperbound
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new de59caa83af [SPARK-44222][BUILD][PYTHON] Upgrade grpc to 1.56.0 with lower/upperbound
de59caa83af is described below

commit de59caa83af1e6b2febb24597d06c8ff8505d888
Author: Hyukjin Kwon
AuthorDate: Fri Jul 14 14:06:48 2023 +0900

    [SPARK-44222][BUILD][PYTHON] Upgrade grpc to 1.56.0 with lower/upperbound

    ### What changes were proposed in this pull request?

    This PR reverts the revert of https://github.com/apache/spark/pull/41767, this time setting grpc lower bounds.

    ### Why are the changes needed?

    See https://github.com/apache/spark/pull/41767.

    ### Does this PR introduce _any_ user-facing change?

    See https://github.com/apache/spark/pull/41767.

    ### How was this patch tested?

    Manually tested with a Conda environment, via `pip install -r dev/requirements.txt` on Python 3.9, 3.10 and 3.11.

    Closes #41997 from HyukjinKwon/SPARK-44222.
Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- .github/workflows/build_and_test.yml | 4 ++-- connector/connect/common/src/main/buf.gen.yaml | 4 ++-- dev/create-release/spark-rm/Dockerfile | 2 +- dev/requirements.txt | 4 ++-- pom.xml| 2 +- project/SparkBuild.scala | 2 +- python/docs/source/getting_started/install.rst | 16 python/setup.py| 2 +- 8 files changed, 18 insertions(+), 18 deletions(-) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 4370e622cf4..0b184c6c248 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -256,7 +256,7 @@ jobs: - name: Install Python packages (Python 3.8) if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-')) run: | -python3.8 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'grpcio==1.48.1' 'protobuf==3.19.5' +python3.8 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'grpcio==1.56.0' 'protobuf==3.19.5' python3.8 -m pip list # Run the tests. - name: Run tests @@ -625,7 +625,7 @@ jobs: # Jinja2 3.0.0+ causes error when building with Sphinx. # See also https://issues.apache.org/jira/browse/SPARK-35375. 
python3.9 -m pip install 'flake8==3.9.0' pydata_sphinx_theme 'mypy==0.982' 'pytest==7.1.3' 'pytest-mypy-plugins==1.9.3' numpydoc 'jinja2<3.0.0' 'black==22.6.0' -python3.9 -m pip install 'pandas-stubs==1.2.0.53' ipython 'grpcio==1.48.1' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0' +python3.9 -m pip install 'pandas-stubs==1.2.0.53' ipython 'grpcio==1.56.0' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0' - name: Python linter run: PYTHON_EXECUTABLE=python3.9 ./dev/lint-python - name: Install dependencies for Python code generation check diff --git a/connector/connect/common/src/main/buf.gen.yaml b/connector/connect/common/src/main/buf.gen.yaml index c816f12398d..07edaa567b1 100644 --- a/connector/connect/common/src/main/buf.gen.yaml +++ b/connector/connect/common/src/main/buf.gen.yaml @@ -22,14 +22,14 @@ plugins: out: gen/proto/csharp - plugin: buf.build/protocolbuffers/java:v21.7 out: gen/proto/java - - remote: buf.build/grpc/plugins/ruby:v1.47.0-1 + - plugin: buf.build/grpc/ruby:v1.56.0 out: gen/proto/ruby - plugin: buf.build/protocolbuffers/ruby:v21.7 out: gen/proto/ruby # Building the Python build and building the mypy interfaces. - plugin: buf.build/protocolbuffers/python:v21.7 out: gen/proto/python - - remote: buf.build/grpc/plugins/python:v1.47.0-1 + - plugin: buf.build/grpc/python:v1.56.0 out: gen/proto/python - name: mypy out: gen/proto/python diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile index 8f198a420bc..def8626d3be 100644 --- a/dev/create-release/spark-rm/Dockerfile +++ b/dev/create-release/spark-rm/Dockerfile @@ -42,7 +42,7 @@ ARG APT_INSTALL="apt-get install --no-install-recommends -y" # We should use the latest Sphinx version once this is fixed. # TODO(SPARK-35375): Jinja2 3.0.0+ causes error when building with Sphinx. # See also https://issues.apache.org/jira/browse/SPARK-35375. 
-ARG PIP_PKGS="sphinx==3.0.4 mkdocs==1.1.2 numpy==1.20.3 pydata_sphinx_theme==0.4.1 ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0 jinja2==2.11.3 twine==3.4.1 sphinx-plotly-directive==0.1.3 pandas==1.5.3 pyarrow==3.0.0 plotly==5.4.0 markupsafe==2.0.1 docutils<0.17 grpcio==1.48.1 protobuf==4.21.6
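The lower/upper-bound idea behind these pin changes is the usual pip range specifier: something like `grpcio>=1.48.1,<1.57` accepts 1.56.0 but rejects older or too-new releases. As a hedged illustration (the exact specifier Spark uses is in `python/setup.py`, not shown in full here), the helper below is a simplified stand-in for a real version parser such as the `packaging` library and handles plain dotted-numeric versions only.

```python
def parse_version(v):
    """Parse 'X.Y.Z' into a comparable tuple of ints (simplified, no pre-releases)."""
    return tuple(int(part) for part in v.split("."))

def within_bounds(version, lower, upper):
    # Lower bound inclusive, upper bound exclusive, as ">=X,<Y" behaves in pip.
    return parse_version(lower) <= parse_version(version) < parse_version(upper)

within_bounds("1.56.0", "1.48.1", "1.57.0")   # True
within_bounds("1.47.0", "1.48.1", "1.57.0")   # False
```

Tuple comparison is what makes this correct for multi-digit components: `(1, 56, 0) > (1, 9, 0)`, whereas a naive string comparison would get that wrong.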
[spark] branch dependabot/pip/dev/grpcio-1.53.0 created (now 40975b59142)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch dependabot/pip/dev/grpcio-1.53.0 in repository https://gitbox.apache.org/repos/asf/spark.git at 40975b59142 Bump grpcio from 1.48.1 to 1.53.0 in /dev No new revisions were added by this update.
[spark] branch master updated: Revert "[SPARK-44222][BUILD][PYTHON] Upgrade `grpc` to 1.56.0"
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a47bde28d9a Revert "[SPARK-44222][BUILD][PYTHON] Upgrade `grpc` to 1.56.0" a47bde28d9a is described below commit a47bde28d9a2f6d77fba66350b81f1b4ce00c6cc Author: Hyukjin Kwon AuthorDate: Fri Jul 14 13:26:59 2023 +0900 Revert "[SPARK-44222][BUILD][PYTHON] Upgrade `grpc` to 1.56.0" This reverts commit f26bdb7bfde56e4856a2f962162f204ba2dbb1c1. --- .github/workflows/build_and_test.yml | 4 ++-- connector/connect/common/src/main/buf.gen.yaml | 4 ++-- dev/create-release/spark-rm/Dockerfile | 2 +- dev/requirements.txt | 4 ++-- pom.xml| 2 +- project/SparkBuild.scala | 2 +- python/docs/source/getting_started/install.rst | 4 ++-- python/setup.py| 2 +- 8 files changed, 12 insertions(+), 12 deletions(-) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 0b184c6c248..4370e622cf4 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -256,7 +256,7 @@ jobs: - name: Install Python packages (Python 3.8) if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-')) run: | -python3.8 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'grpcio==1.56.0' 'protobuf==3.19.5' +python3.8 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'grpcio==1.48.1' 'protobuf==3.19.5' python3.8 -m pip list # Run the tests. - name: Run tests @@ -625,7 +625,7 @@ jobs: # Jinja2 3.0.0+ causes error when building with Sphinx. # See also https://issues.apache.org/jira/browse/SPARK-35375. 
python3.9 -m pip install 'flake8==3.9.0' pydata_sphinx_theme 'mypy==0.982' 'pytest==7.1.3' 'pytest-mypy-plugins==1.9.3' numpydoc 'jinja2<3.0.0' 'black==22.6.0' -python3.9 -m pip install 'pandas-stubs==1.2.0.53' ipython 'grpcio==1.56.0' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0' +python3.9 -m pip install 'pandas-stubs==1.2.0.53' ipython 'grpcio==1.48.1' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0' - name: Python linter run: PYTHON_EXECUTABLE=python3.9 ./dev/lint-python - name: Install dependencies for Python code generation check diff --git a/connector/connect/common/src/main/buf.gen.yaml b/connector/connect/common/src/main/buf.gen.yaml index 07edaa567b1..c816f12398d 100644 --- a/connector/connect/common/src/main/buf.gen.yaml +++ b/connector/connect/common/src/main/buf.gen.yaml @@ -22,14 +22,14 @@ plugins: out: gen/proto/csharp - plugin: buf.build/protocolbuffers/java:v21.7 out: gen/proto/java - - plugin: buf.build/grpc/ruby:v1.56.0 + - remote: buf.build/grpc/plugins/ruby:v1.47.0-1 out: gen/proto/ruby - plugin: buf.build/protocolbuffers/ruby:v21.7 out: gen/proto/ruby # Building the Python build and building the mypy interfaces. - plugin: buf.build/protocolbuffers/python:v21.7 out: gen/proto/python - - plugin: buf.build/grpc/python:v1.56.0 + - remote: buf.build/grpc/plugins/python:v1.47.0-1 out: gen/proto/python - name: mypy out: gen/proto/python diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile index def8626d3be..8f198a420bc 100644 --- a/dev/create-release/spark-rm/Dockerfile +++ b/dev/create-release/spark-rm/Dockerfile @@ -42,7 +42,7 @@ ARG APT_INSTALL="apt-get install --no-install-recommends -y" # We should use the latest Sphinx version once this is fixed. # TODO(SPARK-35375): Jinja2 3.0.0+ causes error when building with Sphinx. # See also https://issues.apache.org/jira/browse/SPARK-35375. 
-ARG PIP_PKGS="sphinx==3.0.4 mkdocs==1.1.2 numpy==1.20.3 pydata_sphinx_theme==0.4.1 ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0 jinja2==2.11.3 twine==3.4.1 sphinx-plotly-directive==0.1.3 pandas==1.5.3 pyarrow==3.0.0 plotly==5.4.0 markupsafe==2.0.1 docutils<0.17 grpcio==1.56.0 protobuf==4.21.6 grpcio-status==1.56.0 googleapis-common-protos==1.56.4" +ARG PIP_PKGS="sphinx==3.0.4 mkdocs==1.1.2 numpy==1.20.3 pydata_sphinx_theme==0.4.1 ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0 jinja2==2.11.3 twine==3.4.1 sphinx-plotly-directive==0.1.3 pandas==1.5.3 pyarrow==3.0.0 plotly==5.4.0 markupsafe==2.0.1 docutils<0.17 grpcio==1.48.1 protobuf==4.21.6 grpcio-status==1.48.1 googleapis-common-protos==1.56.4" ARG GEM_PKGS="bundler:2.3.8" # Install extra needed repos and refresh. diff --git a/dev/requirements.txt b/dev/requirements.txt index eb482f3f8a2..72da5dbe163 100644 --- a/dev/requirements.txt +++ b/dev/requirements.txt @@ -50,8 +50,8 @@ black==22.6.0 py
[spark] branch master updated: [SPARK-43995][SPARK-43996][CONNECT] Add support for UDFRegistration to the Connect Scala Client
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 7ecdad5c59c [SPARK-43995][SPARK-43996][CONNECT] Add support for UDFRegistration to the Connect Scala Client
7ecdad5c59c is described below

commit 7ecdad5c59ce2eecd4686effeb10819a6d784844
Author: vicennial
AuthorDate: Fri Jul 14 10:52:12 2023 +0900

    [SPARK-43995][SPARK-43996][CONNECT] Add support for UDFRegistration to the Connect Scala Client

    ### What changes were proposed in this pull request?

    This PR adds support for registering a Scala UDF from the Scala/JVM client. The following APIs are implemented in `UDFRegistration`:
    - `def register(name: String, udf: UserDefinedFunction): UserDefinedFunction`
    - `def register[RT: TypeTag, A1: TypeTag ...](name: String, func: (A1, ...) => RT): UserDefinedFunction` for 0 to 22 arguments.

    The following API is implemented in `functions`:
    - `def call_udf(udfName: String, cols: Column*): Column`

    Note: This PR is stacked on https://github.com/apache/spark/pull/41959.

    ### Why are the changes needed?

    To reach parity with classic Spark.

    ### Does this PR introduce _any_ user-facing change?

    Yes, `spark.udf.register()` is added, as shown below:
    ```scala
    class A(x: Int) { def get = x * 100 }
    val myUdf = udf((x: Int) => new A(x).get)
    spark.udf.register("dummyUdf", myUdf)
    spark.sql("select dummyUdf(id) from range(5)").as[Long].collect()
    ```
    The output:
    ```scala
    Array[Long] = Array(0L, 100L, 200L, 300L, 400L)
    ```

    ### How was this patch tested?

    New tests in `ReplE2ESuite`.

    Closes #41953 from vicennial/SPARK-43995.
Authored-by: vicennial Signed-off-by: Hyukjin Kwon --- .../scala/org/apache/spark/sql/SparkSession.scala | 31 + .../org/apache/spark/sql/UDFRegistration.scala | 1028 .../sql/expressions/UserDefinedFunction.scala | 10 + .../scala/org/apache/spark/sql/functions.scala | 17 + .../spark/sql/application/ReplE2ESuite.scala | 31 + .../CheckConnectJvmClientCompatibility.scala |1 - .../sql/connect/planner/SparkConnectPlanner.scala | 23 +- 7 files changed, 1139 insertions(+), 2 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala index c27f0f32e0d..fb9959c9942 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala @@ -417,6 +417,30 @@ class SparkSession private[sql] ( range(start, end, step, Option(numPartitions)) } + /** + * A collection of methods for registering user-defined functions (UDF). + * + * The following example registers a Scala closure as UDF: + * {{{ + * sparkSession.udf.register("myUDF", (arg1: Int, arg2: String) => arg2 + arg1) + * }}} + * + * The following example registers a UDF in Java: + * {{{ + * sparkSession.udf().register("myUDF", + * (Integer arg1, String arg2) -> arg2 + arg1, + * DataTypes.StringType); + * }}} + * + * @note + * The user-defined functions must be deterministic. Due to optimization, duplicate + * invocations may be eliminated or the function may even be invoked more times than it is + * present in the query. 
+ * + * @since 3.5.0 + */ + lazy val udf: UDFRegistration = new UDFRegistration(this) + // scalastyle:off // Disable style checker so "implicits" object can start with lowercase i /** @@ -525,6 +549,13 @@ class SparkSession private[sql] ( client.execute(plan).asScala.toSeq } + private[sql] def registerUdf(udf: proto.CommonInlineUserDefinedFunction): Unit = { +val command = proto.Command.newBuilder().setRegisterFunction(udf).build() +val plan = proto.Plan.newBuilder().setCommand(command).build() + +client.execute(plan) + } + @DeveloperApi def execute(extension: com.google.protobuf.Any): Unit = { val command = proto.Command.newBuilder().setExtension(extension).build() diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/UDFRegistration.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/UDFRegistration.scala new file mode 100644 index 000..426709b8f18 --- /dev/null +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/UDFRegistration.scala @@ -0,0 +1,1028 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright
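The register/call-by-name flow above can be pictured with a minimal registry. This is a hedged Python sketch of the concept only — the real client serializes the Scala closure into a `CommonInlineUserDefinedFunction` command, as the `registerUdf` helper in the diff shows — and `UDFRegistry` here is an invented illustration, not a Spark class.

```python
class UDFRegistry:
    """Toy name -> function table mirroring register()/call_udf() semantics."""

    def __init__(self):
        self._funcs = {}

    def register(self, name, func):
        # Like UDFRegistration.register, hand the function back to the caller
        # so it can also be used directly.
        self._funcs[name] = func
        return func

    def call_udf(self, name, *args):
        # Like functions.call_udf: look the UDF up by name and apply it.
        return self._funcs[name](*args)

registry = UDFRegistry()
registry.register("dummyUdf", lambda x: x * 100)
[registry.call_udf("dummyUdf", i) for i in range(5)]  # [0, 100, 200, 300, 400]
```

The final expression mirrors the `select dummyUdf(id) from range(5)` example from the commit message.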
[spark] branch branch-3.4 updated: [SPARK-44180][SQL] DistributionAndOrderingUtils should apply ResolveTimeZone
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 679f33e7ed2 [SPARK-44180][SQL] DistributionAndOrderingUtils should apply ResolveTimeZone 679f33e7ed2 is described below commit 679f33e7ed24c24aad1687d2d906232a9c1c7604 Author: Cheng Pan AuthorDate: Fri Jul 14 09:25:07 2023 +0800 [SPARK-44180][SQL] DistributionAndOrderingUtils should apply ResolveTimeZone ### What changes were proposed in this pull request? Apply `ResolveTimeZone` for the plan generated by `DistributionAndOrderingUtils#prepareQuery`. ### Why are the changes needed? In SPARK-39607, we only applied `typeCoercionRules` for the plan generated by `DistributionAndOrderingUtils#prepareQuery`, this is not enough, the following exception will be thrown if `TimeZoneAwareExpression` participates in the implicit cast. ``` 23/06/25 07:30:58 WARN UnsafeProjection: Expr codegen error and falling back to interpreter mode java.util.NoSuchElementException: None.get at scala.None$.get(Option.scala:529) at scala.None$.get(Option.scala:527) at org.apache.spark.sql.catalyst.expressions.TimeZoneAwareExpression.zoneId(datetimeExpressions.scala:63) at org.apache.spark.sql.catalyst.expressions.TimeZoneAwareExpression.zoneId$(datetimeExpressions.scala:63) at org.apache.spark.sql.catalyst.expressions.Cast.zoneId$lzycompute(Cast.scala:491) at org.apache.spark.sql.catalyst.expressions.Cast.zoneId(Cast.scala:491) at org.apache.spark.sql.catalyst.expressions.Cast.castToDateCode(Cast.scala:1655) at org.apache.spark.sql.catalyst.expressions.Cast.nullSafeCastFunction(Cast.scala:1335) at org.apache.spark.sql.catalyst.expressions.Cast.doGenCode(Cast.scala:1316) at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:200) at scala.Option.getOrElse(Option.scala:189) at 
org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:195) at org.apache.spark.sql.catalyst.expressions.Cast.genCode(Cast.scala:1310) at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.$anonfun$prepareArguments$3(objects.scala:124) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at scala.collection.TraversableLike.map(TraversableLike.scala:286) at scala.collection.TraversableLike.map$(TraversableLike.scala:279) at scala.collection.AbstractTraversable.map(Traversable.scala:108) at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.prepareArguments(objects.scala:123) at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.prepareArguments$(objects.scala:91) at org.apache.spark.sql.catalyst.expressions.objects.Invoke.prepareArguments(objects.scala:363) at org.apache.spark.sql.catalyst.expressions.objects.Invoke.doGenCode(objects.scala:414) at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:200) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:195) at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.$anonfun$prepareArguments$3(objects.scala:124) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at scala.collection.TraversableLike.map(TraversableLike.scala:286) at scala.collection.TraversableLike.map$(TraversableLike.scala:279) at scala.collection.AbstractTraversable.map(Traversable.scala:108) 
at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.prepareArguments(objects.scala:123) at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.prepareArguments$(objects.scala:91) at org.apache.spark.sql.catalyst.expressions.objects.Invoke.prepareArguments(objects.scala:363) at org.apache.spark.sql.catalyst.expressions.objects.Invoke.doGenCode(objects.scala:414) at
[spark] branch master updated: [SPARK-44180][SQL] DistributionAndOrderingUtils should apply ResolveTimeZone
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d1d97604fbe [SPARK-44180][SQL] DistributionAndOrderingUtils should apply ResolveTimeZone d1d97604fbe is described below commit d1d97604fbec2fccedbfa52b02eb1f3428b00d9f Author: Cheng Pan AuthorDate: Fri Jul 14 09:25:07 2023 +0800 [SPARK-44180][SQL] DistributionAndOrderingUtils should apply ResolveTimeZone ### What changes were proposed in this pull request? Apply `ResolveTimeZone` for the plan generated by `DistributionAndOrderingUtils#prepareQuery`. ### Why are the changes needed? In SPARK-39607, we only applied `typeCoercionRules` for the plan generated by `DistributionAndOrderingUtils#prepareQuery`, this is not enough, the following exception will be thrown if `TimeZoneAwareExpression` participates in the implicit cast. ``` 23/06/25 07:30:58 WARN UnsafeProjection: Expr codegen error and falling back to interpreter mode java.util.NoSuchElementException: None.get at scala.None$.get(Option.scala:529) at scala.None$.get(Option.scala:527) at org.apache.spark.sql.catalyst.expressions.TimeZoneAwareExpression.zoneId(datetimeExpressions.scala:63) at org.apache.spark.sql.catalyst.expressions.TimeZoneAwareExpression.zoneId$(datetimeExpressions.scala:63) at org.apache.spark.sql.catalyst.expressions.Cast.zoneId$lzycompute(Cast.scala:491) at org.apache.spark.sql.catalyst.expressions.Cast.zoneId(Cast.scala:491) at org.apache.spark.sql.catalyst.expressions.Cast.castToDateCode(Cast.scala:1655) at org.apache.spark.sql.catalyst.expressions.Cast.nullSafeCastFunction(Cast.scala:1335) at org.apache.spark.sql.catalyst.expressions.Cast.doGenCode(Cast.scala:1316) at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:200) at scala.Option.getOrElse(Option.scala:189) at 
org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:195) at org.apache.spark.sql.catalyst.expressions.Cast.genCode(Cast.scala:1310) at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.$anonfun$prepareArguments$3(objects.scala:124) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at scala.collection.TraversableLike.map(TraversableLike.scala:286) at scala.collection.TraversableLike.map$(TraversableLike.scala:279) at scala.collection.AbstractTraversable.map(Traversable.scala:108) at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.prepareArguments(objects.scala:123) at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.prepareArguments$(objects.scala:91) at org.apache.spark.sql.catalyst.expressions.objects.Invoke.prepareArguments(objects.scala:363) at org.apache.spark.sql.catalyst.expressions.objects.Invoke.doGenCode(objects.scala:414) at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:200) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:195) at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.$anonfun$prepareArguments$3(objects.scala:124) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at scala.collection.TraversableLike.map(TraversableLike.scala:286) at scala.collection.TraversableLike.map$(TraversableLike.scala:279) at scala.collection.AbstractTraversable.map(Traversable.scala:108) 
at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.prepareArguments(objects.scala:123) at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.prepareArguments$(objects.scala:91) at org.apache.spark.sql.catalyst.expressions.objects.Invoke.prepareArguments(objects.scala:363) at org.apache.spark.sql.catalyst.expressions.objects.Invoke.doGenCode(objects.scala:414) at
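The failure mode in the stack trace — `None.get` on a time zone that was never filled in — and the fix can be modeled in a few lines. This is an illustrative Python analogue, not Spark's Scala code: `CastLike` and `resolve_time_zone` are invented names standing in for `TimeZoneAwareExpression` and the `ResolveTimeZone` rule.

```python
class CastLike:
    """Stand-in for a TimeZoneAwareExpression: the zone id is optional until resolved."""

    def __init__(self, child, time_zone_id=None):
        self.child = child
        self.time_zone_id = time_zone_id

    @property
    def zone_id(self):
        if self.time_zone_id is None:
            # Analogue of scala.None$.get throwing NoSuchElementException
            # during code generation.
            raise LookupError("None.get: time zone was never resolved")
        return self.time_zone_id

def resolve_time_zone(expr, session_tz):
    """Analogue of the ResolveTimeZone rule: fill in the session time zone."""
    if expr.time_zone_id is None:
        expr.time_zone_id = session_tz
    return expr

expr = CastLike("2023-06-25")
resolve_time_zone(expr, session_tz="America/Los_Angeles")
expr.zone_id  # 'America/Los_Angeles'
```

Running only type coercion (which inserts the `CastLike`) without the resolution pass leaves `time_zone_id` as `None`, which is exactly the gap `DistributionAndOrderingUtils#prepareQuery` had before this fix.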
[spark] branch master updated (bac7050cf0a -> fd518dac8d9)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from bac7050cf0a Revert "[SPARK-41811][PYTHON][CONNECT] Implement SparkSession.sql's string formatter" add fd518dac8d9 [SPARK-44201][CONNECT][SS] Add support for Streaming Listener in Scala for Spark Connect No new revisions were added by this update. Summary of changes: .../sql/streaming/StreamingQueryListener.scala | 132 - .../sql/streaming/StreamingQueryManager.scala | 77  .../ui/StreamingQueryStatusListener.scala | 61 +- .../apache/spark/sql/streaming/ui/UIUtils.scala| 0 .../CheckConnectJvmClientCompatibility.scala | 27 +++-- .../spark/sql/streaming/StreamingQuerySuite.scala | 90 +- .../src/main/protobuf/spark/connect/commands.proto | 21 ...rPacket.scala => StreamingListenerPacket.scala} | 20 ++-- .../sql/connect/planner/SparkConnectPlanner.scala | 46 ++- .../spark/sql/connect/service/SessionHolder.scala | 33 ++ python/pyspark/sql/connect/proto/commands_pb2.py | 42 --- python/pyspark/sql/connect/proto/commands_pb2.pyi | 121 ++- .../sql/streaming/StreamingQueryListener.scala | 13 +- .../ui/StreamingQueryStatusListener.scala | 4 +- 14 files changed, 545 insertions(+), 142 deletions(-) copy {sql/core => connector/connect/client/jvm}/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryListener.scala (53%) copy {sql/core => connector/connect/client/jvm}/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala (84%) copy {sql/core => connector/connect/client/jvm}/src/main/scala/org/apache/spark/sql/streaming/ui/UIUtils.scala (100%) copy connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/{ForeachWriterPacket.scala => StreamingListenerPacket.scala} (66%)
[spark] branch master updated: Revert "[SPARK-41811][PYTHON][CONNECT] Implement SparkSession.sql's string formatter"
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new bac7050cf0a Revert "[SPARK-41811][PYTHON][CONNECT] Implement SparkSession.sql's string formatter" bac7050cf0a is described below commit bac7050cf0ad18608e921f46e40152d341d53fb8 Author: Hyukjin Kwon AuthorDate: Fri Jul 14 09:31:17 2023 +0900 Revert "[SPARK-41811][PYTHON][CONNECT] Implement SparkSession.sql's string formatter" This reverts commit 9aa42a970c4bd8e54603b1795a0f449bd556b11b. --- python/pyspark/pandas/sql_formatter.py | 7 +-- python/pyspark/sql/connect/session.py | 35 - python/pyspark/sql/connect/sql_formatter.py | 78 - python/pyspark/sql/sql_formatter.py | 5 +- python/pyspark/sql/utils.py | 8 --- 5 files changed, 16 insertions(+), 117 deletions(-) diff --git a/python/pyspark/pandas/sql_formatter.py b/python/pyspark/pandas/sql_formatter.py index 7501e19c038..8593703bd94 100644 --- a/python/pyspark/pandas/sql_formatter.py +++ b/python/pyspark/pandas/sql_formatter.py @@ -264,9 +264,10 @@ class PandasSQLStringFormatter(string.Formatter): val._to_spark().createOrReplaceTempView(df_name) return df_name elif isinstance(val, str): -from pyspark.sql.utils import get_lit_sql_str - -return get_lit_sql_str(val) +# This is matched to behavior from JVM implementation. 
+# See `sql` definition from `sql/catalyst/src/main/scala/org/apache/spark/ +# sql/catalyst/expressions/literals.scala` +return "'" + val.replace("\\", "").replace("'", "\\'") + "'" else: return val diff --git a/python/pyspark/sql/connect/session.py b/python/pyspark/sql/connect/session.py index 13868263174..ea88d60d760 100644 --- a/python/pyspark/sql/connect/session.py +++ b/python/pyspark/sql/connect/session.py @@ -489,31 +489,13 @@ class SparkSession: createDataFrame.__doc__ = PySparkSession.createDataFrame.__doc__ -def sql( -self, -sqlQuery: str, -args: Optional[Union[Dict[str, Any], List]] = None, -**kwargs: Any, -) -> "DataFrame": - -if len(kwargs) > 0: -from pyspark.sql.connect.sql_formatter import SQLStringFormatter - -formatter = SQLStringFormatter(self) -sqlQuery = formatter.format(sqlQuery, **kwargs) - -try: -cmd = SQL(sqlQuery, args) -data, properties = self.client.execute_command(cmd.command(self._client)) -if "sql_command_result" in properties: -return DataFrame.withPlan(CachedRelation(properties["sql_command_result"]), self) -else: -return DataFrame.withPlan(SQL(sqlQuery, args), self) -finally: -if len(kwargs) > 0: -# TODO: should drop temp views after SPARK-44406 get resolved -# formatter.clear() -pass +def sql(self, sqlQuery: str, args: Optional[Union[Dict[str, Any], List]] = None) -> "DataFrame": +cmd = SQL(sqlQuery, args) +data, properties = self.client.execute_command(cmd.command(self._client)) +if "sql_command_result" in properties: +return DataFrame.withPlan(CachedRelation(properties["sql_command_result"]), self) +else: +return DataFrame.withPlan(SQL(sqlQuery, args), self) sql.__doc__ = PySparkSession.sql.__doc__ @@ -826,6 +808,9 @@ def _test() -> None: # RDD API is not supported in Spark Connect. 
del pyspark.sql.connect.session.SparkSession.createDataFrame.__doc__ +# TODO(SPARK-41811): Implement SparkSession.sql's string formatter +del pyspark.sql.connect.session.SparkSession.sql.__doc__ + (failure_count, test_count) = doctest.testmod( pyspark.sql.connect.session, globs=globs, diff --git a/python/pyspark/sql/connect/sql_formatter.py b/python/pyspark/sql/connect/sql_formatter.py deleted file mode 100644 index ab90a1bb847..000 --- a/python/pyspark/sql/connect/sql_formatter.py +++ /dev/null @@ -1,78 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -#http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
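The reverted formatter quoted Python string values as SQL literals to match the JVM behavior referenced in the diff's comment. A minimal sketch of that quoting rule (the helper name is my own, not a pyspark API):

```python
def to_sql_literal(val: str) -> str:
    # Escape backslashes first, then single quotes, then wrap in single
    # quotes -- the literal-quoting behavior the JVM comment above refers to.
    return "'" + val.replace("\\", "\\\\").replace("'", "\\'") + "'"
```

For example, `to_sql_literal("it's")` yields a literal that a SQL parser reads back as `it's`.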
[spark] branch master updated: [MINOR] Removing redundant parentheses from SQL function docs
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a6db459328d [MINOR] Removing redundant parentheses from SQL function docs a6db459328d is described below commit a6db459328d7aa75e43097ddf2286c146646810a Author: panbingkun AuthorDate: Fri Jul 14 08:40:21 2023 +0900 [MINOR] Removing redundant parentheses from SQL function docs ### What changes were proposed in this pull request? The pr aims to removing redundant parentheses from SQL function docs. ### Why are the changes needed? Make the document clearer, reduce misunderstandings. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. Closes #41984 from panbingkun/minor_function_docs. Authored-by: panbingkun Signed-off-by: Hyukjin Kwon --- python/pyspark/sql/connect/functions.py | 2 +- python/pyspark/sql/functions.py | 4 ++-- .../spark/sql/catalyst/expressions/aggregate/MaxByAndMinBy.scala | 4 ++-- .../org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala | 2 +- sql/core/src/test/resources/sql-functions/sql-expression-schema.md| 4 ++-- 5 files changed, 8 insertions(+), 8 deletions(-) diff --git a/python/pyspark/sql/connect/functions.py b/python/pyspark/sql/connect/functions.py index c6445f110c0..1be759d9b6e 100644 --- a/python/pyspark/sql/connect/functions.py +++ b/python/pyspark/sql/connect/functions.py @@ -195,7 +195,7 @@ def _invoke_higher_order_function( :param name: Name of the expression :param cols: a list of columns -:param funs: a list of((*Column) -> Column functions. +:param funs: a list of (*Column) -> Column functions. 
:return: a Column """ diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py index d7a2f529fa6..b2017627598 100644 --- a/python/pyspark/sql/functions.py +++ b/python/pyspark/sql/functions.py @@ -12770,7 +12770,7 @@ def arrays_zip(*cols: "ColumnOrName") -> Column: Examples >>> from pyspark.sql.functions import arrays_zip ->>> df = spark.createDataFrame([(([1, 2, 3], [2, 4, 6], [3, 6]))], ['vals1', 'vals2', 'vals3']) +>>> df = spark.createDataFrame([([1, 2, 3], [2, 4, 6], [3, 6])], ['vals1', 'vals2', 'vals3']) >>> df = df.select(arrays_zip(df.vals1, df.vals2, df.vals3).alias('zipped')) >>> df.show(truncate=False) ++ @@ -13041,7 +13041,7 @@ def _invoke_higher_order_function( :param name: Name of the expression :param cols: a list of columns -:param funs: a list of((*Column) -> Column functions. +:param funs: a list of (*Column) -> Column functions. :return: a Column """ diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/MaxByAndMinBy.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/MaxByAndMinBy.scala index 096a42686a3..56941c9de45 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/MaxByAndMinBy.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/MaxByAndMinBy.scala @@ -96,7 +96,7 @@ abstract class MaxMinBy extends DeclarativeAggregate with BinaryLike[Expression] usage = "_FUNC_(x, y) - Returns the value of `x` associated with the maximum value of `y`.", examples = """ Examples: - > SELECT _FUNC_(x, y) FROM VALUES (('a', 10)), (('b', 50)), (('c', 20)) AS tab(x, y); + > SELECT _FUNC_(x, y) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y); b """, group = "agg_funcs", @@ -119,7 +119,7 @@ case class MaxBy(valueExpr: Expression, orderingExpr: Expression) extends MaxMin usage = "_FUNC_(x, y) - Returns the value of `x` associated with the minimum value of `y`.", examples = """ 
Examples: - > SELECT _FUNC_(x, y) FROM VALUES (('a', 10)), (('b', 50)), (('c', 20)) AS tab(x, y); + > SELECT _FUNC_(x, y) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y); a """, group = "agg_funcs", diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala index 32c41cba4e1..0bbae04fb89 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala @@ -663,7 +663,7 @@ case class JsonToStructs( {"[1]":{"b":2}} > SELECT _FUNC_(map('a', 1)); {"a":1} - > SELECT _FUNC_(array((map('a', 1; + > SELECT
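The corrected doc examples rely on `max_by`/`min_by` semantics, and the redundant-parentheses point holds in plain Python too: extra parentheses around a tuple are a no-op, and the same results can be checked directly:

```python
# Extra parentheses around a tuple change nothing:
assert (('a', 10)) == ('a', 10)

rows = [('a', 10), ('b', 50), ('c', 20)]

# max_by(x, y): the value of x at the maximum y -- 'b' in the doc example above.
assert max(rows, key=lambda r: r[1])[0] == 'b'
# min_by(x, y): the value of x at the minimum y -- 'a'.
assert min(rows, key=lambda r: r[1])[0] == 'a'
```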
[spark] branch master updated: [SPARK-44364] [PYTHON] Add support for List[Row] data type for expected
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 61030de02f5 [SPARK-44364] [PYTHON] Add support for List[Row] data type for expected 61030de02f5 is described below commit 61030de02f5735919d6b3e3c2923831c5d2fcc61 Author: Amanda Liu AuthorDate: Fri Jul 14 08:27:41 2023 +0900 [SPARK-44364] [PYTHON] Add support for List[Row] data type for expected ### What changes were proposed in this pull request? This PR adds supported for List[Row] type for the `expected` argument in `assertDataFrameEqual`. ### Why are the changes needed? The change improves flexibility of the `assertDataFrameEqual` function by allowing for comparison between dataframe and List[Row] types. ### Does this PR introduce _any_ user-facing change? Yes, the PR modifies the user-facing API `assertDataFrameEqual`. ### How was this patch tested? Added tests to `runtime/python/pyspark/sql/tests/test_utils.py` and `runtime/python/pyspark/sql/tests/connect/test_utils.py` Closes #41924 from asl3/list-row-support. Authored-by: Amanda Liu Signed-off-by: Hyukjin Kwon --- python/pyspark/errors/error_classes.py | 5 - python/pyspark/sql/tests/test_utils.py | 266 ++--- python/pyspark/testing/utils.py| 48 +++--- 3 files changed, 209 insertions(+), 110 deletions(-) diff --git a/python/pyspark/errors/error_classes.py b/python/pyspark/errors/error_classes.py index 57447d56892..8c51024bf06 100644 --- a/python/pyspark/errors/error_classes.py +++ b/python/pyspark/errors/error_classes.py @@ -713,11 +713,6 @@ ERROR_CLASSES_JSON = """ " is only supported with pyarrow 2.0.0 and above." ] }, - "UNSUPPORTED_DATA_TYPE_FOR_IGNORE_ROW_ORDER" : { -"message" : [ - "Cannot ignore row order because undefined sorting for data type." -] - }, "UNSUPPORTED_JOIN_TYPE" : { "message" : [ "Unsupported join type: . 
Supported join types include: \\"inner\\", \\"outer\\", \\"full\\", \\"fullouter\\", \\"full_outer\\", \\"leftouter\\", \\"left\\", \\"left_outer\\", \\"rightouter\\", \\"right\\", \\"right_outer\\", \\"leftsemi\\", \\"left_semi\\", \\"semi\\", \\"leftanti\\", \\"left_anti\\", \\"anti\\", \\"cross\\"." diff --git a/python/pyspark/sql/tests/test_utils.py b/python/pyspark/sql/tests/test_utils.py index 1757d8dd2e1..ce8d83e6cb9 100644 --- a/python/pyspark/sql/tests/test_utils.py +++ b/python/pyspark/sql/tests/test_utils.py @@ -57,7 +57,8 @@ class UtilsTestsMixin: schema=["id", "amount"], ) -assertDataFrameEqual(df1, df2) +assertDataFrameEqual(df1, df2, checkRowOrder=False) +assertDataFrameEqual(df1, df2, checkRowOrder=True) def test_assert_equal_arraytype(self): df1 = self.spark.createDataFrame( @@ -85,7 +86,8 @@ class UtilsTestsMixin: ), ) -assertDataFrameEqual(df1, df2) +assertDataFrameEqual(df1, df2, checkRowOrder=False) +assertDataFrameEqual(df1, df2, checkRowOrder=True) def test_assert_approx_equal_arraytype_float(self): df1 = self.spark.createDataFrame( @@ -113,7 +115,8 @@ class UtilsTestsMixin: ), ) -assertDataFrameEqual(df1, df2) +assertDataFrameEqual(df1, df2, checkRowOrder=False) +assertDataFrameEqual(df1, df2, checkRowOrder=True) def test_assert_notequal_arraytype(self): df1 = self.spark.createDataFrame( @@ -167,6 +170,15 @@ class UtilsTestsMixin: message_parameters={"error_msg": expected_error_message}, ) +with self.assertRaises(PySparkAssertionError) as pe: +assertDataFrameEqual(df1, df2, checkRowOrder=True) + +self.check_error( +exception=pe.exception, +error_class="DIFFERENT_ROWS", +message_parameters={"error_msg": expected_error_message}, +) + def test_assert_equal_maptype(self): df1 = self.spark.createDataFrame( data=[ @@ -193,35 +205,8 @@ class UtilsTestsMixin: ), ) -assertDataFrameEqual(df1, df2, check_row_order=True) - -def test_assert_approx_equal_maptype_double(self): -df1 = self.spark.createDataFrame( -data=[ -("student1", {"math": 76.23, 
"english": 92.64}), -("student2", {"math": 87.89, "english": 84.48}), -], -schema=StructType( -[ -StructField("student", StringType(), True), -StructField("grades", MapType(StringType(), DoubleType()), True), -] -), -) -df2 = self.spark.createDataFrame( -data=[ -("student1", {"math": 76.23,
[spark] branch master updated: [SPARK-43389][SQL] Added a null check for lineSep option
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9f07e4a747b [SPARK-43389][SQL] Added a null check for lineSep option 9f07e4a747b is described below commit 9f07e4a747b0e2a62b954db3c9be425c924da47a Author: Gurpreet Singh AuthorDate: Thu Jul 13 18:17:45 2023 -0500 [SPARK-43389][SQL] Added a null check for lineSep option ### What changes were proposed in this pull request? ### Why are the changes needed? - `spark.read.csv` throws `NullPointerException` when lineSep is set to None - More details about the issue here: https://issues.apache.org/jira/browse/SPARK-43389 ### Does this PR introduce _any_ user-facing change? ~~Users now should be able to explicitly set `lineSep` as `None` without getting an exception~~ After some discussion, it was decided to add a `require` check for `null` instead of letting it through. ### How was this patch tested? Tested the changes with a python script that explicitly sets `lineSep` to `None` ```python from pyspark.sql import SparkSession # Create a SparkSession spark = SparkSession.builder.appName("HelloWorld").getOrCreate() # Read CSV into a DataFrame df = spark.read.csv("/tmp/hello.csv", header=True, inferSchema=True, lineSep=None) # Also tested the following case when options are passed before invoking .csv #df = spark.read.option("lineSep", None).csv("/Users/gdhuper/Documents/tmp/hello.csv", header=True, inferSchema=True) # Show the DataFrame df.show() # Stop the SparkSession spark.stop() ``` Closes #41904 from gdhuper/gdhuper/SPARK-43389. 
Authored-by: Gurpreet Singh Signed-off-by: Sean Owen --- .../src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala| 1 + .../org/apache/spark/sql/execution/datasources/text/TextOptions.scala| 1 + 2 files changed, 2 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala index 2b6b60fdf76..f4ad1f2f2e5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala @@ -254,6 +254,7 @@ class CSVOptions( * A string between two consecutive JSON records. */ val lineSeparator: Option[String] = parameters.get(LINE_SEP).map { sep => +require(sep != null, "'lineSep' cannot be a null value.") require(sep.nonEmpty, "'lineSep' cannot be an empty string.") // Intentionally allow it up to 2 for Window's CRLF although multiple // characters have an issue with quotes. This is intentionally undocumented. diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala index f26f05cbe1c..468d58974ed 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala @@ -45,6 +45,7 @@ class TextOptions(@transient private val parameters: CaseInsensitiveMap[String]) val encoding: Option[String] = parameters.get(ENCODING) val lineSeparator: Option[String] = parameters.get(LINE_SEP).map { lineSep => +require(lineSep != null, s"'$LINE_SEP' cannot be a null value.") require(lineSep.nonEmpty, s"'$LINE_SEP' cannot be an empty string.") lineSep - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
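The added `require` calls reject a null `lineSep` before the pre-existing empty-string check runs. The same validation order, sketched in Python (the function name is illustrative):

```python
def validate_line_sep(sep):
    # Mirror the added Scala `require` calls: a null (None) separator is
    # rejected first, then the existing empty-string check applies.
    if sep is None:
        raise ValueError("'lineSep' cannot be a null value.")
    if sep == "":
        raise ValueError("'lineSep' cannot be an empty string.")
    return sep
```

This turns the former `NullPointerException` into an explicit, descriptive error.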
[spark] branch master updated: [SPARK-44395][SQL] Update TVF arguments to require parentheses around identifier after TABLE keyword
This is an automated email from the ASF dual-hosted git repository. ueshin pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 63cef16ef77 [SPARK-44395][SQL] Update TVF arguments to require parentheses around identifier after TABLE keyword 63cef16ef77 is described below commit 63cef16ef779addaa841fa863a5797fc9f33fd82 Author: Daniel Tenedorio AuthorDate: Thu Jul 13 14:04:09 2023 -0700 [SPARK-44395][SQL] Update TVF arguments to require parentheses around identifier after TABLE keyword ### What changes were proposed in this pull request? This PR updates the parsing of table function arguments to require parentheses around identifier after the TABLE keyword. Instead of `TABLE t`, the syntax should look like `TABLE(t)` instead as specified in the SQL standard. * I kept the previous syntax without the parentheses as an optional case in the SQL grammar so that we can catch it in the `AstBuilder` and throw an informative error message telling the user to add parentheses and try the query again. * I had to swap the order of parsing table function arguments, so the `table(identifier)` syntax does not accidentally parse as a scalar function call: ``` functionTableArgument : functionTableReferenceArgument | functionArgument ; ``` ### Why are the changes needed? This syntax is written down in the SQL standard. Per the standard, `TABLE identifier` should actually be passed as `TABLE(identifier)`. ### Does this PR introduce _any_ user-facing change? Yes, SQL syntax changes slightly. ### How was this patch tested? This PR adds and updates unit test coverage. Closes #41965 from dtenedor/parentheses-table-clause. 
Authored-by: Daniel Tenedorio Signed-off-by: Takuya UESHIN --- .../src/main/resources/error/error-classes.json | 5 + ...ror-conditions-invalid-sql-syntax-error-class.md | 4 python/pyspark/sql/tests/test_udtf.py | 10 +- .../spark/sql/catalyst/parser/SqlBaseParser.g4 | 5 +++-- .../spark/sql/catalyst/parser/AstBuilder.scala | 6 ++ .../spark/sql/errors/QueryParsingErrors.scala | 11 +++ .../spark/sql/catalyst/parser/PlanParserSuite.scala | 21 + 7 files changed, 51 insertions(+), 11 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 6691e86b463..99a0a4ae4ba 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -1694,6 +1694,11 @@ "Expected a column reference for transform : ." ] }, + "INVALID_TABLE_FUNCTION_IDENTIFIER_ARGUMENT_MISSING_PARENTHESES" : { +"message" : [ + "Syntax error: call to table-valued function is invalid because parentheses are missing around the provided TABLE argument ; please surround this with parentheses and try again." +] + }, "INVALID_TABLE_VALUED_FUNC_NAME" : { "message" : [ "Table valued function cannot specify database name: ." diff --git a/docs/sql-error-conditions-invalid-sql-syntax-error-class.md b/docs/sql-error-conditions-invalid-sql-syntax-error-class.md index 6c9f588ba49..b1e298f7b90 100644 --- a/docs/sql-error-conditions-invalid-sql-syntax-error-class.md +++ b/docs/sql-error-conditions-invalid-sql-syntax-error-class.md @@ -49,6 +49,10 @@ Partition key `` must set value. Expected a column reference for transform ``: ``. +## INVALID_TABLE_FUNCTION_IDENTIFIER_ARGUMENT_MISSING_PARENTHESES + +Syntax error: call to table-valued function is invalid because parentheses are missing around the provided TABLE argument ``; please surround this with parentheses and try again. + ## INVALID_TABLE_VALUED_FUNC_NAME Table valued function cannot specify database name: ``. 
diff --git a/python/pyspark/sql/tests/test_udtf.py b/python/pyspark/sql/tests/test_udtf.py index e4db542cbb7..8f42b4123e4 100644 --- a/python/pyspark/sql/tests/test_udtf.py +++ b/python/pyspark/sql/tests/test_udtf.py @@ -545,7 +545,7 @@ class BaseUDTFTestsMixin: with self.tempView("v"): self.spark.sql("CREATE OR REPLACE TEMPORARY VIEW v as SELECT id FROM range(0, 8)") self.assertEqual( -self.spark.sql("SELECT * FROM test_udtf(TABLE v)").collect(), +self.spark.sql("SELECT * FROM test_udtf(TABLE (v))").collect(), [Row(a=6), Row(a=7)], ) @@ -561,7 +561,7 @@ class BaseUDTFTestsMixin: with self.tempView("v"): self.spark.sql("CREATE OR REPLACE TEMPORARY VIEW v as SELECT id FROM range(0, 8)") self.assertEqual( -
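The grammar change tries the `TABLE(identifier)` reference form before scalar function arguments, so `table(t)` is not mis-parsed, and it reports the new error for the old bare `TABLE t` form. A toy regex-based sketch of that ordering (nothing like the real ANTLR grammar, just the dispatch logic):

```python
import re


def parse_table_argument(arg):
    # Try the TABLE(identifier) reference form first, so that "table(t)"
    # is not accidentally parsed as a scalar function call.
    m = re.fullmatch(r"TABLE\s*\((\w+)\)", arg, re.IGNORECASE)
    if m:
        return ("table_ref", m.group(1))
    # Old bare form: report the new, more informative error.
    m = re.fullmatch(r"TABLE\s+(\w+)", arg, re.IGNORECASE)
    if m:
        raise SyntaxError(
            "parentheses are missing around the provided TABLE argument "
            + m.group(1))
    # Anything else falls through to ordinary (scalar) argument parsing.
    return ("scalar", arg)
```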
[spark] branch master updated: [SPARK-44279][BUILD] Upgrade `optionator` to ^0.9.3
This is an automated email from the ASF dual-hosted git repository. sarutak pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d35fda69e49 [SPARK-44279][BUILD] Upgrade `optionator` to ^0.9.3 d35fda69e49 is described below commit d35fda69e49b06cda316ecd664acb22cb8c12266 Author: Bjørn Jørgensen AuthorDate: Fri Jul 14 03:26:56 2023 +0900 [SPARK-44279][BUILD] Upgrade `optionator` to ^0.9.3 ### What changes were proposed in this pull request? This PR proposes a change in the package.json file to update the resolution for the `optionator` package to ^0.9.3. I've added a resolutions field to package.json and specified the `optionator` package version as ^0.9.3. This will ensure that our project uses `optionator` version 0.9.3 or the latest minor or patch version (due to the caret ^), regardless of any other version that may be specified in the dependencies or nested dependencies of our project. ### Why are the changes needed? [CVE-2023-26115](https://nvd.nist.gov/vuln/detail/CVE-2023-26115) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA Closes #41955 from bjornjorgensen/word-wrap. 
Authored-by: Bjørn Jørgensen Signed-off-by: Kousuke Saruta --- dev/package-lock.json | 774 ++ dev/package.json | 3 + 2 files changed, 350 insertions(+), 427 deletions(-) diff --git a/dev/package-lock.json b/dev/package-lock.json index 104a3fb7854..f676b9cec07 100644 --- a/dev/package-lock.json +++ b/dev/package-lock.json @@ -10,6 +10,15 @@ "minimatch": "^3.1.2" } }, +"node_modules/@aashutoshrathi/word-wrap": { + "version": "1.2.6", + "resolved": "https://registry.npmjs.org/@aashutoshrathi/word-wrap/-/word-wrap-1.2.6.tgz;, + "integrity": "sha512-1Yjs2SvM8TflER/OD3cOjhWWOZb58A2t7wpE2S9XfBYTiIl+XFhQG2bjy4Pu1I+EAlCNUzRDYDdFwFYUKvXcIA==", + "dev": true, + "engines": { +"node": ">=0.10.0" + } +}, "node_modules/@babel/code-frame": { "version": "7.12.11", "resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.12.11.tgz;, @@ -20,20 +29,38 @@ } }, "node_modules/@babel/helper-validator-identifier": { - "version": "7.14.0", - "resolved": "https://registry.npmjs.org/@babel/helper-validator-identifier/-/helper-validator-identifier-7.14.0.tgz;, - "integrity": "sha512-V3ts7zMSu5lfiwWDVWzRDGIN+lnCEUdaXgtVHJgLb1rGaA6jMrtB9EmE7L18foXJIE8Un/A/h6NJfGQp/e1J4A==", - "dev": true + "version": "7.22.5", + "resolved": "https://registry.npmjs.org/@babel/helper-validator-identifier/-/helper-validator-identifier-7.22.5.tgz;, + "integrity": "sha512-aJXu+6lErq8ltp+JhkJUfk1MTGyuA4v7f3pA+BJ5HLfNC6nAQ0Cpi9uOquUj8Hehg0aUiHzWQbOVJGao6ztBAQ==", + "dev": true, + "engines": { +"node": ">=6.9.0" + } }, "node_modules/@babel/highlight": { - "version": "7.14.0", - "resolved": "https://registry.npmjs.org/@babel/highlight/-/highlight-7.14.0.tgz;, - "integrity": "sha512-YSCOwxvTYEIMSGaBQb5kDDsCopDdiUGsqpatp3fOlI4+2HQSkTmEVWnVuySdAC5EWCqSWWTv0ib63RjR7dTBdg==", + "version": "7.22.5", + "resolved": "https://registry.npmjs.org/@babel/highlight/-/highlight-7.22.5.tgz;, + "integrity": "sha512-BSKlD1hgnedS5XRnGOljZawtag7H1yPfQp0tdNJCHoH6AZ+Pcm9VvkrK59/Yy593Ypg0zMxH2BxD1VPYUQ7UIw==", "dev": true, 
"dependencies": { -"@babel/helper-validator-identifier": "^7.14.0", +"@babel/helper-validator-identifier": "^7.22.5", "chalk": "^2.0.0", "js-tokens": "^4.0.0" + }, + "engines": { +"node": ">=6.9.0" + } +}, +"node_modules/@babel/highlight/node_modules/ansi-styles": { + "version": "3.2.1", + "resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-3.2.1.tgz;, + "integrity": "sha512-VT0ZI6kZRdTh8YyJw3SMbYm/u+NqfsAxEpWO0Pf9sq8/e94WxxOpPKx9FR1FlyCtOVDNOQ+8ntlqFxiRc+r5qA==", + "dev": true, + "dependencies": { +"color-convert": "^1.9.0" + }, + "engines": { +"node": ">=4" } }, "node_modules/@babel/highlight/node_modules/chalk": { @@ -50,16 +77,40 @@ "node": ">=4" } }, +"node_modules/@babel/highlight/node_modules/color-convert": { + "version": "1.9.3", + "resolved": "https://registry.npmjs.org/color-convert/-/color-convert-1.9.3.tgz;, + "integrity": "sha512-QfAUtd+vFdAtFQcC8CCyYt1fYWxSqAiK2cSD6zDB8N3cpsEBAvRxp9zOGg6G/SHHJYAT88/az/IuDGALsNVbGg==", + "dev": true, + "dependencies": { +"color-name": "1.1.3" + } +}, +"node_modules/@babel/highlight/node_modules/color-name": { + "version": "1.1.3", + "resolved":
[spark] branch master updated: [SPARK-44398][CONNECT] Scala foreachBatch API
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4771853c9bc [SPARK-44398][CONNECT] Scala foreachBatch API 4771853c9bc is described below commit 4771853c9bc26b8741091d63d77c4b6487e74189 Author: Raghu Angadi AuthorDate: Thu Jul 13 10:47:49 2023 -0700 [SPARK-44398][CONNECT] Scala foreachBatch API This implements Scala foreachBatch(). The implementation is basic and needs some more enhancements. The server side will be shared by the Python implementation as well. One notable hack in this PR is that it runs the user's `foreachBatch()` with a regular (legacy) DataFrame, rather than setting up a remote Spark Connect session and Connect DataFrame. ### Why are the changes needed? Adds foreachBatch() support in Scala Spark Connect. ### Does this PR introduce _any_ user-facing change? Yes. Adds the foreachBatch() API. ### How was this patch tested? - A simple unit test. Closes #41969 from rangadi/feb-scala. 
Authored-by: Raghu Angadi Signed-off-by: Xinrong Meng --- .../spark/sql/streaming/DataStreamWriter.scala | 28 ++- .../spark/sql/streaming/StreamingQuerySuite.scala | 52 - .../src/main/protobuf/spark/connect/commands.proto | 11 +-- .../sql/connect/planner/SparkConnectPlanner.scala | 25 +- .../planner/StreamingForeachBatchHelper.scala | 69 + python/pyspark/sql/connect/proto/commands_pb2.py | 88 +++--- python/pyspark/sql/connect/proto/commands_pb2.pyi | 46 +++ python/pyspark/sql/connect/streaming/readwriter.py | 4 +- 8 files changed, 251 insertions(+), 72 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala index 9f63f68a000..ad76ab4a1bc 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala @@ -30,12 +30,15 @@ import org.apache.spark.connect.proto.Command import org.apache.spark.connect.proto.WriteStreamOperationStart import org.apache.spark.internal.Logging import org.apache.spark.sql.{Dataset, ForeachWriter} +import org.apache.spark.sql.connect.common.DataTypeProtoConverter import org.apache.spark.sql.connect.common.ForeachWriterPacket import org.apache.spark.sql.execution.streaming.AvailableNowTrigger import org.apache.spark.sql.execution.streaming.ContinuousTrigger import org.apache.spark.sql.execution.streaming.OneTimeTrigger import org.apache.spark.sql.execution.streaming.ProcessingTimeTrigger +import org.apache.spark.sql.types.NullType import org.apache.spark.util.SparkSerDeUtils +import org.apache.spark.util.Utils /** * Interface used to write a streaming `Dataset` to external storage systems (e.g. 
file systems, @@ -218,7 +221,30 @@ final class DataStreamWriter[T] private[sql] (ds: Dataset[T]) extends Logging { val scalaWriterBuilder = proto.ScalarScalaUDF .newBuilder() .setPayload(ByteString.copyFrom(serialized)) -sinkBuilder.getForeachWriterBuilder.setScalaWriter(scalaWriterBuilder) +sinkBuilder.getForeachWriterBuilder.setScalaFunction(scalaWriterBuilder) +this + } + + /** + * :: Experimental :: + * + * (Scala-specific) Sets the output of the streaming query to be processed using the provided + * function. This is supported only in the micro-batch execution modes (that is, when the + * trigger is not continuous). In every micro-batch, the provided function will be called in + * every micro-batch with (i) the output rows as a Dataset and (ii) the batch identifier. The + * batchId can be used to deduplicate and transactionally write the output (that is, the + * provided Dataset) to external systems. The output Dataset is guaranteed to be exactly the + * same for the same batchId (assuming all operations are deterministic in the query). + * + * @since 3.5.0 + */ + @Evolving + def foreachBatch(function: (Dataset[T], Long) => Unit): DataStreamWriter[T] = { +val serializedFn = Utils.serialize(function) +sinkBuilder.getForeachBatchBuilder.getScalaFunctionBuilder + .setPayload(ByteString.copyFrom(serializedFn)) + .setOutputType(DataTypeProtoConverter.toConnectProtoType(NullType)) // Unused. + .setNullable(true) // Unused. this } diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala index 6ddcedf19cb..438e6e0c2fe 100644 ---
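The new `foreachBatch` contract documented above calls the user's function once per micro-batch with the output batch and its identifier, where the batch id lets the function deduplicate or write transactionally. A toy dispatcher illustrating that contract (not the real streaming engine):

```python
def run_micro_batches(batches, foreach_batch_fn):
    # Call the user's function once per micro-batch with (batch, batch_id);
    # the batch id is stable for replays, enabling idempotent writes.
    for batch_id, batch in enumerate(batches):
        foreach_batch_fn(batch, batch_id)


seen = []
run_micro_batches([["a"], ["b", "c"]], lambda batch, bid: seen.append((bid, batch)))
```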
[spark] branch master updated: [SPARK-44338][SQL] Fix view schema mismatch error message
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7e0679be8aa [SPARK-44338][SQL] Fix view schema mismatch error message 7e0679be8aa is described below commit 7e0679be8aa1aea531417907da7c66d86b74302a Author: Wenchen Fan AuthorDate: Thu Jul 13 10:32:43 2023 -0700 [SPARK-44338][SQL] Fix view schema mismatch error message ### What changes were proposed in this pull request? This fixes a minor issue that the view schema mismatch error message may have an extra space between view name and `AS SELECT`. It also takes the chance to simplify the view test a little bit, to always use `tableIdentifier` to produce the view name. ### Why are the changes needed? small fix of the error message ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests Closes #41898 from cloud-fan/view. 
Authored-by: Wenchen Fan Signed-off-by: Gengliang Wang --- .../sql/catalyst/catalog/SessionCatalog.scala | 6 +-- .../columnresolution-negative.sql.out | 2 +- .../results/columnresolution-negative.sql.out | 2 +- .../spark/sql/execution/SQLViewTestSuite.scala | 53 +- 4 files changed, 27 insertions(+), 36 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala index 135dc17ad95..392c911ddb8 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala @@ -909,11 +909,11 @@ class SessionCatalog( val viewText = metadata.viewText.get val userSpecifiedColumns = if (metadata.schema.fieldNames.toSeq == metadata.viewQueryColumnNames) { - "" + " " } else { - s"(${metadata.schema.fieldNames.mkString(", ")})" + s" (${metadata.schema.fieldNames.mkString(", ")}) " } - Some(s"CREATE OR REPLACE VIEW $viewName $userSpecifiedColumns AS $viewText") + Some(s"CREATE OR REPLACE VIEW $viewName${userSpecifiedColumns}AS $viewText") } } diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/columnresolution-negative.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/columnresolution-negative.sql.out index 95f3e53ff60..c80b9e82ab0 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/columnresolution-negative.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/columnresolution-negative.sql.out @@ -417,7 +417,7 @@ org.apache.spark.sql.AnalysisException "actualCols" : "[]", "colName" : "i1", "expectedNum" : "1", -"suggestion" : "CREATE OR REPLACE VIEW spark_catalog.mydb1.v1 AS SELECT * FROM t1", +"suggestion" : "CREATE OR REPLACE VIEW spark_catalog.mydb1.v1 AS SELECT * FROM t1", "viewName" : "`spark_catalog`.`mydb1`.`v1`" } } diff --git 
a/sql/core/src/test/resources/sql-tests/results/columnresolution-negative.sql.out b/sql/core/src/test/resources/sql-tests/results/columnresolution-negative.sql.out index 4d700b0a142..b3e62faa3b3 100644 --- a/sql/core/src/test/resources/sql-tests/results/columnresolution-negative.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/columnresolution-negative.sql.out @@ -460,7 +460,7 @@ org.apache.spark.sql.AnalysisException "actualCols" : "[]", "colName" : "i1", "expectedNum" : "1", -"suggestion" : "CREATE OR REPLACE VIEW spark_catalog.mydb1.v1 AS SELECT * FROM t1", +"suggestion" : "CREATE OR REPLACE VIEW spark_catalog.mydb1.v1 AS SELECT * FROM t1", "viewName" : "`spark_catalog`.`mydb1`.`v1`" } } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewTestSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewTestSuite.scala index 2956a6345bf..1bee93fa429 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewTestSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewTestSuite.scala @@ -41,10 +41,7 @@ abstract class SQLViewTestSuite extends QueryTest with SQLTestUtils { protected def viewTypeString: String protected def formattedViewName(viewName: String): String - protected def fullyQualifiedViewName(viewName: String): String protected def tableIdentifier(viewName: String): TableIdentifier - protected def tableIdentQuotedString(viewName: String): String = -tableIdentifier(viewName).quotedString def createView( viewName: String, @@ -185,10 +182,10 @@ abstract class SQLViewTestSuite extends QueryTest with
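The whitespace fix above boils down to making the optional user-specified column list carry its own surrounding spaces, so the plain case never produces a double space before `AS`. A minimal Python sketch of the same string-building logic (function and parameter names here are illustrative, not Spark's; the real code is in `SessionCatalog.scala`):

```python
def view_suggestion(view_name, field_names, query_column_names, view_text):
    # Mirrors the SessionCatalog fix: when the view schema matches the
    # query output columns, the separator is a single space; otherwise the
    # column list brings its own leading and trailing space. Either way,
    # concatenating directly with "AS" yields exactly one space.
    if list(field_names) == list(query_column_names):
        user_specified_columns = " "
    else:
        user_specified_columns = " (" + ", ".join(field_names) + ") "
    return f"CREATE OR REPLACE VIEW {view_name}{user_specified_columns}AS {view_text}"

print(view_suggestion("spark_catalog.mydb1.v1", ["i1"], ["i1"], "SELECT * FROM t1"))
# → CREATE OR REPLACE VIEW spark_catalog.mydb1.v1 AS SELECT * FROM t1
```

Before the fix, the empty-string branch plus a hard-coded space on both sides of the interpolation produced `VIEW v  AS` with two spaces; moving the spaces into the branch values removes the special case.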
[spark] branch master updated: [SPARK-44402][SQL][TESTS] Add tests for schema pruning in delta-based DELETEs
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new af27400bd1f [SPARK-44402][SQL][TESTS] Add tests for schema pruning in delta-based DELETEs af27400bd1f is described below commit af27400bd1f674c59564eb66570209a783859f6e Author: aokolnychyi AuthorDate: Thu Jul 13 10:18:51 2023 -0700 [SPARK-44402][SQL][TESTS] Add tests for schema pruning in delta-based DELETEs ### What changes were proposed in this pull request? This PR adds tests for schema pruning for delta-based DELETEs, similar to the ones we have for MERGE and UPDATE. ### Why are the changes needed? These changes are needed to verify schema pruning works as expected. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? This PR comes with tests. Closes #41975 from aokolnychyi/spark-44402. Authored-by: aokolnychyi Signed-off-by: Dongjoon Hyun --- .../connector/DeltaBasedDeleteFromTableSuite.scala | 21 - 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/connector/DeltaBasedDeleteFromTableSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/connector/DeltaBasedDeleteFromTableSuite.scala index 4da85a5ce05..7336b3a6e92 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/connector/DeltaBasedDeleteFromTableSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/connector/DeltaBasedDeleteFromTableSuite.scala @@ -17,7 +17,7 @@ package org.apache.spark.sql.connector -import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.{AnalysisException, Row} class DeltaBasedDeleteFromTableSuite extends DeleteFromTableSuiteBase { @@ -59,4 +59,23 @@ class DeltaBasedDeleteFromTableSuite extends DeleteFromTableSuiteBase { } assert(exception.message.contains("Row ID attributes cannot be nullable")) } + + test("delete with 
schema pruning") { +createAndInitTable("pk INT NOT NULL, id INT, country STRING, dep STRING", + """{ "pk": 1, "id": 1, "country": "uk", "dep": "hr" } +|{ "pk": 2, "id": 2, "country": "us", "dep": "software" } +|{ "pk": 3, "id": 3, "country": "canada", "dep": "hr" } +|""".stripMargin) + +executeAndCheckScan( + s"DELETE FROM $tableNameAsString WHERE id <= 1", + // `pk` is used to encode deletes + // `id` is used in the condition + // `_partition` is used in the requested write distribution + expectedScanSchema = "pk INT, id INT, _partition STRING") + +checkAnswer( + sql(s"SELECT * FROM $tableNameAsString"), + Row(2, 2, "us", "software") :: Row(3, 3, "canada", "hr") :: Nil) + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
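The expected scan schema in the new test follows directly from which columns a delta-based DELETE actually needs to read. A rough Python sketch of that column-collection rule (names are illustrative; the real pruning is done by Spark's optimizer, not by code like this):

```python
def pruned_scan_columns(row_id_cols, condition_cols, distribution_cols):
    # A delta-based DELETE only needs: the row-ID columns (to encode
    # deletes), the columns referenced by the WHERE condition, and the
    # columns used by the requested write distribution. Everything else
    # (here: country, dep) can be pruned from the scan.
    seen, result = set(), []
    for col in row_id_cols + condition_cols + distribution_cols:
        if col not in seen:
            seen.add(col)
            result.append(col)
    return result

print(pruned_scan_columns(["pk"], ["id"], ["_partition"]))
# → ['pk', 'id', '_partition'], matching the test's expectedScanSchema
```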
[spark] branch master updated: [SPARK-42309][SQL] Introduce `INCOMPATIBLE_DATA_TO_TABLE` and sub classes
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new efed39516c0 [SPARK-42309][SQL] Introduce `INCOMPATIBLE_DATA_TO_TABLE` and sub classes efed39516c0 is described below commit efed39516c0c4e9654aec447ce91676026368384 Author: itholic AuthorDate: Thu Jul 13 17:21:29 2023 +0300 [SPARK-42309][SQL] Introduce `INCOMPATIBLE_DATA_TO_TABLE` and sub classes ### What changes were proposed in this pull request? This PR proposes to assign name to _LEGACY_ERROR_TEMP_1204, "INCOMPATIBLE_DATA_TO_TABLE" and its sub classes: - CANNOT_FIND_DATA - AMBIGUOUS_COLUMN_NAME - EXTRA_STRUCT_FIELDS - NULLABLE_COLUMN - NULLABLE_ARRAY_ELEMENTS - NULLABLE_MAP_VALUES - CANNOT_SAFELY_CAST - STRUCT_MISSING_FIELDS - UNEXPECTED_COLUMN_NAME ### Why are the changes needed? We should assign proper name to _LEGACY_ERROR_TEMP_* ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? `./build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite*` Closes #39937 from itholic/LEGACY_1204. 
Authored-by: itholic Signed-off-by: Max Gekk --- common/utils/src/main/resources/error/README.md| 14 + .../src/main/resources/error/error-classes.json| 59 ++- docs/_data/menu-sql.yaml | 2 +- ...ions-incompatible-data-for-table-error-class.md | 64 +++ ...tions-incompatible-data-to-table-error-class.md | 64 +++ docs/sql-error-conditions.md | 8 + docs/sql-ref-ansi-compliance.md| 3 +- .../sql/catalyst/analysis/AssignmentUtils.scala| 5 +- .../catalyst/analysis/TableOutputResolver.scala| 97 +++-- .../spark/sql/catalyst/types/DataTypeUtils.scala | 59 +-- .../spark/sql/errors/QueryCompilationErrors.scala | 110 +- .../catalyst/analysis/V2WriteAnalysisSuite.scala | 267 ++--- .../types/DataTypeWriteCompatibilitySuite.scala| 429 - .../apache/spark/sql/DataFrameWriterV2Suite.scala | 39 +- .../org/apache/spark/sql/SQLInsertTestSuite.scala | 5 +- .../command/AlignMergeAssignmentsSuite.scala | 78 +++- .../command/AlignUpdateAssignmentsSuite.scala | 54 ++- .../org/apache/spark/sql/sources/InsertSuite.scala | 98 +++-- .../sql/test/DataFrameReaderWriterSuite.scala | 47 ++- .../spark/sql/hive/client/HiveClientSuite.scala| 22 +- 20 files changed, 1100 insertions(+), 424 deletions(-) diff --git a/common/utils/src/main/resources/error/README.md b/common/utils/src/main/resources/error/README.md index 838991c2b6a..dfcb42d49e7 100644 --- a/common/utils/src/main/resources/error/README.md +++ b/common/utils/src/main/resources/error/README.md @@ -1294,6 +1294,20 @@ The following SQLSTATEs are collated from: |IM013|IM |ODBC driver |013 |Trace file error|SQL Server |N |SQL Server | |IM014|IM |ODBC driver |014 |Invalid name of File DSN|SQL Server |N |SQL Server | |IM015|IM |ODBC driver |015 |Corrupt file data source|SQL Server |N |SQL Server | +|KD000 |KD |datasource specific errors|000 |datasource specific errors |Databricks |N |Databricks | +|KD001 |KD |datasource specific errors|001 |Cannot read file footer |Databricks |N |Databricks | +|KD002 |KD |datasource specific errors|002 |Unexpected 
version |Databricks |N |Databricks | +|KD003 |KD |datasource specific errors|003 |Incorrect access to data type |Databricks |N |Databricks | +|KD004 |KD |datasource specific errors|004 |Delta protocol version error |Databricks |N |Databricks
[spark] branch master updated: [SPARK-44384][SQL][TESTS] Use checkError() to check Exception in *View*Suite, *Namespace*Suite, *DataSource*Suite
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4dedb4ad2c9 [SPARK-44384][SQL][TESTS] Use checkError() to check Exception in *View*Suite, *Namespace*Suite, *DataSource*Suite 4dedb4ad2c9 is described below commit 4dedb4ad2c9b2ecd75dd9ccec5f565805752ad8e Author: panbingkun AuthorDate: Thu Jul 13 16:26:34 2023 +0300 [SPARK-44384][SQL][TESTS] Use checkError() to check Exception in *View*Suite, *Namespace*Suite, *DataSource*Suite ### What changes were proposed in this pull request? The pr aims to use `checkError()` to check `Exception` in `*View*Suite`, `*Namespace*Suite`, `*DataSource*Suite`, include: - sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite - sql/core/src/test/scala/org/apache/spark/sql/NestedDataSourceSuite - sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2DataFrameSuite - sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2FunctionSuite - sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite - sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite - sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite - sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/ShowNamespacesSuite - sql/core/src/test/scala/org/apache/spark/sql/sources/ResolvedDataSourceSuite - sql/core/src/test/scala/org/apache/spark/sql/streaming/sources/StreamingDataSourceV2Suite - sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite - sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSQLViewSuite - sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/AlterNamespaceSetLocationSuite ### Why are the changes needed? Migration on checkError() will make the tests independent from the text of error messages. 
### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. - Pass GA. Closes #41952 from panbingkun/view_and_namespace_checkerror. Authored-by: panbingkun Signed-off-by: Max Gekk --- .../spark/sql/FileBasedDataSourceSuite.scala | 67 +++-- .../apache/spark/sql/NestedDataSourceSuite.scala | 24 +- .../sql/connector/DataSourceV2DataFrameSuite.scala | 21 +- .../sql/connector/DataSourceV2FunctionSuite.scala | 189 +++-- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 54 +++- .../spark/sql/connector/DataSourceV2Suite.scala| 38 ++- .../apache/spark/sql/execution/SQLViewSuite.scala | 313 ++--- .../execution/command/v2/ShowNamespacesSuite.scala | 28 +- .../sql/sources/ResolvedDataSourceSuite.scala | 24 +- .../sources/StreamingDataSourceV2Suite.scala | 68 +++-- .../spark/sql/hive/MetastoreDataSourcesSuite.scala | 88 +++--- .../sql/hive/execution/HiveSQLViewSuite.scala | 31 +- .../command/AlterNamespaceSetLocationSuite.scala | 11 +- 13 files changed, 671 insertions(+), 285 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala index e7e53285d62..d69a68f5726 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala @@ -26,7 +26,7 @@ import scala.collection.mutable import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.{LocalFileSystem, Path} -import org.apache.spark.SparkException +import org.apache.spark.{SparkException, SparkFileNotFoundException, SparkRuntimeException} import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd} import org.apache.spark.sql.TestingUDT.{IntervalUDT, NullData, NullUDT} import org.apache.spark.sql.catalyst.expressions.{AttributeReference, GreaterThan, Literal} @@ -129,11 +129,13 @@ class FileBasedDataSourceSuite extends QueryTest 
allFileBasedDataSources.foreach { format => test(s"SPARK-23372 error while writing empty schema files using $format") { withTempPath { outputPath => -val errMsg = intercept[AnalysisException] { - spark.emptyDataFrame.write.format(format).save(outputPath.toString) -} -assert(errMsg.getMessage.contains( - "Datasource does not support writing empty or nested empty schemas")) +checkError( + exception = intercept[AnalysisException] { +spark.emptyDataFrame.write.format(format).save(outputPath.toString) + }, + errorClass = "_LEGACY_ERROR_TEMP_1142", + parameters = Map.empty +) } // Nested empty schema @@ -144,11 +146,13 @@ class
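The motivation for `checkError()` can be seen in miniature: asserting on a structured error class and message parameters, instead of on message substrings, keeps tests valid when the wording changes. A hypothetical Python analogue (class and function names are made up for illustration; Spark's real `checkError` lives in its Scala test utilities):

```python
class StructuredError(Exception):
    # Toy stand-in for Spark exceptions that carry an error class and
    # message parameters alongside the rendered message text.
    def __init__(self, error_class, parameters, message):
        super().__init__(message)
        self.error_class = error_class
        self.parameters = parameters

def check_error(exc, error_class, parameters):
    # Like Spark's checkError(): compare the structured fields, not the
    # text, so rewording the message does not break the test.
    assert exc.error_class == error_class, exc.error_class
    assert exc.parameters == parameters, exc.parameters

err = StructuredError(
    "_LEGACY_ERROR_TEMP_1142", {},
    "Datasource does not support writing empty or nested empty schemas")
check_error(err, "_LEGACY_ERROR_TEMP_1142", {})
```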
[spark] branch master updated (9aa42a970c4 -> 8bb07388ea6)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 9aa42a970c4 [SPARK-41811][PYTHON][CONNECT] Implement SparkSession.sql's string formatter add 8bb07388ea6 [SPARK-44145][SQL] Callback when ready for execution No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/QueryPlanningTracker.scala | 52 - .../sql/catalyst/QueryPlanningTrackerSuite.scala | 19 +++ .../org/apache/spark/sql/DataFrameWriter.scala | 3 +- .../org/apache/spark/sql/DataFrameWriterV2.scala | 3 +- .../scala/org/apache/spark/sql/SparkSession.scala | 84 ++ .../spark/sql/execution/QueryExecution.scala | 27 - .../spark/sql/execution/QueryExecutionSuite.scala | 128 - 7 files changed, 283 insertions(+), 33 deletions(-)
[spark] branch master updated: [SPARK-41811][PYTHON][CONNECT] Implement SparkSession.sql's string formatter
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9aa42a970c4 [SPARK-41811][PYTHON][CONNECT] Implement SparkSession.sql's string formatter 9aa42a970c4 is described below commit 9aa42a970c4bd8e54603b1795a0f449bd556b11b Author: Ruifeng Zheng AuthorDate: Thu Jul 13 17:58:00 2023 +0800 [SPARK-41811][PYTHON][CONNECT] Implement SparkSession.sql's string formatter ### What changes were proposed in this pull request? Implement SparkSession.sql's string formatter ### Why are the changes needed? for parity ### Does this PR introduce _any_ user-facing change? yes before: ``` In [1]: spark.createDataFrame([("Alice", 6), ("Bob", 7), ("John", 10)], ['name', 'age']).createOrReplaceTempView("person") In [2]: spark.sql("""SELECT * FROM person WHERE age < {age}""", age = 9).show() --- TypeError Traceback (most recent call last) Cell In[2], line 1 > 1 spark.sql("""SELECT * FROM person WHERE age < {age}""", age = 9).show() TypeError: sql() got an unexpected keyword argument 'age' ``` after: ``` In [1]: spark.createDataFrame([("Alice", 6), ("Bob", 7), ("John", 10)], ['name', 'age']).createOrReplaceTempView("person") In [2]: spark.sql("""SELECT * FROM person WHERE age < {age}""", age = 9).show() +-+---+ | name|age| +-+---+ |Alice| 6| | Bob| 7| +-+---+ ``` ### How was this patch tested? enabled doc test Closes #41980 from zhengruifeng/py_connect_sql_formatter. 
Authored-by: Ruifeng Zheng Signed-off-by: Ruifeng Zheng --- python/pyspark/pandas/sql_formatter.py| 7 ++--- python/pyspark/sql/connect/session.py | 35 --- python/pyspark/sql/{ => connect}/sql_formatter.py | 30 --- python/pyspark/sql/sql_formatter.py | 5 ++-- python/pyspark/sql/utils.py | 8 ++ 5 files changed, 51 insertions(+), 34 deletions(-) diff --git a/python/pyspark/pandas/sql_formatter.py b/python/pyspark/pandas/sql_formatter.py index 8593703bd94..7501e19c038 100644 --- a/python/pyspark/pandas/sql_formatter.py +++ b/python/pyspark/pandas/sql_formatter.py @@ -264,10 +264,9 @@ class PandasSQLStringFormatter(string.Formatter): val._to_spark().createOrReplaceTempView(df_name) return df_name elif isinstance(val, str): -# This is matched to behavior from JVM implementation. -# See `sql` definition from `sql/catalyst/src/main/scala/org/apache/spark/ -# sql/catalyst/expressions/literals.scala` -return "'" + val.replace("\\", "").replace("'", "\\'") + "'" +from pyspark.sql.utils import get_lit_sql_str + +return get_lit_sql_str(val) else: return val diff --git a/python/pyspark/sql/connect/session.py b/python/pyspark/sql/connect/session.py index ea88d60d760..13868263174 100644 --- a/python/pyspark/sql/connect/session.py +++ b/python/pyspark/sql/connect/session.py @@ -489,13 +489,31 @@ class SparkSession: createDataFrame.__doc__ = PySparkSession.createDataFrame.__doc__ -def sql(self, sqlQuery: str, args: Optional[Union[Dict[str, Any], List]] = None) -> "DataFrame": -cmd = SQL(sqlQuery, args) -data, properties = self.client.execute_command(cmd.command(self._client)) -if "sql_command_result" in properties: -return DataFrame.withPlan(CachedRelation(properties["sql_command_result"]), self) -else: -return DataFrame.withPlan(SQL(sqlQuery, args), self) +def sql( +self, +sqlQuery: str, +args: Optional[Union[Dict[str, Any], List]] = None, +**kwargs: Any, +) -> "DataFrame": + +if len(kwargs) > 0: +from pyspark.sql.connect.sql_formatter import SQLStringFormatter + +formatter = 
SQLStringFormatter(self) +sqlQuery = formatter.format(sqlQuery, **kwargs) + +try: +cmd = SQL(sqlQuery, args) +data, properties = self.client.execute_command(cmd.command(self._client)) +if "sql_command_result" in properties: +return DataFrame.withPlan(CachedRelation(properties["sql_command_result"]), self) +else: +return DataFrame.withPlan(SQL(sqlQuery, args), self) +finally: +if len(kwargs) > 0: +# TODO: should drop temp views after SPARK-44406 get resolved +# formatter.clear() +pass sql.__doc__ = PySparkSession.sql.__doc__ @@ -808,9 +826,6 @@
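The idea behind the string formatter is small enough to sketch: keyword arguments are rendered as SQL literals before substitution into the query text. A hypothetical, much-reduced analogue of `SQLStringFormatter` (the class here is illustrative; the escaping mirrors the JVM literal rule quoted in the diff above):

```python
import string

class MiniSQLFormatter(string.Formatter):
    # Reduced sketch: string values become quoted SQL string literals,
    # everything else falls through to normal formatting.
    def format_field(self, value, format_spec):
        if isinstance(value, str):
            # Double backslashes, then escape single quotes, then wrap.
            return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"
        return super().format_field(value, format_spec)

query = MiniSQLFormatter().format("SELECT * FROM person WHERE age < {age}", age=9)
print(query)  # SELECT * FROM person WHERE age < 9
```

This is why `spark.sql("... {age}", age=9)` works after the change: the formatter runs first, and the already-substituted query string is what gets sent in the SQL command.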
[spark] branch master updated: [SPARK-44388][CONNECT] Fix protobuf cast issue when UDF instance is updated
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 357bcac823c [SPARK-44388][CONNECT] Fix protobuf cast issue when UDF instance is updated 357bcac823c is described below commit 357bcac823c0fafbdcb95458327d35e4a492046c Author: vicennial AuthorDate: Thu Jul 13 18:55:10 2023 +0900 [SPARK-44388][CONNECT] Fix protobuf cast issue when UDF instance is updated ### What changes were proposed in this pull request? This PR modifies the arguments of the `ScalarUserDefinedFunction` case class to take in `serializedUdfPacket`, `inputTypes` and `outputType` instead of calculating these types internally from `function`, `inputEncoders` and `outputEncoder`. This allows the class to copy over the serialized udf value from the parent instance instead of re-creating it. Through this, we avoid hitting a variant of the issue mentioned in [SPARK-43198](https://issues.apache.org/jira/browse/SPARK-43198) whic [...] ### Why are the changes needed? Bugfix. 
Consider the following code: ``` class A(x: Int) { def get = x * 7 } val myUdf = udf((x: Int) => new A(x).get) val modifiedUdf = myUdf.withName("myUdf").asNondeterministic() spark.range(5).select(modifiedUdf(col("id"))).as[Int].collect() ``` Executing this code currently results in hitting the following error: ``` java.lang.ClassCastException: org.apache.spark.connect.proto.ScalarScalaUDF cannot be cast to com.google.protobuf.MessageLite at com.google.protobuf.GeneratedMessageLite$SerializedForm.readResolve(GeneratedMessageLite.java:1462) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1274) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196) ... ``` If we do not include the `myUdf.withName("myUdf").asNondeterministic()`, the UDF runs as expected. ### Does this PR introduce _any_ user-facing change? Yes, fixes the bug mentioned above. ### How was this patch tested? Added a new E2E test in `ReplE2ESuite`. Closes #41959 from vicennial/SPARK-44388. 
Lead-authored-by: vicennial Co-authored-by: Venkata Sai Akhil Gudesa Co-authored-by: Venkata Sai Akhil Gudesa Signed-off-by: Hyukjin Kwon --- .../sql/expressions/UserDefinedFunction.scala | 29 +++--- .../spark/sql/application/ReplE2ESuite.scala | 11 2 files changed, 25 insertions(+), 15 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala index d911b7efe29..7bce4b5b31a 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala @@ -92,26 +92,23 @@ sealed abstract class UserDefinedFunction { /** * Holder class for a scalar user-defined function and it's input/output encoder(s). */ -case class ScalarUserDefinedFunction( -function: AnyRef, -inputEncoders: Seq[AgnosticEncoder[_]], -outputEncoder: AgnosticEncoder[_], +case class ScalarUserDefinedFunction private ( +// SPARK-43198: Eagerly serialize to prevent the UDF from containing a reference to this class. +serializedUdfPacket: Array[Byte], +inputTypes: Seq[proto.DataType], +outputType: proto.DataType, name: Option[String], override val nullable: Boolean, override val deterministic: Boolean) extends UserDefinedFunction { - // SPARK-43198: Eagerly serialize to prevent the UDF from containing a reference to this class. - private[this] val udf = { -val udfPacketBytes = - SparkSerDeUtils.serialize(UdfPacket(function, inputEncoders, outputEncoder)) + private[this] lazy val udf = { val scalaUdfBuilder = proto.ScalarScalaUDF .newBuilder() - .setPayload(ByteString.copyFrom(udfPacketBytes)) + .setPayload(ByteString.copyFrom(serializedUdfPacket)) // Send the real inputs and return types to obtain the types without deser the udf bytes. 
- .addAllInputTypes( - inputEncoders.map(_.dataType).map(DataTypeProtoConverter.toConnectProtoType).asJava) - .setOutputType(DataTypeProtoConverter.toConnectProtoType(outputEncoder.dataType)) +
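The shape of the fix can be illustrated outside Scala: the constructor receives the already-serialized payload, so modified copies (`withName` / `asNondeterministic` style) reuse the parent's bytes instead of re-serializing, which is where the original failure crept in. A loose Python analogue (names and the pickling are illustrative, not Spark's actual serialization):

```python
import dataclasses
import pickle

@dataclasses.dataclass(frozen=True)
class ScalarUDF:
    # Analogue of the fixed ScalarUserDefinedFunction: the payload is a
    # constructor argument, computed once in the factory method below.
    serialized_udf_packet: bytes
    name: str = ""
    deterministic: bool = True

    @staticmethod
    def from_function(fn):
        # Eagerly serialize exactly once, at creation time.
        return ScalarUDF(serialized_udf_packet=pickle.dumps(fn))

    def with_name(self, name):
        # Copies carry the parent's bytes; nothing is re-serialized.
        return dataclasses.replace(self, name=name)

udf = ScalarUDF.from_function(abs)
renamed = udf.with_name("my_abs")
assert renamed.serialized_udf_packet is udf.serialized_udf_packet
```

In the pre-fix design the payload was a field initializer, so every copy of the case class re-ran serialization in a context that could capture unwanted references.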
[spark] branch master updated: [SPARK-44391][SQL] Check the number of argument types in `InvokeLike`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3e82ac6ea3d [SPARK-44391][SQL] Check the number of argument types in `InvokeLike` 3e82ac6ea3d is described below commit 3e82ac6ea3d9f87c8ac09e481235beefaa1bf758 Author: Max Gekk AuthorDate: Thu Jul 13 12:17:20 2023 +0300 [SPARK-44391][SQL] Check the number of argument types in `InvokeLike` ### What changes were proposed in this pull request? In the PR, I propose to check the number of argument types in the `InvokeLike` expressions. If the input types are provided, the number of types should be exactly the same as the number of argument expressions. ### Why are the changes needed? 1. This PR checks the contract described in the comment explicitly: https://github.com/apache/spark/blob/d9248e83bbb3af49333608bebe7149b1aaeca738/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L247 that can prevent the errors of expression implementations, and improve code maintainability. 2. Also it fixes the issue in the `UrlEncode` and `UrlDecode`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running the related tests: ``` $ build/sbt "test:testOnly *UrlFunctionsSuite" $ build/sbt "test:testOnly *DataSourceV2FunctionSuite" ``` Closes #41954 from MaxGekk/fix-url_decode. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- common/utils/src/main/resources/error/error-classes.json | 5 + .../explain-results/function_url_decode.explain | 2 +- .../explain-results/function_url_encode.explain | 2 +- .../sql-error-conditions-datatype-mismatch-error-class.md | 4 .../spark/sql/catalyst/analysis/CheckAnalysis.scala | 5 +++-- .../spark/sql/catalyst/expressions/objects/objects.scala | 15 +++ .../spark/sql/catalyst/expressions/urlExpressions.scala | 4 ++-- 7 files changed, 31 insertions(+), 6 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 347ce026476..2c4d2b533a6 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -657,6 +657,11 @@ "The must be between (current value = )." ] }, + "WRONG_NUM_ARG_TYPES" : { +"message" : [ + "The expression requires argument types but the actual number is ." +] + }, "WRONG_NUM_ENDPOINTS" : { "message" : [ "The number of endpoints must be >= 2 to construct intervals but the actual number is ." 
diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_decode.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_decode.explain index 36b21e27c10..d612190396d 100644 --- a/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_decode.explain +++ b/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_decode.explain @@ -1,2 +1,2 @@ -Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UrlCodec$, StringType, decode, g#0, UTF-8, StringType, true, true, true) AS url_decode(g)#0] +Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UrlCodec$, StringType, decode, g#0, UTF-8, StringType, StringType, true, true, true) AS url_decode(g)#0] +- LocalRelation , [id#0L, a#0, b#0, d#0, e#0, f#0, g#0] diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_encode.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_encode.explain index 70a0f628fc9..bd2c63e19c6 100644 --- a/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_encode.explain +++ b/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_encode.explain @@ -1,2 +1,2 @@ -Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UrlCodec$, StringType, encode, g#0, UTF-8, StringType, true, true, true) AS url_encode(g)#0] +Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UrlCodec$, StringType, encode, g#0, UTF-8, StringType, StringType, true, true, true) AS url_encode(g)#0] +- LocalRelation , [id#0L, a#0, b#0, d#0, e#0, f#0, g#0] diff --git a/docs/sql-error-conditions-datatype-mismatch-error-class.md b/docs/sql-error-conditions-datatype-mismatch-error-class.md index 3bd63925323..ddc3e0c2b1b 100644 --- a/docs/sql-error-conditions-datatype-mismatch-error-class.md +++
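The contract being enforced is easy to state: if argument types are supplied at all, there must be exactly one per argument expression. A hedged sketch of the check (function name is illustrative; the message wording follows the new `WRONG_NUM_ARG_TYPES` error class):

```python
def check_argument_types(arguments, input_types):
    # InvokeLike contract from SPARK-44391: an empty type list means "no
    # types provided" and is accepted; a non-empty list must match the
    # argument count exactly.
    if input_types and len(input_types) != len(arguments):
        raise TypeError(
            f"WRONG_NUM_ARG_TYPES: The expression requires "
            f"{len(arguments)} argument types but the actual number "
            f"is {len(input_types)}.")

# url_decode's static invoke effectively passes two arguments
# (value, charset), so two declared input types pass the check:
check_argument_types(["g#0", "UTF-8"], ["StringType", "StringType"])
```

This is exactly the bug the explain-result diffs above show for `url_encode`/`url_decode`: they passed two arguments but declared only one input type, so the fix adds the second `StringType`.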
[GitHub] [spark-website] wangyum closed pull request #448: Updated downloads.js
wangyum closed pull request #448: Updated downloads.js URL: https://github.com/apache/spark-website/pull/448 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[spark-website] branch asf-site updated: Fix the link of PR-41535 in Spark 3.4.1 release note (#466)
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 4be5a1ea33 Fix the link of PR-41535 in Spark 3.4.1 release note (#466) 4be5a1ea33 is described below commit 4be5a1ea33dcaa05aa78dc4b0594493c7b496642 Author: Junbo wang <1042815...@qq.com> AuthorDate: Thu Jul 13 16:04:51 2023 +0800 Fix the link of PR-41535 in Spark 3.4.1 release note (#466) * Fix the link of PR-41535 in Spark 3.4.1 release note * build the html website --- releases/_posts/2023-06-23-spark-release-3-4-1.md | 2 +- site/releases/spark-release-3-4-1.html| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/releases/_posts/2023-06-23-spark-release-3-4-1.md b/releases/_posts/2023-06-23-spark-release-3-4-1.md index 47e231ce8c..6c19f23fcf 100644 --- a/releases/_posts/2023-06-23-spark-release-3-4-1.md +++ b/releases/_posts/2023-06-23-spark-release-3-4-1.md @@ -15,7 +15,7 @@ Spark 3.4.1 is a maintenance release containing stability fixes. 
This release is ### Notable changes - - [[SPARK-32559]](https://issues.apache.org/jira/browse/SPARK-32559): Fix the trim logic did't handle ASCII control characters correctly + - [[SPARK-44383]](https://issues.apache.org/jira/browse/SPARK-44383): Fix the trim logic did't handle ASCII control characters correctly - [[SPARK-37829]](https://issues.apache.org/jira/browse/SPARK-37829): Dataframe.joinWith outer-join should return a null value for unmatched row - [[SPARK-42078]](https://issues.apache.org/jira/browse/SPARK-42078): Add `CapturedException` to utils - [[SPARK-42290]](https://issues.apache.org/jira/browse/SPARK-42290): Fix the OOM error can't be reported when AQE on diff --git a/site/releases/spark-release-3-4-1.html b/site/releases/spark-release-3-4-1.html index de2cef6b56..85e1cc7caa 100644 --- a/site/releases/spark-release-3-4-1.html +++ b/site/releases/spark-release-3-4-1.html @@ -130,7 +130,7 @@ Notable changes - https://issues.apache.org/jira/browse/SPARK-32559;>[SPARK-32559]: Fix the trim logic didt handle ASCII control characters correctly + https://issues.apache.org/jira/browse/SPARK-44383;>[SPARK-44383]: Fix the trim logic didt handle ASCII control characters correctly https://issues.apache.org/jira/browse/SPARK-37829;>[SPARK-37829]: Dataframe.joinWith outer-join should return a null value for unmatched row https://issues.apache.org/jira/browse/SPARK-42078;>[SPARK-42078]: Add CapturedException to utils https://issues.apache.org/jira/browse/SPARK-42290;>[SPARK-42290]: Fix the OOM error cant be reported when AQE on
[GitHub] [spark-website] yaooqinn merged pull request #466: Fix the link of PR-41535 in Spark 3.4.1 release note
yaooqinn merged PR #466: URL: https://github.com/apache/spark-website/pull/466
[spark] branch master updated (27a3dccfa04 -> fbb85ac1bbb)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 27a3dccfa04 [SPARK-44390][CORE][SQL] Rename `SparkSerDerseUtils` to `SparkSerDeUtils` add fbb85ac1bbb [SPARK-43321][CONNECT][FOLLOWUP] Better names for APIs used in Scala Client joinWith No new revisions were added by this update. Summary of changes: .../main/scala/org/apache/spark/sql/Dataset.scala | 4 +- .../main/protobuf/spark/connect/relations.proto| 8 +- .../sql/connect/planner/SparkConnectPlanner.scala | 4 +- python/pyspark/sql/connect/proto/relations_pb2.py | 212 ++--- python/pyspark/sql/connect/proto/relations_pb2.pyi | 21 +- .../sql/catalyst/encoders/AgnosticEncoder.scala| 15 +- 6 files changed, 130 insertions(+), 134 deletions(-)
[GitHub] [spark-website] Kwafoor commented on pull request #466: Fix the link of PR-41535 in Spark 3.4.1 release note
Kwafoor commented on PR #466: URL: https://github.com/apache/spark-website/pull/466#issuecomment-1633609429 > Please check the README.md to build and update the releated html file too fixed