[spark] branch branch-3.4 updated: [SPARK-44391][SQL][3.4] Check the number of argument types in `InvokeLike`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 53383fcd2be [SPARK-44391][SQL][3.4] Check the number of argument types in `InvokeLike`
53383fcd2be is described below

commit 53383fcd2be178f4f0d231334ee36f1c3d67f64d
Author: Max Gekk
AuthorDate: Fri Jul 14 08:37:29 2023 +0300

    [SPARK-44391][SQL][3.4] Check the number of argument types in `InvokeLike`

    ### What changes were proposed in this pull request?

    In this PR, I propose to check the number of argument types in the `InvokeLike` expressions. If the input types are provided, the number of types must be exactly the same as the number of argument expressions. This is a backport of https://github.com/apache/spark/pull/41954.

    ### Why are the changes needed?

    1. This PR explicitly checks the contract described in the comment at https://github.com/apache/spark/blob/d9248e83bbb3af49333608bebe7149b1aaeca738/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L247, which can prevent errors in expression implementations and improves code maintainability.
    2. It also fixes an issue in `UrlEncode` and `UrlDecode`.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    By running the related tests:
    ```
    $ build/sbt "test:testOnly *UrlFunctionsSuite"
    $ build/sbt "test:testOnly *DataSourceV2FunctionSuite"
    ```

    Authored-by: Max Gekk
    (cherry picked from commit 3e82ac6ea3d9f87c8ac09e481235beefaa1bf758)

Closes #41985 from MaxGekk/fix-url_decode-3.4.
Authored-by: Max Gekk Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 5 + .../sql-error-conditions-datatype-mismatch-error-class.md | 4 .../spark/sql/catalyst/analysis/CheckAnalysis.scala | 5 +++-- .../spark/sql/catalyst/expressions/objects/objects.scala | 15 +++ .../spark/sql/catalyst/expressions/urlExpressions.scala | 4 ++-- 5 files changed, 29 insertions(+), 4 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index febed9283d8..90dec2ee45e 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -468,6 +468,11 @@ "The must be between (current value = )." ] }, + "WRONG_NUM_ARG_TYPES" : { +"message" : [ + "The expression requires argument types but the actual number is ." +] + }, "WRONG_NUM_ENDPOINTS" : { "message" : [ "The number of endpoints must be >= 2 to construct intervals but the actual number is ." diff --git a/docs/sql-error-conditions-datatype-mismatch-error-class.md b/docs/sql-error-conditions-datatype-mismatch-error-class.md index 6ccd63e6ee9..2178deca4f2 100644 --- a/docs/sql-error-conditions-datatype-mismatch-error-class.md +++ b/docs/sql-error-conditions-datatype-mismatch-error-class.md @@ -231,6 +231,10 @@ The input of `` can't be `` type data. The `` must be between `` (current value = ``). +## WRONG_NUM_ARG_TYPES + +The expression requires `` argument types but the actual number is ``. + ## WRONG_NUM_ENDPOINTS The number of endpoints must be >= 2 to construct intervals but the actual number is ``. 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 223fdf12d6d..e717483ec94 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -288,8 +288,9 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB "srcType" -> c.child.dataType.catalogString, "targetType" -> c.dataType.catalogString)) case e: RuntimeReplaceable if !e.replacement.resolved => -throw new IllegalStateException("Illegal RuntimeReplaceable: " + e + - "\nReplacement is unresolved: " + e.replacement) +throw SparkException.internalError( + s"Cannot resolve the runtime replaceable expression ${toSQLExpr(e)}. " + + s"The replacement is unresolved: ${toSQLExpr(e.replacement)}.") case g: Grouping => g.failAnalysis(errorClass = "_LEGACY_ERROR_TEMP_2445", messageParameters = Map.empty) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
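The contract enforced by the commit above can be pictured with a short sketch. Spark's actual check lives in the Scala `InvokeLike` expressions; the Python below is only a hedged analogue of the rule: if argument types are declared at all, their count must equal the number of argument expressions, otherwise a `WRONG_NUM_ARG_TYPES`-style error is raised.

```python
def check_arg_types(arguments, input_types):
    """Reject a declared-type list whose length differs from the argument list.

    An empty type list means "no declared types" and is always accepted,
    mirroring the optional inputTypes of InvokeLike.
    """
    if input_types and len(input_types) != len(arguments):
        # Analogue of the WRONG_NUM_ARG_TYPES error condition added by the PR.
        raise ValueError(
            f"The expression requires {len(input_types)} argument types "
            f"but the actual number is {len(arguments)}."
        )

# Matching counts pass; no declared types also passes.
check_arg_types(["url", "charset"], ["StringType", "StringType"])
check_arg_types(["url"], [])
```

A mismatched call such as `check_arg_types(["url"], ["StringType", "StringType"])` raises the error, which is the bug class the PR surfaces for `UrlEncode`/`UrlDecode`.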
[spark] branch dependabot/pip/dev/grpcio-1.53.0 deleted (was 40975b59142)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch dependabot/pip/dev/grpcio-1.53.0 in repository https://gitbox.apache.org/repos/asf/spark.git was 40975b59142 Bump grpcio from 1.48.1 to 1.53.0 in /dev The revisions that were on this branch are still contained in other references; therefore, this change does not discard any commits from the repository. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-44222][BUILD][PYTHON] Upgrade grpc to 1.56.0 with lower/upperbound
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new de59caa83af [SPARK-44222][BUILD][PYTHON] Upgrade grpc to 1.56.0 with lower/upperbound
de59caa83af is described below

commit de59caa83af1e6b2febb24597d06c8ff8505d888
Author: Hyukjin Kwon
AuthorDate: Fri Jul 14 14:06:48 2023 +0900

    [SPARK-44222][BUILD][PYTHON] Upgrade grpc to 1.56.0 with lower/upperbound

    ### What changes were proposed in this pull request?

    This PR reverts the revert of https://github.com/apache/spark/pull/41767, this time setting grpc lower bounds.

    ### Why are the changes needed?

    See https://github.com/apache/spark/pull/41767.

    ### Does this PR introduce _any_ user-facing change?

    See https://github.com/apache/spark/pull/41767.

    ### How was this patch tested?

    Manually tested with a Conda environment, via `pip install -r dev/requirements.txt` on Python 3.9, 3.10 and 3.11.

    Closes #41997 from HyukjinKwon/SPARK-44222.
Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- .github/workflows/build_and_test.yml | 4 ++-- connector/connect/common/src/main/buf.gen.yaml | 4 ++-- dev/create-release/spark-rm/Dockerfile | 2 +- dev/requirements.txt | 4 ++-- pom.xml| 2 +- project/SparkBuild.scala | 2 +- python/docs/source/getting_started/install.rst | 16 python/setup.py| 2 +- 8 files changed, 18 insertions(+), 18 deletions(-) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 4370e622cf4..0b184c6c248 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -256,7 +256,7 @@ jobs: - name: Install Python packages (Python 3.8) if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-')) run: | -python3.8 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'grpcio==1.48.1' 'protobuf==3.19.5' +python3.8 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'grpcio==1.56.0' 'protobuf==3.19.5' python3.8 -m pip list # Run the tests. - name: Run tests @@ -625,7 +625,7 @@ jobs: # Jinja2 3.0.0+ causes error when building with Sphinx. # See also https://issues.apache.org/jira/browse/SPARK-35375. 
python3.9 -m pip install 'flake8==3.9.0' pydata_sphinx_theme 'mypy==0.982' 'pytest==7.1.3' 'pytest-mypy-plugins==1.9.3' numpydoc 'jinja2<3.0.0' 'black==22.6.0' -python3.9 -m pip install 'pandas-stubs==1.2.0.53' ipython 'grpcio==1.48.1' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0' +python3.9 -m pip install 'pandas-stubs==1.2.0.53' ipython 'grpcio==1.56.0' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0' - name: Python linter run: PYTHON_EXECUTABLE=python3.9 ./dev/lint-python - name: Install dependencies for Python code generation check diff --git a/connector/connect/common/src/main/buf.gen.yaml b/connector/connect/common/src/main/buf.gen.yaml index c816f12398d..07edaa567b1 100644 --- a/connector/connect/common/src/main/buf.gen.yaml +++ b/connector/connect/common/src/main/buf.gen.yaml @@ -22,14 +22,14 @@ plugins: out: gen/proto/csharp - plugin: buf.build/protocolbuffers/java:v21.7 out: gen/proto/java - - remote: buf.build/grpc/plugins/ruby:v1.47.0-1 + - plugin: buf.build/grpc/ruby:v1.56.0 out: gen/proto/ruby - plugin: buf.build/protocolbuffers/ruby:v21.7 out: gen/proto/ruby # Building the Python build and building the mypy interfaces. - plugin: buf.build/protocolbuffers/python:v21.7 out: gen/proto/python - - remote: buf.build/grpc/plugins/python:v1.47.0-1 + - plugin: buf.build/grpc/python:v1.56.0 out: gen/proto/python - name: mypy out: gen/proto/python diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile index 8f198a420bc..def8626d3be 100644 --- a/dev/create-release/spark-rm/Dockerfile +++ b/dev/create-release/spark-rm/Dockerfile @@ -42,7 +42,7 @@ ARG APT_INSTALL="apt-get install --no-install-recommends -y" # We should use the latest Sphinx version once this is fixed. # TODO(SPARK-35375): Jinja2 3.0.0+ causes error when building with Sphinx. # See also https://issues.apache.org/jira/browse/SPARK-35375. 
-ARG PIP_PKGS="sphinx==3.0.4 mkdocs==1.1.2 numpy==1.20.3 pydata_sphinx_theme==0.4.1 ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0 jinja2==2.11.3 twine==3.4.1 sphinx-plotly-directive==0.1.3 pandas==1.5.3 pyarrow==3.0.0 plotly==5.4.0 markupsafe==2.0.1 docutils<0.17 grpcio==1.48.1 protobuf==4.21.6
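The lower/upper-bound idea behind these pin changes is the usual pip range specifier: something like `grpcio>=1.48.1,<1.57` accepts 1.56.0 but rejects older or too-new releases. As a hedged illustration (the exact specifier Spark uses is in `python/setup.py`, not shown in full here), the helper below is a simplified stand-in for a real version parser such as the `packaging` library and handles plain dotted-numeric versions only.

```python
def parse_version(v):
    """Parse 'X.Y.Z' into a comparable tuple of ints (simplified, no pre-releases)."""
    return tuple(int(part) for part in v.split("."))

def within_bounds(version, lower, upper):
    # Lower bound inclusive, upper bound exclusive, as ">=X,<Y" behaves in pip.
    return parse_version(lower) <= parse_version(version) < parse_version(upper)

within_bounds("1.56.0", "1.48.1", "1.57.0")   # True
within_bounds("1.47.0", "1.48.1", "1.57.0")   # False
```

Tuple comparison is what makes this correct for multi-digit components: `(1, 56, 0) > (1, 9, 0)`, whereas a naive string comparison would get that wrong.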
[spark] branch dependabot/pip/dev/grpcio-1.53.0 created (now 40975b59142)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch dependabot/pip/dev/grpcio-1.53.0 in repository https://gitbox.apache.org/repos/asf/spark.git at 40975b59142 Bump grpcio from 1.48.1 to 1.53.0 in /dev No new revisions were added by this update.
[spark] branch master updated: Revert "[SPARK-44222][BUILD][PYTHON] Upgrade `grpc` to 1.56.0"
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a47bde28d9a Revert "[SPARK-44222][BUILD][PYTHON] Upgrade `grpc` to 1.56.0" a47bde28d9a is described below commit a47bde28d9a2f6d77fba66350b81f1b4ce00c6cc Author: Hyukjin Kwon AuthorDate: Fri Jul 14 13:26:59 2023 +0900 Revert "[SPARK-44222][BUILD][PYTHON] Upgrade `grpc` to 1.56.0" This reverts commit f26bdb7bfde56e4856a2f962162f204ba2dbb1c1. --- .github/workflows/build_and_test.yml | 4 ++-- connector/connect/common/src/main/buf.gen.yaml | 4 ++-- dev/create-release/spark-rm/Dockerfile | 2 +- dev/requirements.txt | 4 ++-- pom.xml| 2 +- project/SparkBuild.scala | 2 +- python/docs/source/getting_started/install.rst | 4 ++-- python/setup.py| 2 +- 8 files changed, 12 insertions(+), 12 deletions(-) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 0b184c6c248..4370e622cf4 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -256,7 +256,7 @@ jobs: - name: Install Python packages (Python 3.8) if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-')) run: | -python3.8 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'grpcio==1.56.0' 'protobuf==3.19.5' +python3.8 -m pip install 'numpy>=1.20.0' pyarrow pandas scipy unittest-xml-reporting 'grpcio==1.48.1' 'protobuf==3.19.5' python3.8 -m pip list # Run the tests. - name: Run tests @@ -625,7 +625,7 @@ jobs: # Jinja2 3.0.0+ causes error when building with Sphinx. # See also https://issues.apache.org/jira/browse/SPARK-35375. 
python3.9 -m pip install 'flake8==3.9.0' pydata_sphinx_theme 'mypy==0.982' 'pytest==7.1.3' 'pytest-mypy-plugins==1.9.3' numpydoc 'jinja2<3.0.0' 'black==22.6.0' -python3.9 -m pip install 'pandas-stubs==1.2.0.53' ipython 'grpcio==1.56.0' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0' +python3.9 -m pip install 'pandas-stubs==1.2.0.53' ipython 'grpcio==1.48.1' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0' - name: Python linter run: PYTHON_EXECUTABLE=python3.9 ./dev/lint-python - name: Install dependencies for Python code generation check diff --git a/connector/connect/common/src/main/buf.gen.yaml b/connector/connect/common/src/main/buf.gen.yaml index 07edaa567b1..c816f12398d 100644 --- a/connector/connect/common/src/main/buf.gen.yaml +++ b/connector/connect/common/src/main/buf.gen.yaml @@ -22,14 +22,14 @@ plugins: out: gen/proto/csharp - plugin: buf.build/protocolbuffers/java:v21.7 out: gen/proto/java - - plugin: buf.build/grpc/ruby:v1.56.0 + - remote: buf.build/grpc/plugins/ruby:v1.47.0-1 out: gen/proto/ruby - plugin: buf.build/protocolbuffers/ruby:v21.7 out: gen/proto/ruby # Building the Python build and building the mypy interfaces. - plugin: buf.build/protocolbuffers/python:v21.7 out: gen/proto/python - - plugin: buf.build/grpc/python:v1.56.0 + - remote: buf.build/grpc/plugins/python:v1.47.0-1 out: gen/proto/python - name: mypy out: gen/proto/python diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile index def8626d3be..8f198a420bc 100644 --- a/dev/create-release/spark-rm/Dockerfile +++ b/dev/create-release/spark-rm/Dockerfile @@ -42,7 +42,7 @@ ARG APT_INSTALL="apt-get install --no-install-recommends -y" # We should use the latest Sphinx version once this is fixed. # TODO(SPARK-35375): Jinja2 3.0.0+ causes error when building with Sphinx. # See also https://issues.apache.org/jira/browse/SPARK-35375. 
-ARG PIP_PKGS="sphinx==3.0.4 mkdocs==1.1.2 numpy==1.20.3 pydata_sphinx_theme==0.4.1 ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0 jinja2==2.11.3 twine==3.4.1 sphinx-plotly-directive==0.1.3 pandas==1.5.3 pyarrow==3.0.0 plotly==5.4.0 markupsafe==2.0.1 docutils<0.17 grpcio==1.56.0 protobuf==4.21.6 grpcio-status==1.56.0 googleapis-common-protos==1.56.4" +ARG PIP_PKGS="sphinx==3.0.4 mkdocs==1.1.2 numpy==1.20.3 pydata_sphinx_theme==0.4.1 ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0 jinja2==2.11.3 twine==3.4.1 sphinx-plotly-directive==0.1.3 pandas==1.5.3 pyarrow==3.0.0 plotly==5.4.0 markupsafe==2.0.1 docutils<0.17 grpcio==1.48.1 protobuf==4.21.6 grpcio-status==1.48.1 googleapis-common-protos==1.56.4" ARG GEM_PKGS="bundler:2.3.8" # Install extra needed repos and refresh. diff --git a/dev/requirements.txt b/dev/requirements.txt index eb482f3f8a2..72da5dbe163 100644 --- a/dev/requirements.txt +++ b/dev/requirements.txt @@ -50,8 +50,8 @@ black==22.6.0 py
[spark] branch master updated: [SPARK-43995][SPARK-43996][CONNECT] Add support for UDFRegistration to the Connect Scala Client
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 7ecdad5c59c [SPARK-43995][SPARK-43996][CONNECT] Add support for UDFRegistration to the Connect Scala Client
7ecdad5c59c is described below

commit 7ecdad5c59ce2eecd4686effeb10819a6d784844
Author: vicennial
AuthorDate: Fri Jul 14 10:52:12 2023 +0900

    [SPARK-43995][SPARK-43996][CONNECT] Add support for UDFRegistration to the Connect Scala Client

    ### What changes were proposed in this pull request?

    This PR adds support for registering a Scala UDF from the Scala/JVM client. The following APIs are implemented in `UDFRegistration`:
    - `def register(name: String, udf: UserDefinedFunction): UserDefinedFunction`
    - `def register[RT: TypeTag, A1: TypeTag ...](name: String, func: (A1, ...) => RT): UserDefinedFunction` for 0 to 22 arguments.

    The following API is implemented in `functions`:
    - `def call_udf(udfName: String, cols: Column*): Column`

    Note: This PR is stacked on https://github.com/apache/spark/pull/41959.

    ### Why are the changes needed?

    To reach parity with classic Spark.

    ### Does this PR introduce _any_ user-facing change?

    Yes, `spark.udf.register()` is added, as shown below:
    ```scala
    class A(x: Int) { def get = x * 100 }
    val myUdf = udf((x: Int) => new A(x).get)
    spark.udf.register("dummyUdf", myUdf)
    spark.sql("select dummyUdf(id) from range(5)").as[Long].collect()
    ```
    The output:
    ```scala
    Array[Long] = Array(0L, 100L, 200L, 300L, 400L)
    ```

    ### How was this patch tested?

    New tests in `ReplE2ESuite`.

    Closes #41953 from vicennial/SPARK-43995.
Authored-by: vicennial Signed-off-by: Hyukjin Kwon --- .../scala/org/apache/spark/sql/SparkSession.scala | 31 + .../org/apache/spark/sql/UDFRegistration.scala | 1028 .../sql/expressions/UserDefinedFunction.scala | 10 + .../scala/org/apache/spark/sql/functions.scala | 17 + .../spark/sql/application/ReplE2ESuite.scala | 31 + .../CheckConnectJvmClientCompatibility.scala |1 - .../sql/connect/planner/SparkConnectPlanner.scala | 23 +- 7 files changed, 1139 insertions(+), 2 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala index c27f0f32e0d..fb9959c9942 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala @@ -417,6 +417,30 @@ class SparkSession private[sql] ( range(start, end, step, Option(numPartitions)) } + /** + * A collection of methods for registering user-defined functions (UDF). + * + * The following example registers a Scala closure as UDF: + * {{{ + * sparkSession.udf.register("myUDF", (arg1: Int, arg2: String) => arg2 + arg1) + * }}} + * + * The following example registers a UDF in Java: + * {{{ + * sparkSession.udf().register("myUDF", + * (Integer arg1, String arg2) -> arg2 + arg1, + * DataTypes.StringType); + * }}} + * + * @note + * The user-defined functions must be deterministic. Due to optimization, duplicate + * invocations may be eliminated or the function may even be invoked more times than it is + * present in the query. 
+ * + * @since 3.5.0 + */ + lazy val udf: UDFRegistration = new UDFRegistration(this) + // scalastyle:off // Disable style checker so "implicits" object can start with lowercase i /** @@ -525,6 +549,13 @@ class SparkSession private[sql] ( client.execute(plan).asScala.toSeq } + private[sql] def registerUdf(udf: proto.CommonInlineUserDefinedFunction): Unit = { +val command = proto.Command.newBuilder().setRegisterFunction(udf).build() +val plan = proto.Plan.newBuilder().setCommand(command).build() + +client.execute(plan) + } + @DeveloperApi def execute(extension: com.google.protobuf.Any): Unit = { val command = proto.Command.newBuilder().setExtension(extension).build() diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/UDFRegistration.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/UDFRegistration.scala new file mode 100644 index 000..426709b8f18 --- /dev/null +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/UDFRegistration.scala @@ -0,0 +1,1028 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright
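The register/call-by-name flow above can be pictured with a minimal registry. This is a hedged Python sketch of the concept only — the real client serializes the Scala closure into a `CommonInlineUserDefinedFunction` command, as the `registerUdf` helper in the diff shows — and `UDFRegistry` here is an invented illustration, not a Spark class.

```python
class UDFRegistry:
    """Toy name -> function table mirroring register()/call_udf() semantics."""

    def __init__(self):
        self._funcs = {}

    def register(self, name, func):
        # Like UDFRegistration.register, hand the function back to the caller
        # so it can also be used directly.
        self._funcs[name] = func
        return func

    def call_udf(self, name, *args):
        # Like functions.call_udf: look the UDF up by name and apply it.
        return self._funcs[name](*args)

registry = UDFRegistry()
registry.register("dummyUdf", lambda x: x * 100)
[registry.call_udf("dummyUdf", i) for i in range(5)]  # [0, 100, 200, 300, 400]
```

The final expression mirrors the `select dummyUdf(id) from range(5)` example from the commit message.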
[spark] branch branch-3.4 updated: [SPARK-44180][SQL] DistributionAndOrderingUtils should apply ResolveTimeZone
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 679f33e7ed2 [SPARK-44180][SQL] DistributionAndOrderingUtils should apply ResolveTimeZone 679f33e7ed2 is described below commit 679f33e7ed24c24aad1687d2d906232a9c1c7604 Author: Cheng Pan AuthorDate: Fri Jul 14 09:25:07 2023 +0800 [SPARK-44180][SQL] DistributionAndOrderingUtils should apply ResolveTimeZone ### What changes were proposed in this pull request? Apply `ResolveTimeZone` for the plan generated by `DistributionAndOrderingUtils#prepareQuery`. ### Why are the changes needed? In SPARK-39607, we only applied `typeCoercionRules` for the plan generated by `DistributionAndOrderingUtils#prepareQuery`, this is not enough, the following exception will be thrown if `TimeZoneAwareExpression` participates in the implicit cast. ``` 23/06/25 07:30:58 WARN UnsafeProjection: Expr codegen error and falling back to interpreter mode java.util.NoSuchElementException: None.get at scala.None$.get(Option.scala:529) at scala.None$.get(Option.scala:527) at org.apache.spark.sql.catalyst.expressions.TimeZoneAwareExpression.zoneId(datetimeExpressions.scala:63) at org.apache.spark.sql.catalyst.expressions.TimeZoneAwareExpression.zoneId$(datetimeExpressions.scala:63) at org.apache.spark.sql.catalyst.expressions.Cast.zoneId$lzycompute(Cast.scala:491) at org.apache.spark.sql.catalyst.expressions.Cast.zoneId(Cast.scala:491) at org.apache.spark.sql.catalyst.expressions.Cast.castToDateCode(Cast.scala:1655) at org.apache.spark.sql.catalyst.expressions.Cast.nullSafeCastFunction(Cast.scala:1335) at org.apache.spark.sql.catalyst.expressions.Cast.doGenCode(Cast.scala:1316) at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:200) at scala.Option.getOrElse(Option.scala:189) at 
org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:195) at org.apache.spark.sql.catalyst.expressions.Cast.genCode(Cast.scala:1310) at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.$anonfun$prepareArguments$3(objects.scala:124) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at scala.collection.TraversableLike.map(TraversableLike.scala:286) at scala.collection.TraversableLike.map$(TraversableLike.scala:279) at scala.collection.AbstractTraversable.map(Traversable.scala:108) at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.prepareArguments(objects.scala:123) at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.prepareArguments$(objects.scala:91) at org.apache.spark.sql.catalyst.expressions.objects.Invoke.prepareArguments(objects.scala:363) at org.apache.spark.sql.catalyst.expressions.objects.Invoke.doGenCode(objects.scala:414) at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:200) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:195) at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.$anonfun$prepareArguments$3(objects.scala:124) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at scala.collection.TraversableLike.map(TraversableLike.scala:286) at scala.collection.TraversableLike.map$(TraversableLike.scala:279) at scala.collection.AbstractTraversable.map(Traversable.scala:108) 
at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.prepareArguments(objects.scala:123) at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.prepareArguments$(objects.scala:91) at org.apache.spark.sql.catalyst.expressions.objects.Invoke.prepareArguments(objects.scala:363) at org.apache.spark.sql.catalyst.expressions.objects.Invoke.doGenCode(objects.scala:414) at
[spark] branch master updated: [SPARK-44180][SQL] DistributionAndOrderingUtils should apply ResolveTimeZone
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d1d97604fbe [SPARK-44180][SQL] DistributionAndOrderingUtils should apply ResolveTimeZone d1d97604fbe is described below commit d1d97604fbec2fccedbfa52b02eb1f3428b00d9f Author: Cheng Pan AuthorDate: Fri Jul 14 09:25:07 2023 +0800 [SPARK-44180][SQL] DistributionAndOrderingUtils should apply ResolveTimeZone ### What changes were proposed in this pull request? Apply `ResolveTimeZone` for the plan generated by `DistributionAndOrderingUtils#prepareQuery`. ### Why are the changes needed? In SPARK-39607, we only applied `typeCoercionRules` for the plan generated by `DistributionAndOrderingUtils#prepareQuery`, this is not enough, the following exception will be thrown if `TimeZoneAwareExpression` participates in the implicit cast. ``` 23/06/25 07:30:58 WARN UnsafeProjection: Expr codegen error and falling back to interpreter mode java.util.NoSuchElementException: None.get at scala.None$.get(Option.scala:529) at scala.None$.get(Option.scala:527) at org.apache.spark.sql.catalyst.expressions.TimeZoneAwareExpression.zoneId(datetimeExpressions.scala:63) at org.apache.spark.sql.catalyst.expressions.TimeZoneAwareExpression.zoneId$(datetimeExpressions.scala:63) at org.apache.spark.sql.catalyst.expressions.Cast.zoneId$lzycompute(Cast.scala:491) at org.apache.spark.sql.catalyst.expressions.Cast.zoneId(Cast.scala:491) at org.apache.spark.sql.catalyst.expressions.Cast.castToDateCode(Cast.scala:1655) at org.apache.spark.sql.catalyst.expressions.Cast.nullSafeCastFunction(Cast.scala:1335) at org.apache.spark.sql.catalyst.expressions.Cast.doGenCode(Cast.scala:1316) at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:200) at scala.Option.getOrElse(Option.scala:189) at 
org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:195) at org.apache.spark.sql.catalyst.expressions.Cast.genCode(Cast.scala:1310) at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.$anonfun$prepareArguments$3(objects.scala:124) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at scala.collection.TraversableLike.map(TraversableLike.scala:286) at scala.collection.TraversableLike.map$(TraversableLike.scala:279) at scala.collection.AbstractTraversable.map(Traversable.scala:108) at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.prepareArguments(objects.scala:123) at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.prepareArguments$(objects.scala:91) at org.apache.spark.sql.catalyst.expressions.objects.Invoke.prepareArguments(objects.scala:363) at org.apache.spark.sql.catalyst.expressions.objects.Invoke.doGenCode(objects.scala:414) at org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:200) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:195) at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.$anonfun$prepareArguments$3(objects.scala:124) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at scala.collection.TraversableLike.map(TraversableLike.scala:286) at scala.collection.TraversableLike.map$(TraversableLike.scala:279) at scala.collection.AbstractTraversable.map(Traversable.scala:108) 
at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.prepareArguments(objects.scala:123) at org.apache.spark.sql.catalyst.expressions.objects.InvokeLike.prepareArguments$(objects.scala:91) at org.apache.spark.sql.catalyst.expressions.objects.Invoke.prepareArguments(objects.scala:363) at org.apache.spark.sql.catalyst.expressions.objects.Invoke.doGenCode(objects.scala:414) at
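The failure mode in the stack trace — `None.get` on a time zone that was never filled in — and the fix can be modeled in a few lines. This is an illustrative Python analogue, not Spark's Scala code: `CastLike` and `resolve_time_zone` are invented names standing in for `TimeZoneAwareExpression` and the `ResolveTimeZone` rule.

```python
class CastLike:
    """Stand-in for a TimeZoneAwareExpression: the zone id is optional until resolved."""

    def __init__(self, child, time_zone_id=None):
        self.child = child
        self.time_zone_id = time_zone_id

    @property
    def zone_id(self):
        if self.time_zone_id is None:
            # Analogue of scala.None$.get throwing NoSuchElementException
            # during code generation.
            raise LookupError("None.get: time zone was never resolved")
        return self.time_zone_id

def resolve_time_zone(expr, session_tz):
    """Analogue of the ResolveTimeZone rule: fill in the session time zone."""
    if expr.time_zone_id is None:
        expr.time_zone_id = session_tz
    return expr

expr = CastLike("2023-06-25")
resolve_time_zone(expr, session_tz="America/Los_Angeles")
expr.zone_id  # 'America/Los_Angeles'
```

Running only type coercion (which inserts the `CastLike`) without the resolution pass leaves `time_zone_id` as `None`, which is exactly the gap `DistributionAndOrderingUtils#prepareQuery` had before this fix.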
[spark] branch master updated (bac7050cf0a -> fd518dac8d9)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from bac7050cf0a Revert "[SPARK-41811][PYTHON][CONNECT] Implement SparkSession.sql's string formatter" add fd518dac8d9 [SPARK-44201][CONNECT][SS] Add support for Streaming Listener in Scala for Spark Connect No new revisions were added by this update. Summary of changes: .../sql/streaming/StreamingQueryListener.scala | 132 - .../sql/streaming/StreamingQueryManager.scala | 77  .../ui/StreamingQueryStatusListener.scala | 61 +- .../apache/spark/sql/streaming/ui/UIUtils.scala| 0 .../CheckConnectJvmClientCompatibility.scala | 27 +++-- .../spark/sql/streaming/StreamingQuerySuite.scala | 90 +- .../src/main/protobuf/spark/connect/commands.proto | 21 ...rPacket.scala => StreamingListenerPacket.scala} | 20 ++-- .../sql/connect/planner/SparkConnectPlanner.scala | 46 ++- .../spark/sql/connect/service/SessionHolder.scala | 33 ++ python/pyspark/sql/connect/proto/commands_pb2.py | 42 --- python/pyspark/sql/connect/proto/commands_pb2.pyi | 121 ++- .../sql/streaming/StreamingQueryListener.scala | 13 +- .../ui/StreamingQueryStatusListener.scala | 4 +- 14 files changed, 545 insertions(+), 142 deletions(-) copy {sql/core => connector/connect/client/jvm}/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryListener.scala (53%) copy {sql/core => connector/connect/client/jvm}/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala (84%) copy {sql/core => connector/connect/client/jvm}/src/main/scala/org/apache/spark/sql/streaming/ui/UIUtils.scala (100%) copy connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/{ForeachWriterPacket.scala => StreamingListenerPacket.scala} (66%)
[spark] branch master updated: Revert "[SPARK-41811][PYTHON][CONNECT] Implement SparkSession.sql's string formatter"
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new bac7050cf0a Revert "[SPARK-41811][PYTHON][CONNECT] Implement SparkSession.sql's string formatter" bac7050cf0a is described below commit bac7050cf0ad18608e921f46e40152d341d53fb8 Author: Hyukjin Kwon AuthorDate: Fri Jul 14 09:31:17 2023 +0900 Revert "[SPARK-41811][PYTHON][CONNECT] Implement SparkSession.sql's string formatter" This reverts commit 9aa42a970c4bd8e54603b1795a0f449bd556b11b. --- python/pyspark/pandas/sql_formatter.py | 7 +-- python/pyspark/sql/connect/session.py | 35 - python/pyspark/sql/connect/sql_formatter.py | 78 - python/pyspark/sql/sql_formatter.py | 5 +- python/pyspark/sql/utils.py | 8 --- 5 files changed, 16 insertions(+), 117 deletions(-) diff --git a/python/pyspark/pandas/sql_formatter.py b/python/pyspark/pandas/sql_formatter.py index 7501e19c038..8593703bd94 100644 --- a/python/pyspark/pandas/sql_formatter.py +++ b/python/pyspark/pandas/sql_formatter.py @@ -264,9 +264,10 @@ class PandasSQLStringFormatter(string.Formatter): val._to_spark().createOrReplaceTempView(df_name) return df_name elif isinstance(val, str): -from pyspark.sql.utils import get_lit_sql_str - -return get_lit_sql_str(val) +# This is matched to behavior from JVM implementation. 
+# See `sql` definition from `sql/catalyst/src/main/scala/org/apache/spark/ +# sql/catalyst/expressions/literals.scala` +return "'" + val.replace("\\", "").replace("'", "\\'") + "'" else: return val diff --git a/python/pyspark/sql/connect/session.py b/python/pyspark/sql/connect/session.py index 13868263174..ea88d60d760 100644 --- a/python/pyspark/sql/connect/session.py +++ b/python/pyspark/sql/connect/session.py @@ -489,31 +489,13 @@ class SparkSession: createDataFrame.__doc__ = PySparkSession.createDataFrame.__doc__ -def sql( -self, -sqlQuery: str, -args: Optional[Union[Dict[str, Any], List]] = None, -**kwargs: Any, -) -> "DataFrame": - -if len(kwargs) > 0: -from pyspark.sql.connect.sql_formatter import SQLStringFormatter - -formatter = SQLStringFormatter(self) -sqlQuery = formatter.format(sqlQuery, **kwargs) - -try: -cmd = SQL(sqlQuery, args) -data, properties = self.client.execute_command(cmd.command(self._client)) -if "sql_command_result" in properties: -return DataFrame.withPlan(CachedRelation(properties["sql_command_result"]), self) -else: -return DataFrame.withPlan(SQL(sqlQuery, args), self) -finally: -if len(kwargs) > 0: -# TODO: should drop temp views after SPARK-44406 get resolved -# formatter.clear() -pass +def sql(self, sqlQuery: str, args: Optional[Union[Dict[str, Any], List]] = None) -> "DataFrame": +cmd = SQL(sqlQuery, args) +data, properties = self.client.execute_command(cmd.command(self._client)) +if "sql_command_result" in properties: +return DataFrame.withPlan(CachedRelation(properties["sql_command_result"]), self) +else: +return DataFrame.withPlan(SQL(sqlQuery, args), self) sql.__doc__ = PySparkSession.sql.__doc__ @@ -826,6 +808,9 @@ def _test() -> None: # RDD API is not supported in Spark Connect. 
del pyspark.sql.connect.session.SparkSession.createDataFrame.__doc__ +# TODO(SPARK-41811): Implement SparkSession.sql's string formatter +del pyspark.sql.connect.session.SparkSession.sql.__doc__ + (failure_count, test_count) = doctest.testmod( pyspark.sql.connect.session, globs=globs, diff --git a/python/pyspark/sql/connect/sql_formatter.py b/python/pyspark/sql/connect/sql_formatter.py deleted file mode 100644 index ab90a1bb847..000 --- a/python/pyspark/sql/connect/sql_formatter.py +++ /dev/null @@ -1,78 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -#http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
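The reverted formatter quoted Python string values as SQL literals to match the JVM behavior referenced in the diff's comment. A minimal sketch of that quoting rule (the helper name is my own, not a pyspark API):

```python
def to_sql_literal(val: str) -> str:
    # Escape backslashes first, then single quotes, then wrap in single
    # quotes -- the literal-quoting behavior the JVM comment above refers to.
    return "'" + val.replace("\\", "\\\\").replace("'", "\\'") + "'"
```

For example, `to_sql_literal("it's")` yields a literal that a SQL parser reads back as `it's`.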
[spark] branch master updated: [MINOR] Removing redundant parentheses from SQL function docs
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a6db459328d [MINOR] Removing redundant parentheses from SQL function docs a6db459328d is described below commit a6db459328d7aa75e43097ddf2286c146646810a Author: panbingkun AuthorDate: Fri Jul 14 08:40:21 2023 +0900 [MINOR] Removing redundant parentheses from SQL function docs ### What changes were proposed in this pull request? The pr aims to removing redundant parentheses from SQL function docs. ### Why are the changes needed? Make the document clearer, reduce misunderstandings. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. Closes #41984 from panbingkun/minor_function_docs. Authored-by: panbingkun Signed-off-by: Hyukjin Kwon --- python/pyspark/sql/connect/functions.py | 2 +- python/pyspark/sql/functions.py | 4 ++-- .../spark/sql/catalyst/expressions/aggregate/MaxByAndMinBy.scala | 4 ++-- .../org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala | 2 +- sql/core/src/test/resources/sql-functions/sql-expression-schema.md| 4 ++-- 5 files changed, 8 insertions(+), 8 deletions(-) diff --git a/python/pyspark/sql/connect/functions.py b/python/pyspark/sql/connect/functions.py index c6445f110c0..1be759d9b6e 100644 --- a/python/pyspark/sql/connect/functions.py +++ b/python/pyspark/sql/connect/functions.py @@ -195,7 +195,7 @@ def _invoke_higher_order_function( :param name: Name of the expression :param cols: a list of columns -:param funs: a list of((*Column) -> Column functions. +:param funs: a list of (*Column) -> Column functions. 
:return: a Column """ diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py index d7a2f529fa6..b2017627598 100644 --- a/python/pyspark/sql/functions.py +++ b/python/pyspark/sql/functions.py @@ -12770,7 +12770,7 @@ def arrays_zip(*cols: "ColumnOrName") -> Column: Examples >>> from pyspark.sql.functions import arrays_zip ->>> df = spark.createDataFrame([(([1, 2, 3], [2, 4, 6], [3, 6]))], ['vals1', 'vals2', 'vals3']) +>>> df = spark.createDataFrame([([1, 2, 3], [2, 4, 6], [3, 6])], ['vals1', 'vals2', 'vals3']) >>> df = df.select(arrays_zip(df.vals1, df.vals2, df.vals3).alias('zipped')) >>> df.show(truncate=False) ++ @@ -13041,7 +13041,7 @@ def _invoke_higher_order_function( :param name: Name of the expression :param cols: a list of columns -:param funs: a list of((*Column) -> Column functions. +:param funs: a list of (*Column) -> Column functions. :return: a Column """ diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/MaxByAndMinBy.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/MaxByAndMinBy.scala index 096a42686a3..56941c9de45 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/MaxByAndMinBy.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/MaxByAndMinBy.scala @@ -96,7 +96,7 @@ abstract class MaxMinBy extends DeclarativeAggregate with BinaryLike[Expression] usage = "_FUNC_(x, y) - Returns the value of `x` associated with the maximum value of `y`.", examples = """ Examples: - > SELECT _FUNC_(x, y) FROM VALUES (('a', 10)), (('b', 50)), (('c', 20)) AS tab(x, y); + > SELECT _FUNC_(x, y) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y); b """, group = "agg_funcs", @@ -119,7 +119,7 @@ case class MaxBy(valueExpr: Expression, orderingExpr: Expression) extends MaxMin usage = "_FUNC_(x, y) - Returns the value of `x` associated with the minimum value of `y`.", examples = """ 
Examples: - > SELECT _FUNC_(x, y) FROM VALUES (('a', 10)), (('b', 50)), (('c', 20)) AS tab(x, y); + > SELECT _FUNC_(x, y) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y); a """, group = "agg_funcs", diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala index 32c41cba4e1..0bbae04fb89 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala @@ -663,7 +663,7 @@ case class JsonToStructs( {"[1]":{"b":2}} > SELECT _FUNC_(map('a', 1)); {"a":1} - > SELECT _FUNC_(array((map('a', 1; + > SELECT
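The corrected doc examples rely on `max_by`/`min_by` semantics, and the redundant-parentheses point holds in plain Python too: extra parentheses around a tuple are a no-op, and the same results can be checked directly:

```python
# Extra parentheses around a tuple change nothing:
assert (('a', 10)) == ('a', 10)

rows = [('a', 10), ('b', 50), ('c', 20)]

# max_by(x, y): the value of x at the maximum y -- 'b' in the doc example above.
assert max(rows, key=lambda r: r[1])[0] == 'b'
# min_by(x, y): the value of x at the minimum y -- 'a'.
assert min(rows, key=lambda r: r[1])[0] == 'a'
```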
[spark] branch master updated: [SPARK-44364] [PYTHON] Add support for List[Row] data type for expected
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 61030de02f5 [SPARK-44364] [PYTHON] Add support for List[Row] data type for expected 61030de02f5 is described below commit 61030de02f5735919d6b3e3c2923831c5d2fcc61 Author: Amanda Liu AuthorDate: Fri Jul 14 08:27:41 2023 +0900 [SPARK-44364] [PYTHON] Add support for List[Row] data type for expected ### What changes were proposed in this pull request? This PR adds supported for List[Row] type for the `expected` argument in `assertDataFrameEqual`. ### Why are the changes needed? The change improves flexibility of the `assertDataFrameEqual` function by allowing for comparison between dataframe and List[Row] types. ### Does this PR introduce _any_ user-facing change? Yes, the PR modifies the user-facing API `assertDataFrameEqual`. ### How was this patch tested? Added tests to `runtime/python/pyspark/sql/tests/test_utils.py` and `runtime/python/pyspark/sql/tests/connect/test_utils.py` Closes #41924 from asl3/list-row-support. Authored-by: Amanda Liu Signed-off-by: Hyukjin Kwon --- python/pyspark/errors/error_classes.py | 5 - python/pyspark/sql/tests/test_utils.py | 266 ++--- python/pyspark/testing/utils.py| 48 +++--- 3 files changed, 209 insertions(+), 110 deletions(-) diff --git a/python/pyspark/errors/error_classes.py b/python/pyspark/errors/error_classes.py index 57447d56892..8c51024bf06 100644 --- a/python/pyspark/errors/error_classes.py +++ b/python/pyspark/errors/error_classes.py @@ -713,11 +713,6 @@ ERROR_CLASSES_JSON = """ " is only supported with pyarrow 2.0.0 and above." ] }, - "UNSUPPORTED_DATA_TYPE_FOR_IGNORE_ROW_ORDER" : { -"message" : [ - "Cannot ignore row order because undefined sorting for data type." -] - }, "UNSUPPORTED_JOIN_TYPE" : { "message" : [ "Unsupported join type: . 
Supported join types include: \\"inner\\", \\"outer\\", \\"full\\", \\"fullouter\\", \\"full_outer\\", \\"leftouter\\", \\"left\\", \\"left_outer\\", \\"rightouter\\", \\"right\\", \\"right_outer\\", \\"leftsemi\\", \\"left_semi\\", \\"semi\\", \\"leftanti\\", \\"left_anti\\", \\"anti\\", \\"cross\\"." diff --git a/python/pyspark/sql/tests/test_utils.py b/python/pyspark/sql/tests/test_utils.py index 1757d8dd2e1..ce8d83e6cb9 100644 --- a/python/pyspark/sql/tests/test_utils.py +++ b/python/pyspark/sql/tests/test_utils.py @@ -57,7 +57,8 @@ class UtilsTestsMixin: schema=["id", "amount"], ) -assertDataFrameEqual(df1, df2) +assertDataFrameEqual(df1, df2, checkRowOrder=False) +assertDataFrameEqual(df1, df2, checkRowOrder=True) def test_assert_equal_arraytype(self): df1 = self.spark.createDataFrame( @@ -85,7 +86,8 @@ class UtilsTestsMixin: ), ) -assertDataFrameEqual(df1, df2) +assertDataFrameEqual(df1, df2, checkRowOrder=False) +assertDataFrameEqual(df1, df2, checkRowOrder=True) def test_assert_approx_equal_arraytype_float(self): df1 = self.spark.createDataFrame( @@ -113,7 +115,8 @@ class UtilsTestsMixin: ), ) -assertDataFrameEqual(df1, df2) +assertDataFrameEqual(df1, df2, checkRowOrder=False) +assertDataFrameEqual(df1, df2, checkRowOrder=True) def test_assert_notequal_arraytype(self): df1 = self.spark.createDataFrame( @@ -167,6 +170,15 @@ class UtilsTestsMixin: message_parameters={"error_msg": expected_error_message}, ) +with self.assertRaises(PySparkAssertionError) as pe: +assertDataFrameEqual(df1, df2, checkRowOrder=True) + +self.check_error( +exception=pe.exception, +error_class="DIFFERENT_ROWS", +message_parameters={"error_msg": expected_error_message}, +) + def test_assert_equal_maptype(self): df1 = self.spark.createDataFrame( data=[ @@ -193,35 +205,8 @@ class UtilsTestsMixin: ), ) -assertDataFrameEqual(df1, df2, check_row_order=True) - -def test_assert_approx_equal_maptype_double(self): -df1 = self.spark.createDataFrame( -data=[ -("student1", {"math": 76.23, 
"english": 92.64}), -("student2", {"math": 87.89, "english": 84.48}), -], -schema=StructType( -[ -StructField("student", StringType(), True), -StructField("grades", MapType(StringType(), DoubleType()), True), -] -), -) -df2 = self.spark.createDataFrame( -data=[ -("student1", {"math": 76.23,
[spark] branch master updated: [SPARK-43389][SQL] Added a null check for lineSep option
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9f07e4a747b [SPARK-43389][SQL] Added a null check for lineSep option 9f07e4a747b is described below commit 9f07e4a747b0e2a62b954db3c9be425c924da47a Author: Gurpreet Singh AuthorDate: Thu Jul 13 18:17:45 2023 -0500 [SPARK-43389][SQL] Added a null check for lineSep option ### What changes were proposed in this pull request? ### Why are the changes needed? - `spark.read.csv` throws `NullPointerException` when lineSep is set to None - More details about the issue here: https://issues.apache.org/jira/browse/SPARK-43389 ### Does this PR introduce _any_ user-facing change? ~~Users now should be able to explicitly set `lineSep` as `None` without getting an exception~~ After some discussion, it was decided to add a `require` check for `null` instead of letting it through. ### How was this patch tested? Tested the changes with a python script that explicitly sets `lineSep` to `None` ```python from pyspark.sql import SparkSession # Create a SparkSession spark = SparkSession.builder.appName("HelloWorld").getOrCreate() # Read CSV into a DataFrame df = spark.read.csv("/tmp/hello.csv", header=True, inferSchema=True, lineSep=None) # Also tested the following case when options are passed before invoking .csv #df = spark.read.option("lineSep", None).csv("/Users/gdhuper/Documents/tmp/hello.csv", header=True, inferSchema=True) # Show the DataFrame df.show() # Stop the SparkSession spark.stop() ``` Closes #41904 from gdhuper/gdhuper/SPARK-43389. 
Authored-by: Gurpreet Singh Signed-off-by: Sean Owen --- .../src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala| 1 + .../org/apache/spark/sql/execution/datasources/text/TextOptions.scala| 1 + 2 files changed, 2 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala index 2b6b60fdf76..f4ad1f2f2e5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala @@ -254,6 +254,7 @@ class CSVOptions( * A string between two consecutive JSON records. */ val lineSeparator: Option[String] = parameters.get(LINE_SEP).map { sep => +require(sep != null, "'lineSep' cannot be a null value.") require(sep.nonEmpty, "'lineSep' cannot be an empty string.") // Intentionally allow it up to 2 for Window's CRLF although multiple // characters have an issue with quotes. This is intentionally undocumented. diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala index f26f05cbe1c..468d58974ed 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala @@ -45,6 +45,7 @@ class TextOptions(@transient private val parameters: CaseInsensitiveMap[String]) val encoding: Option[String] = parameters.get(ENCODING) val lineSeparator: Option[String] = parameters.get(LINE_SEP).map { lineSep => +require(lineSep != null, s"'$LINE_SEP' cannot be a null value.") require(lineSep.nonEmpty, s"'$LINE_SEP' cannot be an empty string.") lineSep - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
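The added `require` calls reject a null `lineSep` before the pre-existing empty-string check runs. The same validation order, sketched in Python (the function name is illustrative):

```python
def validate_line_sep(sep):
    # Mirror the added Scala `require` calls: a null (None) separator is
    # rejected first, then the existing empty-string check applies.
    if sep is None:
        raise ValueError("'lineSep' cannot be a null value.")
    if sep == "":
        raise ValueError("'lineSep' cannot be an empty string.")
    return sep
```

This turns the former `NullPointerException` into an explicit, descriptive error.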
[spark] branch master updated: [SPARK-44395][SQL] Update TVF arguments to require parentheses around identifier after TABLE keyword
This is an automated email from the ASF dual-hosted git repository. ueshin pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 63cef16ef77 [SPARK-44395][SQL] Update TVF arguments to require parentheses around identifier after TABLE keyword 63cef16ef77 is described below commit 63cef16ef779addaa841fa863a5797fc9f33fd82 Author: Daniel Tenedorio AuthorDate: Thu Jul 13 14:04:09 2023 -0700 [SPARK-44395][SQL] Update TVF arguments to require parentheses around identifier after TABLE keyword ### What changes were proposed in this pull request? This PR updates the parsing of table function arguments to require parentheses around identifier after the TABLE keyword. Instead of `TABLE t`, the syntax should look like `TABLE(t)` instead as specified in the SQL standard. * I kept the previous syntax without the parentheses as an optional case in the SQL grammar so that we can catch it in the `AstBuilder` and throw an informative error message telling the user to add parentheses and try the query again. * I had to swap the order of parsing table function arguments, so the `table(identifier)` syntax does not accidentally parse as a scalar function call: ``` functionTableArgument : functionTableReferenceArgument | functionArgument ; ``` ### Why are the changes needed? This syntax is written down in the SQL standard. Per the standard, `TABLE identifier` should actually be passed as `TABLE(identifier)`. ### Does this PR introduce _any_ user-facing change? Yes, SQL syntax changes slightly. ### How was this patch tested? This PR adds and updates unit test coverage. Closes #41965 from dtenedor/parentheses-table-clause. 
Authored-by: Daniel Tenedorio Signed-off-by: Takuya UESHIN --- .../src/main/resources/error/error-classes.json | 5 + ...ror-conditions-invalid-sql-syntax-error-class.md | 4 python/pyspark/sql/tests/test_udtf.py | 10 +- .../spark/sql/catalyst/parser/SqlBaseParser.g4 | 5 +++-- .../spark/sql/catalyst/parser/AstBuilder.scala | 6 ++ .../spark/sql/errors/QueryParsingErrors.scala | 11 +++ .../spark/sql/catalyst/parser/PlanParserSuite.scala | 21 + 7 files changed, 51 insertions(+), 11 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 6691e86b463..99a0a4ae4ba 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -1694,6 +1694,11 @@ "Expected a column reference for transform : ." ] }, + "INVALID_TABLE_FUNCTION_IDENTIFIER_ARGUMENT_MISSING_PARENTHESES" : { +"message" : [ + "Syntax error: call to table-valued function is invalid because parentheses are missing around the provided TABLE argument ; please surround this with parentheses and try again." +] + }, "INVALID_TABLE_VALUED_FUNC_NAME" : { "message" : [ "Table valued function cannot specify database name: ." diff --git a/docs/sql-error-conditions-invalid-sql-syntax-error-class.md b/docs/sql-error-conditions-invalid-sql-syntax-error-class.md index 6c9f588ba49..b1e298f7b90 100644 --- a/docs/sql-error-conditions-invalid-sql-syntax-error-class.md +++ b/docs/sql-error-conditions-invalid-sql-syntax-error-class.md @@ -49,6 +49,10 @@ Partition key `` must set value. Expected a column reference for transform ``: ``. +## INVALID_TABLE_FUNCTION_IDENTIFIER_ARGUMENT_MISSING_PARENTHESES + +Syntax error: call to table-valued function is invalid because parentheses are missing around the provided TABLE argument ``; please surround this with parentheses and try again. + ## INVALID_TABLE_VALUED_FUNC_NAME Table valued function cannot specify database name: ``. 
diff --git a/python/pyspark/sql/tests/test_udtf.py b/python/pyspark/sql/tests/test_udtf.py index e4db542cbb7..8f42b4123e4 100644 --- a/python/pyspark/sql/tests/test_udtf.py +++ b/python/pyspark/sql/tests/test_udtf.py @@ -545,7 +545,7 @@ class BaseUDTFTestsMixin: with self.tempView("v"): self.spark.sql("CREATE OR REPLACE TEMPORARY VIEW v as SELECT id FROM range(0, 8)") self.assertEqual( -self.spark.sql("SELECT * FROM test_udtf(TABLE v)").collect(), +self.spark.sql("SELECT * FROM test_udtf(TABLE (v))").collect(), [Row(a=6), Row(a=7)], ) @@ -561,7 +561,7 @@ class BaseUDTFTestsMixin: with self.tempView("v"): self.spark.sql("CREATE OR REPLACE TEMPORARY VIEW v as SELECT id FROM range(0, 8)") self.assertEqual( -
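The grammar change tries the `TABLE(identifier)` reference form before scalar function arguments, so `table(t)` is not mis-parsed, and it reports the new error for the old bare `TABLE t` form. A toy regex-based sketch of that ordering (nothing like the real ANTLR grammar, just the dispatch logic):

```python
import re


def parse_table_argument(arg):
    # Try the TABLE(identifier) reference form first, so that "table(t)"
    # is not accidentally parsed as a scalar function call.
    m = re.fullmatch(r"TABLE\s*\((\w+)\)", arg, re.IGNORECASE)
    if m:
        return ("table_ref", m.group(1))
    # Old bare form: report the new, more informative error.
    m = re.fullmatch(r"TABLE\s+(\w+)", arg, re.IGNORECASE)
    if m:
        raise SyntaxError(
            "parentheses are missing around the provided TABLE argument "
            + m.group(1))
    # Anything else falls through to ordinary (scalar) argument parsing.
    return ("scalar", arg)
```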
[spark] branch master updated: [SPARK-44279][BUILD] Upgrade `optionator` to ^0.9.3
This is an automated email from the ASF dual-hosted git repository. sarutak pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d35fda69e49 [SPARK-44279][BUILD] Upgrade `optionator` to ^0.9.3 d35fda69e49 is described below commit d35fda69e49b06cda316ecd664acb22cb8c12266 Author: Bjørn Jørgensen AuthorDate: Fri Jul 14 03:26:56 2023 +0900 [SPARK-44279][BUILD] Upgrade `optionator` to ^0.9.3 ### What changes were proposed in this pull request? This PR proposes a change in the package.json file to update the resolution for the `optionator` package to ^0.9.3. I've added a resolutions field to package.json and specified the `optionator` package version as ^0.9.3. This will ensure that our project uses `optionator` version 0.9.3 or the latest minor or patch version (due to the caret ^), regardless of any other version that may be specified in the dependencies or nested dependencies of our project. ### Why are the changes needed? [CVE-2023-26115](https://nvd.nist.gov/vuln/detail/CVE-2023-26115) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA Closes #41955 from bjornjorgensen/word-wrap. 
Authored-by: Bjørn Jørgensen Signed-off-by: Kousuke Saruta --- dev/package-lock.json | 774 ++ dev/package.json | 3 + 2 files changed, 350 insertions(+), 427 deletions(-) diff --git a/dev/package-lock.json b/dev/package-lock.json index 104a3fb7854..f676b9cec07 100644 --- a/dev/package-lock.json +++ b/dev/package-lock.json @@ -10,6 +10,15 @@ "minimatch": "^3.1.2" } }, +"node_modules/@aashutoshrathi/word-wrap": { + "version": "1.2.6", + "resolved": "https://registry.npmjs.org/@aashutoshrathi/word-wrap/-/word-wrap-1.2.6.tgz;, + "integrity": "sha512-1Yjs2SvM8TflER/OD3cOjhWWOZb58A2t7wpE2S9XfBYTiIl+XFhQG2bjy4Pu1I+EAlCNUzRDYDdFwFYUKvXcIA==", + "dev": true, + "engines": { +"node": ">=0.10.0" + } +}, "node_modules/@babel/code-frame": { "version": "7.12.11", "resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.12.11.tgz;, @@ -20,20 +29,38 @@ } }, "node_modules/@babel/helper-validator-identifier": { - "version": "7.14.0", - "resolved": "https://registry.npmjs.org/@babel/helper-validator-identifier/-/helper-validator-identifier-7.14.0.tgz;, - "integrity": "sha512-V3ts7zMSu5lfiwWDVWzRDGIN+lnCEUdaXgtVHJgLb1rGaA6jMrtB9EmE7L18foXJIE8Un/A/h6NJfGQp/e1J4A==", - "dev": true + "version": "7.22.5", + "resolved": "https://registry.npmjs.org/@babel/helper-validator-identifier/-/helper-validator-identifier-7.22.5.tgz;, + "integrity": "sha512-aJXu+6lErq8ltp+JhkJUfk1MTGyuA4v7f3pA+BJ5HLfNC6nAQ0Cpi9uOquUj8Hehg0aUiHzWQbOVJGao6ztBAQ==", + "dev": true, + "engines": { +"node": ">=6.9.0" + } }, "node_modules/@babel/highlight": { - "version": "7.14.0", - "resolved": "https://registry.npmjs.org/@babel/highlight/-/highlight-7.14.0.tgz;, - "integrity": "sha512-YSCOwxvTYEIMSGaBQb5kDDsCopDdiUGsqpatp3fOlI4+2HQSkTmEVWnVuySdAC5EWCqSWWTv0ib63RjR7dTBdg==", + "version": "7.22.5", + "resolved": "https://registry.npmjs.org/@babel/highlight/-/highlight-7.22.5.tgz;, + "integrity": "sha512-BSKlD1hgnedS5XRnGOljZawtag7H1yPfQp0tdNJCHoH6AZ+Pcm9VvkrK59/Yy593Ypg0zMxH2BxD1VPYUQ7UIw==", "dev": true, 
"dependencies": { -"@babel/helper-validator-identifier": "^7.14.0", +"@babel/helper-validator-identifier": "^7.22.5", "chalk": "^2.0.0", "js-tokens": "^4.0.0" + }, + "engines": { +"node": ">=6.9.0" + } +}, +"node_modules/@babel/highlight/node_modules/ansi-styles": { + "version": "3.2.1", + "resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-3.2.1.tgz;, + "integrity": "sha512-VT0ZI6kZRdTh8YyJw3SMbYm/u+NqfsAxEpWO0Pf9sq8/e94WxxOpPKx9FR1FlyCtOVDNOQ+8ntlqFxiRc+r5qA==", + "dev": true, + "dependencies": { +"color-convert": "^1.9.0" + }, + "engines": { +"node": ">=4" } }, "node_modules/@babel/highlight/node_modules/chalk": { @@ -50,16 +77,40 @@ "node": ">=4" } }, +"node_modules/@babel/highlight/node_modules/color-convert": { + "version": "1.9.3", + "resolved": "https://registry.npmjs.org/color-convert/-/color-convert-1.9.3.tgz;, + "integrity": "sha512-QfAUtd+vFdAtFQcC8CCyYt1fYWxSqAiK2cSD6zDB8N3cpsEBAvRxp9zOGg6G/SHHJYAT88/az/IuDGALsNVbGg==", + "dev": true, + "dependencies": { +"color-name": "1.1.3" + } +}, +"node_modules/@babel/highlight/node_modules/color-name": { + "version": "1.1.3", + "resolved":
[spark] branch master updated: [SPARK-44398][CONNECT] Scala foreachBatch API
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4771853c9bc [SPARK-44398][CONNECT] Scala foreachBatch API 4771853c9bc is described below commit 4771853c9bc26b8741091d63d77c4b6487e74189 Author: Raghu Angadi AuthorDate: Thu Jul 13 10:47:49 2023 -0700 [SPARK-44398][CONNECT] Scala foreachBatch API This implements Scala foreachBatch(). The implementation is basic and needs some more enhancements. The server side will be shared by the Python implementation as well. One notable hack in this PR is that it runs the user's `foreachBatch()` with a regular (legacy) DataFrame, rather than setting up a remote Spark Connect session and Connect DataFrame. ### Why are the changes needed? Adds foreachBatch() support in Scala Spark Connect. ### Does this PR introduce _any_ user-facing change? Yes. Adds the foreachBatch() API. ### How was this patch tested? - A simple unit test. Closes #41969 from rangadi/feb-scala. 
Authored-by: Raghu Angadi Signed-off-by: Xinrong Meng --- .../spark/sql/streaming/DataStreamWriter.scala | 28 ++- .../spark/sql/streaming/StreamingQuerySuite.scala | 52 - .../src/main/protobuf/spark/connect/commands.proto | 11 +-- .../sql/connect/planner/SparkConnectPlanner.scala | 25 +- .../planner/StreamingForeachBatchHelper.scala | 69 + python/pyspark/sql/connect/proto/commands_pb2.py | 88 +++--- python/pyspark/sql/connect/proto/commands_pb2.pyi | 46 +++ python/pyspark/sql/connect/streaming/readwriter.py | 4 +- 8 files changed, 251 insertions(+), 72 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala index 9f63f68a000..ad76ab4a1bc 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala @@ -30,12 +30,15 @@ import org.apache.spark.connect.proto.Command import org.apache.spark.connect.proto.WriteStreamOperationStart import org.apache.spark.internal.Logging import org.apache.spark.sql.{Dataset, ForeachWriter} +import org.apache.spark.sql.connect.common.DataTypeProtoConverter import org.apache.spark.sql.connect.common.ForeachWriterPacket import org.apache.spark.sql.execution.streaming.AvailableNowTrigger import org.apache.spark.sql.execution.streaming.ContinuousTrigger import org.apache.spark.sql.execution.streaming.OneTimeTrigger import org.apache.spark.sql.execution.streaming.ProcessingTimeTrigger +import org.apache.spark.sql.types.NullType import org.apache.spark.util.SparkSerDeUtils +import org.apache.spark.util.Utils /** * Interface used to write a streaming `Dataset` to external storage systems (e.g. 
file systems, @@ -218,7 +221,30 @@ final class DataStreamWriter[T] private[sql] (ds: Dataset[T]) extends Logging { val scalaWriterBuilder = proto.ScalarScalaUDF .newBuilder() .setPayload(ByteString.copyFrom(serialized)) -sinkBuilder.getForeachWriterBuilder.setScalaWriter(scalaWriterBuilder) +sinkBuilder.getForeachWriterBuilder.setScalaFunction(scalaWriterBuilder) +this + } + + /** + * :: Experimental :: + * + * (Scala-specific) Sets the output of the streaming query to be processed using the provided + * function. This is supported only in the micro-batch execution modes (that is, when the + * trigger is not continuous). In every micro-batch, the provided function will be called in + * every micro-batch with (i) the output rows as a Dataset and (ii) the batch identifier. The + * batchId can be used to deduplicate and transactionally write the output (that is, the + * provided Dataset) to external systems. The output Dataset is guaranteed to be exactly the + * same for the same batchId (assuming all operations are deterministic in the query). + * + * @since 3.5.0 + */ + @Evolving + def foreachBatch(function: (Dataset[T], Long) => Unit): DataStreamWriter[T] = { +val serializedFn = Utils.serialize(function) +sinkBuilder.getForeachBatchBuilder.getScalaFunctionBuilder + .setPayload(ByteString.copyFrom(serializedFn)) + .setOutputType(DataTypeProtoConverter.toConnectProtoType(NullType)) // Unused. + .setNullable(true) // Unused. this } diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala index 6ddcedf19cb..438e6e0c2fe 100644 ---
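The new `foreachBatch` contract documented above calls the user's function once per micro-batch with the output batch and its identifier, where the batch id lets the function deduplicate or write transactionally. A toy dispatcher illustrating that contract (not the real streaming engine):

```python
def run_micro_batches(batches, foreach_batch_fn):
    # Call the user's function once per micro-batch with (batch, batch_id);
    # the batch id is stable for replays, enabling idempotent writes.
    for batch_id, batch in enumerate(batches):
        foreach_batch_fn(batch, batch_id)


seen = []
run_micro_batches([["a"], ["b", "c"]], lambda batch, bid: seen.append((bid, batch)))
```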
[spark] branch master updated: [SPARK-44338][SQL] Fix view schema mismatch error message
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7e0679be8aa [SPARK-44338][SQL] Fix view schema mismatch error message 7e0679be8aa is described below commit 7e0679be8aa1aea531417907da7c66d86b74302a Author: Wenchen Fan AuthorDate: Thu Jul 13 10:32:43 2023 -0700 [SPARK-44338][SQL] Fix view schema mismatch error message ### What changes were proposed in this pull request? This fixes a minor issue that the view schema mismatch error message may have an extra space between view name and `AS SELECT`. It also takes the chance to simplify the view test a little bit, to always use `tableIdentifier` to produce the view name. ### Why are the changes needed? small fix of the error message ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests Closes #41898 from cloud-fan/view. 
Authored-by: Wenchen Fan Signed-off-by: Gengliang Wang --- .../sql/catalyst/catalog/SessionCatalog.scala | 6 +-- .../columnresolution-negative.sql.out | 2 +- .../results/columnresolution-negative.sql.out | 2 +- .../spark/sql/execution/SQLViewTestSuite.scala | 53 +- 4 files changed, 27 insertions(+), 36 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala index 135dc17ad95..392c911ddb8 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala @@ -909,11 +909,11 @@ class SessionCatalog( val viewText = metadata.viewText.get val userSpecifiedColumns = if (metadata.schema.fieldNames.toSeq == metadata.viewQueryColumnNames) { - "" + " " } else { - s"(${metadata.schema.fieldNames.mkString(", ")})" + s" (${metadata.schema.fieldNames.mkString(", ")}) " } - Some(s"CREATE OR REPLACE VIEW $viewName $userSpecifiedColumns AS $viewText") + Some(s"CREATE OR REPLACE VIEW $viewName${userSpecifiedColumns}AS $viewText") } } diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/columnresolution-negative.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/columnresolution-negative.sql.out index 95f3e53ff60..c80b9e82ab0 100644 --- a/sql/core/src/test/resources/sql-tests/analyzer-results/columnresolution-negative.sql.out +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/columnresolution-negative.sql.out @@ -417,7 +417,7 @@ org.apache.spark.sql.AnalysisException "actualCols" : "[]", "colName" : "i1", "expectedNum" : "1", -"suggestion" : "CREATE OR REPLACE VIEW spark_catalog.mydb1.v1 AS SELECT * FROM t1", +"suggestion" : "CREATE OR REPLACE VIEW spark_catalog.mydb1.v1 AS SELECT * FROM t1", "viewName" : "`spark_catalog`.`mydb1`.`v1`" } } diff --git 
a/sql/core/src/test/resources/sql-tests/results/columnresolution-negative.sql.out b/sql/core/src/test/resources/sql-tests/results/columnresolution-negative.sql.out index 4d700b0a142..b3e62faa3b3 100644 --- a/sql/core/src/test/resources/sql-tests/results/columnresolution-negative.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/columnresolution-negative.sql.out @@ -460,7 +460,7 @@ org.apache.spark.sql.AnalysisException "actualCols" : "[]", "colName" : "i1", "expectedNum" : "1", -"suggestion" : "CREATE OR REPLACE VIEW spark_catalog.mydb1.v1 AS SELECT * FROM t1", +"suggestion" : "CREATE OR REPLACE VIEW spark_catalog.mydb1.v1 AS SELECT * FROM t1", "viewName" : "`spark_catalog`.`mydb1`.`v1`" } } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewTestSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewTestSuite.scala index 2956a6345bf..1bee93fa429 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewTestSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewTestSuite.scala @@ -41,10 +41,7 @@ abstract class SQLViewTestSuite extends QueryTest with SQLTestUtils { protected def viewTypeString: String protected def formattedViewName(viewName: String): String - protected def fullyQualifiedViewName(viewName: String): String protected def tableIdentifier(viewName: String): TableIdentifier - protected def tableIdentQuotedString(viewName: String): String = -tableIdentifier(viewName).quotedString def createView( viewName: String, @@ -185,10 +182,10 @@ abstract class SQLViewTestSuite extends QueryTest with
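The whitespace fix above boils down to making the optional user-specified column list carry its own surrounding spaces, so the plain case never produces a double space before `AS`. A minimal Python sketch of the same string-building logic (function and parameter names here are illustrative, not Spark's; the real code is in `SessionCatalog.scala`):

```python
def view_suggestion(view_name, field_names, query_column_names, view_text):
    # Mirrors the SessionCatalog fix: when the view schema matches the
    # query output columns, the separator is a single space; otherwise the
    # column list brings its own leading and trailing space. Either way,
    # concatenating directly with "AS" yields exactly one space.
    if list(field_names) == list(query_column_names):
        user_specified_columns = " "
    else:
        user_specified_columns = " (" + ", ".join(field_names) + ") "
    return f"CREATE OR REPLACE VIEW {view_name}{user_specified_columns}AS {view_text}"

print(view_suggestion("spark_catalog.mydb1.v1", ["i1"], ["i1"], "SELECT * FROM t1"))
# → CREATE OR REPLACE VIEW spark_catalog.mydb1.v1 AS SELECT * FROM t1
```

Before the fix, the empty-string branch plus a hard-coded space on both sides of the interpolation produced `VIEW v  AS` with two spaces; moving the spaces into the branch values removes the special case.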
[spark] branch master updated: [SPARK-44402][SQL][TESTS] Add tests for schema pruning in delta-based DELETEs
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new af27400bd1f [SPARK-44402][SQL][TESTS] Add tests for schema pruning in delta-based DELETEs af27400bd1f is described below commit af27400bd1f674c59564eb66570209a783859f6e Author: aokolnychyi AuthorDate: Thu Jul 13 10:18:51 2023 -0700 [SPARK-44402][SQL][TESTS] Add tests for schema pruning in delta-based DELETEs ### What changes were proposed in this pull request? This PR adds tests for schema pruning for delta-based DELETEs, similar to the ones we have for MERGE and UPDATE. ### Why are the changes needed? These changes are needed to verify schema pruning works as expected. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? This PR comes with tests. Closes #41975 from aokolnychyi/spark-44402. Authored-by: aokolnychyi Signed-off-by: Dongjoon Hyun --- .../connector/DeltaBasedDeleteFromTableSuite.scala | 21 - 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/connector/DeltaBasedDeleteFromTableSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/connector/DeltaBasedDeleteFromTableSuite.scala index 4da85a5ce05..7336b3a6e92 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/connector/DeltaBasedDeleteFromTableSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/connector/DeltaBasedDeleteFromTableSuite.scala @@ -17,7 +17,7 @@ package org.apache.spark.sql.connector -import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.{AnalysisException, Row} class DeltaBasedDeleteFromTableSuite extends DeleteFromTableSuiteBase { @@ -59,4 +59,23 @@ class DeltaBasedDeleteFromTableSuite extends DeleteFromTableSuiteBase { } assert(exception.message.contains("Row ID attributes cannot be nullable")) } + + test("delete with 
schema pruning") { +createAndInitTable("pk INT NOT NULL, id INT, country STRING, dep STRING", + """{ "pk": 1, "id": 1, "country": "uk", "dep": "hr" } +|{ "pk": 2, "id": 2, "country": "us", "dep": "software" } +|{ "pk": 3, "id": 3, "country": "canada", "dep": "hr" } +|""".stripMargin) + +executeAndCheckScan( + s"DELETE FROM $tableNameAsString WHERE id <= 1", + // `pk` is used to encode deletes + // `id` is used in the condition + // `_partition` is used in the requested write distribution + expectedScanSchema = "pk INT, id INT, _partition STRING") + +checkAnswer( + sql(s"SELECT * FROM $tableNameAsString"), + Row(2, 2, "us", "software") :: Row(3, 3, "canada", "hr") :: Nil) + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
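The expected scan schema in the new test follows directly from which columns a delta-based DELETE actually needs to read. A rough Python sketch of that column-collection rule (names are illustrative; the real pruning is done by Spark's optimizer, not by code like this):

```python
def pruned_scan_columns(row_id_cols, condition_cols, distribution_cols):
    # A delta-based DELETE only needs: the row-ID columns (to encode
    # deletes), the columns referenced by the WHERE condition, and the
    # columns used by the requested write distribution. Everything else
    # (here: country, dep) can be pruned from the scan.
    seen, result = set(), []
    for col in row_id_cols + condition_cols + distribution_cols:
        if col not in seen:
            seen.add(col)
            result.append(col)
    return result

print(pruned_scan_columns(["pk"], ["id"], ["_partition"]))
# → ['pk', 'id', '_partition'], matching the test's expectedScanSchema
```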
[spark] branch master updated: [SPARK-42309][SQL] Introduce `INCOMPATIBLE_DATA_TO_TABLE` and sub classes
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new efed39516c0 [SPARK-42309][SQL] Introduce `INCOMPATIBLE_DATA_TO_TABLE` and sub classes efed39516c0 is described below commit efed39516c0c4e9654aec447ce91676026368384 Author: itholic AuthorDate: Thu Jul 13 17:21:29 2023 +0300 [SPARK-42309][SQL] Introduce `INCOMPATIBLE_DATA_TO_TABLE` and sub classes ### What changes were proposed in this pull request? This PR proposes to assign name to _LEGACY_ERROR_TEMP_1204, "INCOMPATIBLE_DATA_TO_TABLE" and its sub classes: - CANNOT_FIND_DATA - AMBIGUOUS_COLUMN_NAME - EXTRA_STRUCT_FIELDS - NULLABLE_COLUMN - NULLABLE_ARRAY_ELEMENTS - NULLABLE_MAP_VALUES - CANNOT_SAFELY_CAST - STRUCT_MISSING_FIELDS - UNEXPECTED_COLUMN_NAME ### Why are the changes needed? We should assign proper name to _LEGACY_ERROR_TEMP_* ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? `./build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite*` Closes #39937 from itholic/LEGACY_1204. 
Authored-by: itholic Signed-off-by: Max Gekk --- common/utils/src/main/resources/error/README.md| 14 + .../src/main/resources/error/error-classes.json| 59 ++- docs/_data/menu-sql.yaml | 2 +- ...ions-incompatible-data-for-table-error-class.md | 64 +++ ...tions-incompatible-data-to-table-error-class.md | 64 +++ docs/sql-error-conditions.md | 8 + docs/sql-ref-ansi-compliance.md| 3 +- .../sql/catalyst/analysis/AssignmentUtils.scala| 5 +- .../catalyst/analysis/TableOutputResolver.scala| 97 +++-- .../spark/sql/catalyst/types/DataTypeUtils.scala | 59 +-- .../spark/sql/errors/QueryCompilationErrors.scala | 110 +- .../catalyst/analysis/V2WriteAnalysisSuite.scala | 267 ++--- .../types/DataTypeWriteCompatibilitySuite.scala| 429 - .../apache/spark/sql/DataFrameWriterV2Suite.scala | 39 +- .../org/apache/spark/sql/SQLInsertTestSuite.scala | 5 +- .../command/AlignMergeAssignmentsSuite.scala | 78 +++- .../command/AlignUpdateAssignmentsSuite.scala | 54 ++- .../org/apache/spark/sql/sources/InsertSuite.scala | 98 +++-- .../sql/test/DataFrameReaderWriterSuite.scala | 47 ++- .../spark/sql/hive/client/HiveClientSuite.scala| 22 +- 20 files changed, 1100 insertions(+), 424 deletions(-) diff --git a/common/utils/src/main/resources/error/README.md b/common/utils/src/main/resources/error/README.md index 838991c2b6a..dfcb42d49e7 100644 --- a/common/utils/src/main/resources/error/README.md +++ b/common/utils/src/main/resources/error/README.md @@ -1294,6 +1294,20 @@ The following SQLSTATEs are collated from: |IM013|IM |ODBC driver |013 |Trace file error|SQL Server |N |SQL Server | |IM014|IM |ODBC driver |014 |Invalid name of File DSN|SQL Server |N |SQL Server | |IM015|IM |ODBC driver |015 |Corrupt file data source|SQL Server |N |SQL Server | +|KD000 |KD |datasource specific errors|000 |datasource specific errors |Databricks |N |Databricks | +|KD001 |KD |datasource specific errors|001 |Cannot read file footer |Databricks |N |Databricks | +|KD002 |KD |datasource specific errors|002 |Unexpected 
version |Databricks |N |Databricks | +|KD003 |KD |datasource specific errors|003 |Incorrect access to data type |Databricks |N |Databricks | +|KD004 |KD |datasource specific errors|004 |Delta protocol version error |Databricks |N |Databricks
[spark] branch master updated: [SPARK-44384][SQL][TESTS] Use checkError() to check Exception in *View*Suite, *Namespace*Suite, *DataSource*Suite
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4dedb4ad2c9 [SPARK-44384][SQL][TESTS] Use checkError() to check Exception in *View*Suite, *Namespace*Suite, *DataSource*Suite 4dedb4ad2c9 is described below commit 4dedb4ad2c9b2ecd75dd9ccec5f565805752ad8e Author: panbingkun AuthorDate: Thu Jul 13 16:26:34 2023 +0300 [SPARK-44384][SQL][TESTS] Use checkError() to check Exception in *View*Suite, *Namespace*Suite, *DataSource*Suite ### What changes were proposed in this pull request? The pr aims to use `checkError()` to check `Exception` in `*View*Suite`, `*Namespace*Suite`, `*DataSource*Suite`, include: - sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite - sql/core/src/test/scala/org/apache/spark/sql/NestedDataSourceSuite - sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2DataFrameSuite - sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2FunctionSuite - sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite - sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite - sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite - sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/ShowNamespacesSuite - sql/core/src/test/scala/org/apache/spark/sql/sources/ResolvedDataSourceSuite - sql/core/src/test/scala/org/apache/spark/sql/streaming/sources/StreamingDataSourceV2Suite - sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite - sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSQLViewSuite - sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/AlterNamespaceSetLocationSuite ### Why are the changes needed? Migration on checkError() will make the tests independent from the text of error messages. 
### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test. - Pass GA. Closes #41952 from panbingkun/view_and_namespace_checkerror. Authored-by: panbingkun Signed-off-by: Max Gekk --- .../spark/sql/FileBasedDataSourceSuite.scala | 67 +++-- .../apache/spark/sql/NestedDataSourceSuite.scala | 24 +- .../sql/connector/DataSourceV2DataFrameSuite.scala | 21 +- .../sql/connector/DataSourceV2FunctionSuite.scala | 189 +++-- .../spark/sql/connector/DataSourceV2SQLSuite.scala | 54 +++- .../spark/sql/connector/DataSourceV2Suite.scala| 38 ++- .../apache/spark/sql/execution/SQLViewSuite.scala | 313 ++--- .../execution/command/v2/ShowNamespacesSuite.scala | 28 +- .../sql/sources/ResolvedDataSourceSuite.scala | 24 +- .../sources/StreamingDataSourceV2Suite.scala | 68 +++-- .../spark/sql/hive/MetastoreDataSourcesSuite.scala | 88 +++--- .../sql/hive/execution/HiveSQLViewSuite.scala | 31 +- .../command/AlterNamespaceSetLocationSuite.scala | 11 +- 13 files changed, 671 insertions(+), 285 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala index e7e53285d62..d69a68f5726 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala @@ -26,7 +26,7 @@ import scala.collection.mutable import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.{LocalFileSystem, Path} -import org.apache.spark.SparkException +import org.apache.spark.{SparkException, SparkFileNotFoundException, SparkRuntimeException} import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd} import org.apache.spark.sql.TestingUDT.{IntervalUDT, NullData, NullUDT} import org.apache.spark.sql.catalyst.expressions.{AttributeReference, GreaterThan, Literal} @@ -129,11 +129,13 @@ class FileBasedDataSourceSuite extends QueryTest 
allFileBasedDataSources.foreach { format => test(s"SPARK-23372 error while writing empty schema files using $format") { withTempPath { outputPath => -val errMsg = intercept[AnalysisException] { - spark.emptyDataFrame.write.format(format).save(outputPath.toString) -} -assert(errMsg.getMessage.contains( - "Datasource does not support writing empty or nested empty schemas")) +checkError( + exception = intercept[AnalysisException] { +spark.emptyDataFrame.write.format(format).save(outputPath.toString) + }, + errorClass = "_LEGACY_ERROR_TEMP_1142", + parameters = Map.empty +) } // Nested empty schema @@ -144,11 +146,13 @@ class
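The motivation for `checkError()` can be seen in miniature: asserting on a structured error class and message parameters, instead of on message substrings, keeps tests valid when the wording changes. A hypothetical Python analogue (class and function names are made up for illustration; Spark's real `checkError` lives in its Scala test utilities):

```python
class StructuredError(Exception):
    # Toy stand-in for Spark exceptions that carry an error class and
    # message parameters alongside the rendered message text.
    def __init__(self, error_class, parameters, message):
        super().__init__(message)
        self.error_class = error_class
        self.parameters = parameters

def check_error(exc, error_class, parameters):
    # Like Spark's checkError(): compare the structured fields, not the
    # text, so rewording the message does not break the test.
    assert exc.error_class == error_class, exc.error_class
    assert exc.parameters == parameters, exc.parameters

err = StructuredError(
    "_LEGACY_ERROR_TEMP_1142", {},
    "Datasource does not support writing empty or nested empty schemas")
check_error(err, "_LEGACY_ERROR_TEMP_1142", {})
```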
[spark] branch master updated (9aa42a970c4 -> 8bb07388ea6)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 9aa42a970c4 [SPARK-41811][PYTHON][CONNECT] Implement SparkSession.sql's string formatter add 8bb07388ea6 [SPARK-44145][SQL] Callback when ready for execution No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/QueryPlanningTracker.scala | 52 - .../sql/catalyst/QueryPlanningTrackerSuite.scala | 19 +++ .../org/apache/spark/sql/DataFrameWriter.scala | 3 +- .../org/apache/spark/sql/DataFrameWriterV2.scala | 3 +- .../scala/org/apache/spark/sql/SparkSession.scala | 84 ++ .../spark/sql/execution/QueryExecution.scala | 27 - .../spark/sql/execution/QueryExecutionSuite.scala | 128 - 7 files changed, 283 insertions(+), 33 deletions(-)
[spark] branch master updated: [SPARK-41811][PYTHON][CONNECT] Implement SparkSession.sql's string formatter
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9aa42a970c4 [SPARK-41811][PYTHON][CONNECT] Implement SparkSession.sql's string formatter 9aa42a970c4 is described below commit 9aa42a970c4bd8e54603b1795a0f449bd556b11b Author: Ruifeng Zheng AuthorDate: Thu Jul 13 17:58:00 2023 +0800 [SPARK-41811][PYTHON][CONNECT] Implement SparkSession.sql's string formatter ### What changes were proposed in this pull request? Implement SparkSession.sql's string formatter ### Why are the changes needed? for parity ### Does this PR introduce _any_ user-facing change? yes before: ``` In [1]: spark.createDataFrame([("Alice", 6), ("Bob", 7), ("John", 10)], ['name', 'age']).createOrReplaceTempView("person") In [2]: spark.sql("""SELECT * FROM person WHERE age < {age}""", age = 9).show() --- TypeError Traceback (most recent call last) Cell In[2], line 1 > 1 spark.sql("""SELECT * FROM person WHERE age < {age}""", age = 9).show() TypeError: sql() got an unexpected keyword argument 'age' ``` after: ``` In [1]: spark.createDataFrame([("Alice", 6), ("Bob", 7), ("John", 10)], ['name', 'age']).createOrReplaceTempView("person") In [2]: spark.sql("""SELECT * FROM person WHERE age < {age}""", age = 9).show() +-+---+ | name|age| +-+---+ |Alice| 6| | Bob| 7| +-+---+ ``` ### How was this patch tested? enabled doc test Closes #41980 from zhengruifeng/py_connect_sql_formatter. 
Authored-by: Ruifeng Zheng Signed-off-by: Ruifeng Zheng --- python/pyspark/pandas/sql_formatter.py| 7 ++--- python/pyspark/sql/connect/session.py | 35 --- python/pyspark/sql/{ => connect}/sql_formatter.py | 30 --- python/pyspark/sql/sql_formatter.py | 5 ++-- python/pyspark/sql/utils.py | 8 ++ 5 files changed, 51 insertions(+), 34 deletions(-) diff --git a/python/pyspark/pandas/sql_formatter.py b/python/pyspark/pandas/sql_formatter.py index 8593703bd94..7501e19c038 100644 --- a/python/pyspark/pandas/sql_formatter.py +++ b/python/pyspark/pandas/sql_formatter.py @@ -264,10 +264,9 @@ class PandasSQLStringFormatter(string.Formatter): val._to_spark().createOrReplaceTempView(df_name) return df_name elif isinstance(val, str): -# This is matched to behavior from JVM implementation. -# See `sql` definition from `sql/catalyst/src/main/scala/org/apache/spark/ -# sql/catalyst/expressions/literals.scala` -return "'" + val.replace("\\", "").replace("'", "\\'") + "'" +from pyspark.sql.utils import get_lit_sql_str + +return get_lit_sql_str(val) else: return val diff --git a/python/pyspark/sql/connect/session.py b/python/pyspark/sql/connect/session.py index ea88d60d760..13868263174 100644 --- a/python/pyspark/sql/connect/session.py +++ b/python/pyspark/sql/connect/session.py @@ -489,13 +489,31 @@ class SparkSession: createDataFrame.__doc__ = PySparkSession.createDataFrame.__doc__ -def sql(self, sqlQuery: str, args: Optional[Union[Dict[str, Any], List]] = None) -> "DataFrame": -cmd = SQL(sqlQuery, args) -data, properties = self.client.execute_command(cmd.command(self._client)) -if "sql_command_result" in properties: -return DataFrame.withPlan(CachedRelation(properties["sql_command_result"]), self) -else: -return DataFrame.withPlan(SQL(sqlQuery, args), self) +def sql( +self, +sqlQuery: str, +args: Optional[Union[Dict[str, Any], List]] = None, +**kwargs: Any, +) -> "DataFrame": + +if len(kwargs) > 0: +from pyspark.sql.connect.sql_formatter import SQLStringFormatter + +formatter = 
SQLStringFormatter(self) +sqlQuery = formatter.format(sqlQuery, **kwargs) + +try: +cmd = SQL(sqlQuery, args) +data, properties = self.client.execute_command(cmd.command(self._client)) +if "sql_command_result" in properties: +return DataFrame.withPlan(CachedRelation(properties["sql_command_result"]), self) +else: +return DataFrame.withPlan(SQL(sqlQuery, args), self) +finally: +if len(kwargs) > 0: +# TODO: should drop temp views after SPARK-44406 get resolved +# formatter.clear() +pass sql.__doc__ = PySparkSession.sql.__doc__ @@ -808,9 +826,6 @@
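The idea behind the string formatter is small enough to sketch: keyword arguments are rendered as SQL literals before substitution into the query text. A hypothetical, much-reduced analogue of `SQLStringFormatter` (the class here is illustrative; the escaping mirrors the JVM literal rule quoted in the diff above):

```python
import string

class MiniSQLFormatter(string.Formatter):
    # Reduced sketch: string values become quoted SQL string literals,
    # everything else falls through to normal formatting.
    def format_field(self, value, format_spec):
        if isinstance(value, str):
            # Double backslashes, then escape single quotes, then wrap.
            return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"
        return super().format_field(value, format_spec)

query = MiniSQLFormatter().format("SELECT * FROM person WHERE age < {age}", age=9)
print(query)  # SELECT * FROM person WHERE age < 9
```

This is why `spark.sql("... {age}", age=9)` works after the change: the formatter runs first, and the already-substituted query string is what gets sent in the SQL command.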
[spark] branch master updated: [SPARK-44388][CONNECT] Fix protobuf cast issue when UDF instance is updated
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 357bcac823c [SPARK-44388][CONNECT] Fix protobuf cast issue when UDF instance is updated 357bcac823c is described below commit 357bcac823c0fafbdcb95458327d35e4a492046c Author: vicennial AuthorDate: Thu Jul 13 18:55:10 2023 +0900 [SPARK-44388][CONNECT] Fix protobuf cast issue when UDF instance is updated ### What changes were proposed in this pull request? This PR modifies the arguments of the `ScalarUserDefinedFunction` case class to take in `serializedUdfPacket`, `inputTypes` and `outputType` instead of calculating these types internally from `function`, `inputEncoders` and `outputEncoder`. This allows the class to copy over the serialized udf value from the parent instance instead of re-creating it. Through this, we avoid hitting a variant of the issue mentioned in [SPARK-43198](https://issues.apache.org/jira/browse/SPARK-43198) whic [...] ### Why are the changes needed? Bugfix. 
Consider the following code: ``` class A(x: Int) { def get = x * 7 } val myUdf = udf((x: Int) => new A(x).get) val modifiedUdf = myUdf.withName("myUdf").asNondeterministic() spark.range(5).select(modifiedUdf(col("id"))).as[Int].collect() ``` Executing this code currently results in hitting the following error: ``` java.lang.ClassCastException: org.apache.spark.connect.proto.ScalarScalaUDF cannot be cast to com.google.protobuf.MessageLite at com.google.protobuf.GeneratedMessageLite$SerializedForm.readResolve(GeneratedMessageLite.java:1462) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1274) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196) ... ``` If we do not include the `myUdf.withName("myUdf").asNondeterministic()`, the UDF runs as expected. ### Does this PR introduce _any_ user-facing change? Yes, fixes the bug mentioned above. ### How was this patch tested? Added a new E2E test in `ReplE2ESuite`. Closes #41959 from vicennial/SPARK-44388. 
Lead-authored-by: vicennial Co-authored-by: Venkata Sai Akhil Gudesa Co-authored-by: Venkata Sai Akhil Gudesa Signed-off-by: Hyukjin Kwon --- .../sql/expressions/UserDefinedFunction.scala | 29 +++--- .../spark/sql/application/ReplE2ESuite.scala | 11 2 files changed, 25 insertions(+), 15 deletions(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala index d911b7efe29..7bce4b5b31a 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala @@ -92,26 +92,23 @@ sealed abstract class UserDefinedFunction { /** * Holder class for a scalar user-defined function and it's input/output encoder(s). */ -case class ScalarUserDefinedFunction( -function: AnyRef, -inputEncoders: Seq[AgnosticEncoder[_]], -outputEncoder: AgnosticEncoder[_], +case class ScalarUserDefinedFunction private ( +// SPARK-43198: Eagerly serialize to prevent the UDF from containing a reference to this class. +serializedUdfPacket: Array[Byte], +inputTypes: Seq[proto.DataType], +outputType: proto.DataType, name: Option[String], override val nullable: Boolean, override val deterministic: Boolean) extends UserDefinedFunction { - // SPARK-43198: Eagerly serialize to prevent the UDF from containing a reference to this class. - private[this] val udf = { -val udfPacketBytes = - SparkSerDeUtils.serialize(UdfPacket(function, inputEncoders, outputEncoder)) + private[this] lazy val udf = { val scalaUdfBuilder = proto.ScalarScalaUDF .newBuilder() - .setPayload(ByteString.copyFrom(udfPacketBytes)) + .setPayload(ByteString.copyFrom(serializedUdfPacket)) // Send the real inputs and return types to obtain the types without deser the udf bytes. 
- .addAllInputTypes( - inputEncoders.map(_.dataType).map(DataTypeProtoConverter.toConnectProtoType).asJava) - .setOutputType(DataTypeProtoConverter.toConnectProtoType(outputEncoder.dataType)) +
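The shape of the fix can be illustrated outside Scala: the constructor receives the already-serialized payload, so modified copies (`withName` / `asNondeterministic` style) reuse the parent's bytes instead of re-serializing, which is where the original failure crept in. A loose Python analogue (names and the pickling are illustrative, not Spark's actual serialization):

```python
import dataclasses
import pickle

@dataclasses.dataclass(frozen=True)
class ScalarUDF:
    # Analogue of the fixed ScalarUserDefinedFunction: the payload is a
    # constructor argument, computed once in the factory method below.
    serialized_udf_packet: bytes
    name: str = ""
    deterministic: bool = True

    @staticmethod
    def from_function(fn):
        # Eagerly serialize exactly once, at creation time.
        return ScalarUDF(serialized_udf_packet=pickle.dumps(fn))

    def with_name(self, name):
        # Copies carry the parent's bytes; nothing is re-serialized.
        return dataclasses.replace(self, name=name)

udf = ScalarUDF.from_function(abs)
renamed = udf.with_name("my_abs")
assert renamed.serialized_udf_packet is udf.serialized_udf_packet
```

In the pre-fix design the payload was a field initializer, so every copy of the case class re-ran serialization in a context that could capture unwanted references.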
[spark] branch master updated: [SPARK-44391][SQL] Check the number of argument types in `InvokeLike`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3e82ac6ea3d [SPARK-44391][SQL] Check the number of argument types in `InvokeLike` 3e82ac6ea3d is described below commit 3e82ac6ea3d9f87c8ac09e481235beefaa1bf758 Author: Max Gekk AuthorDate: Thu Jul 13 12:17:20 2023 +0300 [SPARK-44391][SQL] Check the number of argument types in `InvokeLike` ### What changes were proposed in this pull request? In the PR, I propose to check the number of argument types in the `InvokeLike` expressions. If the input types are provided, the number of types should be exactly the same as the number of argument expressions. ### Why are the changes needed? 1. This PR checks the contract described in the comment explicitly: https://github.com/apache/spark/blob/d9248e83bbb3af49333608bebe7149b1aaeca738/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L247 that can prevent the errors of expression implementations, and improve code maintainability. 2. Also it fixes the issue in the `UrlEncode` and `UrlDecode`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running the related tests: ``` $ build/sbt "test:testOnly *UrlFunctionsSuite" $ build/sbt "test:testOnly *DataSourceV2FunctionSuite" ``` Closes #41954 from MaxGekk/fix-url_decode. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- common/utils/src/main/resources/error/error-classes.json | 5 + .../explain-results/function_url_decode.explain | 2 +- .../explain-results/function_url_encode.explain | 2 +- .../sql-error-conditions-datatype-mismatch-error-class.md | 4 .../spark/sql/catalyst/analysis/CheckAnalysis.scala | 5 +++-- .../spark/sql/catalyst/expressions/objects/objects.scala | 15 +++ .../spark/sql/catalyst/expressions/urlExpressions.scala | 4 ++-- 7 files changed, 31 insertions(+), 6 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 347ce026476..2c4d2b533a6 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -657,6 +657,11 @@ "The must be between (current value = )." ] }, + "WRONG_NUM_ARG_TYPES" : { +"message" : [ + "The expression requires argument types but the actual number is ." +] + }, "WRONG_NUM_ENDPOINTS" : { "message" : [ "The number of endpoints must be >= 2 to construct intervals but the actual number is ." 
diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_decode.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_decode.explain index 36b21e27c10..d612190396d 100644 --- a/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_decode.explain +++ b/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_decode.explain @@ -1,2 +1,2 @@ -Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UrlCodec$, StringType, decode, g#0, UTF-8, StringType, true, true, true) AS url_decode(g)#0] +Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UrlCodec$, StringType, decode, g#0, UTF-8, StringType, StringType, true, true, true) AS url_decode(g)#0] +- LocalRelation , [id#0L, a#0, b#0, d#0, e#0, f#0, g#0] diff --git a/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_encode.explain b/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_encode.explain index 70a0f628fc9..bd2c63e19c6 100644 --- a/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_encode.explain +++ b/connector/connect/common/src/test/resources/query-tests/explain-results/function_url_encode.explain @@ -1,2 +1,2 @@ -Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UrlCodec$, StringType, encode, g#0, UTF-8, StringType, true, true, true) AS url_encode(g)#0] +Project [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UrlCodec$, StringType, encode, g#0, UTF-8, StringType, StringType, true, true, true) AS url_encode(g)#0] +- LocalRelation , [id#0L, a#0, b#0, d#0, e#0, f#0, g#0] diff --git a/docs/sql-error-conditions-datatype-mismatch-error-class.md b/docs/sql-error-conditions-datatype-mismatch-error-class.md index 3bd63925323..ddc3e0c2b1b 100644 --- a/docs/sql-error-conditions-datatype-mismatch-error-class.md +++
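The contract being enforced is easy to state: if argument types are supplied at all, there must be exactly one per argument expression. A hedged sketch of the check (function name is illustrative; the message wording follows the new `WRONG_NUM_ARG_TYPES` error class):

```python
def check_argument_types(arguments, input_types):
    # InvokeLike contract from SPARK-44391: an empty type list means "no
    # types provided" and is accepted; a non-empty list must match the
    # argument count exactly.
    if input_types and len(input_types) != len(arguments):
        raise TypeError(
            f"WRONG_NUM_ARG_TYPES: The expression requires "
            f"{len(arguments)} argument types but the actual number "
            f"is {len(input_types)}.")

# url_decode's static invoke effectively passes two arguments
# (value, charset), so two declared input types pass the check:
check_argument_types(["g#0", "UTF-8"], ["StringType", "StringType"])
```

This is exactly the bug the explain-result diffs above show for `url_encode`/`url_decode`: they passed two arguments but declared only one input type, so the fix adds the second `StringType`.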
[GitHub] [spark-website] wangyum closed pull request #448: Updated downloads.js
wangyum closed pull request #448: Updated downloads.js URL: https://github.com/apache/spark-website/pull/448 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[spark-website] branch asf-site updated: Fix the link of PR-41535 in Spark 3.4.1 release note (#466)
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 4be5a1ea33 Fix the link of PR-41535 in Spark 3.4.1 release note (#466) 4be5a1ea33 is described below commit 4be5a1ea33dcaa05aa78dc4b0594493c7b496642 Author: Junbo wang <1042815...@qq.com> AuthorDate: Thu Jul 13 16:04:51 2023 +0800 Fix the link of PR-41535 in Spark 3.4.1 release note (#466) * Fix the link of PR-41535 in Spark 3.4.1 release note * build the html website --- releases/_posts/2023-06-23-spark-release-3-4-1.md | 2 +- site/releases/spark-release-3-4-1.html| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/releases/_posts/2023-06-23-spark-release-3-4-1.md b/releases/_posts/2023-06-23-spark-release-3-4-1.md index 47e231ce8c..6c19f23fcf 100644 --- a/releases/_posts/2023-06-23-spark-release-3-4-1.md +++ b/releases/_posts/2023-06-23-spark-release-3-4-1.md @@ -15,7 +15,7 @@ Spark 3.4.1 is a maintenance release containing stability fixes. 
This release is ### Notable changes - - [[SPARK-32559]](https://issues.apache.org/jira/browse/SPARK-32559): Fix the trim logic did't handle ASCII control characters correctly + - [[SPARK-44383]](https://issues.apache.org/jira/browse/SPARK-44383): Fix the trim logic did't handle ASCII control characters correctly - [[SPARK-37829]](https://issues.apache.org/jira/browse/SPARK-37829): Dataframe.joinWith outer-join should return a null value for unmatched row - [[SPARK-42078]](https://issues.apache.org/jira/browse/SPARK-42078): Add `CapturedException` to utils - [[SPARK-42290]](https://issues.apache.org/jira/browse/SPARK-42290): Fix the OOM error can't be reported when AQE on diff --git a/site/releases/spark-release-3-4-1.html b/site/releases/spark-release-3-4-1.html index de2cef6b56..85e1cc7caa 100644 --- a/site/releases/spark-release-3-4-1.html +++ b/site/releases/spark-release-3-4-1.html @@ -130,7 +130,7 @@ Notable changes - https://issues.apache.org/jira/browse/SPARK-32559;>[SPARK-32559]: Fix the trim logic didt handle ASCII control characters correctly + https://issues.apache.org/jira/browse/SPARK-44383;>[SPARK-44383]: Fix the trim logic didt handle ASCII control characters correctly https://issues.apache.org/jira/browse/SPARK-37829;>[SPARK-37829]: Dataframe.joinWith outer-join should return a null value for unmatched row https://issues.apache.org/jira/browse/SPARK-42078;>[SPARK-42078]: Add CapturedException to utils https://issues.apache.org/jira/browse/SPARK-42290;>[SPARK-42290]: Fix the OOM error cant be reported when AQE on
[GitHub] [spark-website] yaooqinn merged pull request #466: Fix the link of PR-41535 in Spark 3.4.1 release note
yaooqinn merged PR #466: URL: https://github.com/apache/spark-website/pull/466
[spark] branch master updated (27a3dccfa04 -> fbb85ac1bbb)
This is an automated email from the ASF dual-hosted git repository. yao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 27a3dccfa04 [SPARK-44390][CORE][SQL] Rename `SparkSerDerseUtils` to `SparkSerDeUtils` add fbb85ac1bbb [SPARK-43321][CONNECT][FOLLOWUP] Better names for APIs used in Scala Client joinWith No new revisions were added by this update. Summary of changes: .../main/scala/org/apache/spark/sql/Dataset.scala | 4 +- .../main/protobuf/spark/connect/relations.proto| 8 +- .../sql/connect/planner/SparkConnectPlanner.scala | 4 +- python/pyspark/sql/connect/proto/relations_pb2.py | 212 ++--- python/pyspark/sql/connect/proto/relations_pb2.pyi | 21 +- .../sql/catalyst/encoders/AgnosticEncoder.scala| 15 +- 6 files changed, 130 insertions(+), 134 deletions(-)
[GitHub] [spark-website] Kwafoor commented on pull request #466: Fix the link of PR-41535 in Spark 3.4.1 release note
Kwafoor commented on PR #466: URL: https://github.com/apache/spark-website/pull/466#issuecomment-1633609429 > Please check the README.md to build and update the releated html file too fixed