[spark] branch master updated: [SPARK-41291][CONNECT][PYTHON] `DataFrame.explain` should print and return None
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 1a8f2952225 [SPARK-41291][CONNECT][PYTHON] `DataFrame.explain` should print and return None
1a8f2952225 is described below

commit 1a8f295222503f869ea9bb919f97202103ddd3c4
Author: Ruifeng Zheng
AuthorDate: Mon Nov 28 15:48:32 2022 +0800

    [SPARK-41291][CONNECT][PYTHON] `DataFrame.explain` should print and return None

    ### What changes were proposed in this pull request?
    `DataFrame.explain` should print and return None

    ### Why are the changes needed?
    to match the behavior in PySpark

    ### Does this PR introduce _any_ user-facing change?
    no

    ### How was this patch tested?
    updated tests

    Closes #38816 from zhengruifeng/connect_explain_print.

    Authored-by: Ruifeng Zheng
    Signed-off-by: Ruifeng Zheng
---
 python/pyspark/sql/connect/dataframe.py            | 47 +++++++++++-----------
 .../sql/tests/connect/test_connect_basic.py        | 10 +++--
 2 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/python/pyspark/sql/connect/dataframe.py b/python/pyspark/sql/connect/dataframe.py
index 862bb566ebe..d77a3717248 100644
--- a/python/pyspark/sql/connect/dataframe.py
+++ b/python/pyspark/sql/connect/dataframe.py
@@ -975,29 +975,9 @@ class DataFrame(object):
         ), "Func returned an instance of type [%s], " "should have been DataFrame." % type(result)
 
         return result
 
-    def explain(
+    def _explain_string(
         self, extended: Optional[Union[bool, str]] = None, mode: Optional[str] = None
     ) -> str:
-        """Retruns plans in string for debugging purpose.
-
-        .. versionadded:: 3.4.0
-
-        Parameters
-        ----------
-        extended : bool, optional
-            default ``False``. If ``False``, returns only the physical plan.
-            When this is a string without specifying the ``mode``, it works as the mode is
-            specified.
-        mode : str, optional
-            specifies the expected output format of plans.
-
-            * ``simple``: Print only a physical plan.
-            * ``extended``: Print both logical and physical plans.
-            * ``codegen``: Print a physical plan and generated codes if they are available.
-            * ``cost``: Print a logical plan and statistics if they are available.
-            * ``formatted``: Split explain output into two sections: a physical plan outline \
-                and node details.
-        """
         if extended is not None and mode is not None:
             raise ValueError("extended and mode should not be set together.")
@@ -1042,6 +1022,31 @@ class DataFrame(object):
         else:
             return ""
 
+    def explain(
+        self, extended: Optional[Union[bool, str]] = None, mode: Optional[str] = None
+    ) -> None:
+        """Retruns plans in string for debugging purpose.
+
+        .. versionadded:: 3.4.0
+
+        Parameters
+        ----------
+        extended : bool, optional
+            default ``False``. If ``False``, returns only the physical plan.
+            When this is a string without specifying the ``mode``, it works as the mode is
+            specified.
+        mode : str, optional
+            specifies the expected output format of plans.
+
+            * ``simple``: Print only a physical plan.
+            * ``extended``: Print both logical and physical plans.
+            * ``codegen``: Print a physical plan and generated codes if they are available.
+            * ``cost``: Print a logical plan and statistics if they are available.
+            * ``formatted``: Split explain output into two sections: a physical plan outline \
+                and node details.
+        """
+        print(self._explain_string(extended=extended, mode=mode))
+
     def createGlobalTempView(self, name: str) -> None:
         """Creates a global temporary view with this :class:`DataFrame`.

diff --git a/python/pyspark/sql/tests/connect/test_connect_basic.py b/python/pyspark/sql/tests/connect/test_connect_basic.py
index 187788fd6ab..0b07a8328a1 100644
--- a/python/pyspark/sql/tests/connect/test_connect_basic.py
+++ b/python/pyspark/sql/tests/connect/test_connect_basic.py
@@ -133,7 +133,7 @@ class SparkConnectTests(SparkConnectSQLTestCase):
     def test_simple_explain_string(self):
         df = self.connect.read.table(self.tbl_name).limit(10)
-        result = df.explain()
+        result = df._explain_string()
         self.assertGreater(len(result), 0)
 
     def test_schema(self):
@@ -330,7 +330,9 @@ class SparkConnectTests(SparkConnectSQLTestCase):
     def test_subquery_alias(self) -> None:
         # SPARK-40938: test subquery alias.
         plan_text = (
-            self.connect.read.table(self.tbl_name).alias("special_alias").e
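The refactor in this commit follows a common pattern: keep a private helper that *returns* the plan text (so tests can assert on it), and make the public `explain` print that text and return `None`, matching PySpark's behavior. A minimal standalone sketch of the pattern is below; `MiniFrame` and its hard-coded plan strings are illustrative stand-ins, not the actual Spark Connect implementation:

```python
from typing import Optional


class MiniFrame:
    """Toy stand-in for a DataFrame-like object (illustrative only)."""

    def _explain_string(self, extended: Optional[bool] = None) -> str:
        # Private, testable: build and return the plan text instead of printing it.
        plan = "== Physical Plan ==\nScan[...]"
        if extended:
            plan = "== Logical Plan ==\n...\n" + plan
        return plan

    def explain(self, extended: Optional[bool] = None) -> None:
        # Public API: print for humans, implicitly return None (as in PySpark).
        print(self._explain_string(extended=extended))


df = MiniFrame()
df.explain()                 # prints the plan; the call itself yields None
text = df._explain_string()  # tests assert on this string instead
```

This is why the test in the diff switches from `df.explain()` to `df._explain_string()`: once `explain` returns `None`, `self.assertGreater(len(result), 0)` can only work against the helper.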
[spark] branch master updated: [SPARK-41225][CONNECT][PYTHON][FOLLOWUP] Disable `semanticHash`, `sameSemantics`, `_repr_html_`
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new e384cf9f0eb [SPARK-41225][CONNECT][PYTHON][FOLLOWUP] Disable `semanticHash`, `sameSemantics`, `_repr_html_`
e384cf9f0eb is described below

commit e384cf9f0eb16fb017f92efd01db5b05e6edf58a
Author: Ruifeng Zheng
AuthorDate: Mon Nov 28 14:49:42 2022 +0800

    [SPARK-41225][CONNECT][PYTHON][FOLLOWUP] Disable `semanticHash`, `sameSemantics`, `_repr_html_`

    ### What changes were proposed in this pull request?
    Disable `semanticHash`, `sameSemantics`, `_repr_html_`

    ### Why are the changes needed?
    1. Disable `semanticHash`, `sameSemantics` according to the discussions in https://github.com/apache/spark/pull/38742
    2. Disable `_repr_html_` since it requires [eager mode](https://github.com/apache/spark/blob/40a9a6ef5b89f0c3d19db4a43b8a73decaa173c3/python/pyspark/sql/dataframe.py#L878); otherwise, it just returns `None`:

    ```
    In [2]: spark.range(start=0, end=10)._repr_html_() is None
    Out[2]: True
    ```

    ### Does this PR introduce _any_ user-facing change?
    for these three methods, throw `NotImplementedError`

    ### How was this patch tested?
    added test cases

    Closes #38815 from zhengruifeng/connect_disable_repr_html_sematic.

    Authored-by: Ruifeng Zheng
    Signed-off-by: Ruifeng Zheng
---
 python/pyspark/sql/connect/dataframe.py                    | 9 +++++++++
 python/pyspark/sql/tests/connect/test_connect_plan_only.py | 3 +++
 2 files changed, 12 insertions(+)

diff --git a/python/pyspark/sql/connect/dataframe.py b/python/pyspark/sql/connect/dataframe.py
index 3679c9cb979..862bb566ebe 100644
--- a/python/pyspark/sql/connect/dataframe.py
+++ b/python/pyspark/sql/connect/dataframe.py
@@ -1133,6 +1133,15 @@ class DataFrame(object):
     def toJSON(self, *args: Any, **kwargs: Any) -> None:
         raise NotImplementedError("toJSON() is not implemented.")
 
+    def _repr_html_(self, *args: Any, **kwargs: Any) -> None:
+        raise NotImplementedError("_repr_html_() is not implemented.")
+
+    def semanticHash(self, *args: Any, **kwargs: Any) -> None:
+        raise NotImplementedError("semanticHash() is not implemented.")
+
+    def sameSemantics(self, *args: Any, **kwargs: Any) -> None:
+        raise NotImplementedError("sameSemantics() is not implemented.")
+
 
 class DataFrameNaFunctions:
     """Functionality for working with missing data in :class:`DataFrame`.

diff --git a/python/pyspark/sql/tests/connect/test_connect_plan_only.py b/python/pyspark/sql/tests/connect/test_connect_plan_only.py
index 4d9ee04bf3d..109702af7e8 100644
--- a/python/pyspark/sql/tests/connect/test_connect_plan_only.py
+++ b/python/pyspark/sql/tests/connect/test_connect_plan_only.py
@@ -328,6 +328,9 @@ class SparkConnectTestsPlanOnly(PlanOnlyTestFixture):
             "toLocalIterator",
             "checkpoint",
             "localCheckpoint",
+            "_repr_html_",
+            "semanticHash",
+            "sameSemantics",
         ):
             with self.assertRaises(NotImplementedError):
                 getattr(df, f)()

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
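The test change in this commit loops over method names with `getattr` and expects each disabled stub to raise. A self-contained sketch of that stub-and-loop pattern follows; `StubFrame` is a toy class, not the Connect `DataFrame`, and the method set is just the three from the diff:

```python
from typing import Any


class StubFrame:
    """Toy class whose unsupported methods fail fast instead of silently returning None."""

    def _repr_html_(self, *args: Any, **kwargs: Any) -> None:
        raise NotImplementedError("_repr_html_() is not implemented.")

    def semanticHash(self, *args: Any, **kwargs: Any) -> None:
        raise NotImplementedError("semanticHash() is not implemented.")

    def sameSemantics(self, *args: Any, **kwargs: Any) -> None:
        raise NotImplementedError("sameSemantics() is not implemented.")


df = StubFrame()
for name in ("_repr_html_", "semanticHash", "sameSemantics"):
    # Look the method up by name, exactly as the plan-only test does,
    # and verify it raises rather than quietly returning None.
    try:
        getattr(df, name)()
        raise AssertionError(f"{name} should have raised")
    except NotImplementedError:
        pass
```

Raising `NotImplementedError` here is deliberate: a stub that returned `None` (as `_repr_html_` would in non-eager mode) would mask the missing feature, while an explicit raise surfaces it immediately to callers and notebooks.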
[spark] branch master updated (cfc950531f1 -> 919e556d182)
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from cfc950531f1 [SPARK-41280][CONNECT] Implement DataFrame.tail
     add 919e556d182 [SPARK-41263][CONNECT][INFRA] Upgrade buf to 1.9.0

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 2 +-
 connector/connect/README.md          | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
[spark] branch master updated (b20d7d60469 -> cfc950531f1)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from b20d7d60469 [SPARK-41278][CONNECT] Clean up unused QualifiedAttribute in Expression.proto
     add cfc950531f1 [SPARK-41280][CONNECT] Implement DataFrame.tail

No new revisions were added by this update.

Summary of changes:
 .../main/protobuf/spark/connect/relations.proto    |  10 ++
 .../org/apache/spark/sql/connect/dsl/package.scala |  11 ++
 .../sql/connect/planner/SparkConnectPlanner.scala  |   7 +
 python/pyspark/sql/connect/dataframe.py            |  24 ++++
 python/pyspark/sql/connect/plan.py                 |  28 ++++
 python/pyspark/sql/connect/proto/relations_pb2.py  | 154 +++++++++++++--------
 python/pyspark/sql/connect/proto/relations_pb2.pyi |  36 +++++
 .../sql/tests/connect/test_connect_basic.py        |   5 +
 8 files changed, 205 insertions(+), 70 deletions(-)
[spark] branch master updated (b5e130f6eb6 -> b20d7d60469)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from b5e130f6eb6 [SPARK-41275][BUILD] Upgrade pickle to 1.3
     add b20d7d60469 [SPARK-41278][CONNECT] Clean up unused QualifiedAttribute in Expression.proto

No new revisions were added by this update.

Summary of changes:
 .../main/protobuf/spark/connect/expressions.proto  |  7 --
 .../org/apache/spark/sql/connect/dsl/package.scala | 27 --------
 .../sql/connect/planner/SparkConnectPlanner.scala  |  6 +-
 .../connect/planner/SparkConnectProtoSuite.scala   | 10 ----
 .../pyspark/sql/connect/proto/expressions_pb2.py   | 21 -------
 .../pyspark/sql/connect/proto/expressions_pb2.pyi  | 25 --------
 6 files changed, 10 insertions(+), 86 deletions(-)
[spark] branch master updated: [SPARK-41275][BUILD] Upgrade pickle to 1.3
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new b5e130f6eb6 [SPARK-41275][BUILD] Upgrade pickle to 1.3
b5e130f6eb6 is described below

commit b5e130f6eb6c2dda7f58ca3955302e6f7df9e8cf
Author: yangjie01
AuthorDate: Mon Nov 28 13:21:25 2022 +0900

    [SPARK-41275][BUILD] Upgrade pickle to 1.3

    ### What changes were proposed in this pull request?
    This pr aims to upgrade pickle from 1.2 to 1.3.

    ### Why are the changes needed?
    The new version includes a bug fix:
    - https://github.com/irmen/pickle/issues/9

    All changes as follows:
    - https://github.com/irmen/pickle/compare/pickle-1.2...pickle-1.3

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Pass GitHub Actions

    Closes #38810 from LuciferYang/pickle-1.3.

    Lead-authored-by: yangjie01
    Co-authored-by: YangJie
    Signed-off-by: Hyukjin Kwon
---
 core/pom.xml                          | 2 +-
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/core/pom.xml b/core/pom.xml
index 97cf2ec9d24..182cab90427 100644
--- a/core/pom.xml
+++ b/core/pom.xml
@@ -449,7 +449,7 @@
     <dependency>
       <groupId>net.razorvine</groupId>
       <artifactId>pickle</artifactId>
-      <version>1.2</version>
+      <version>1.3</version>
     </dependency>
     <dependency>
       <groupId>net.sf.py4j</groupId>

diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 b/dev/deps/spark-deps-hadoop-2-hive-2.3
index 4b4c3e11fbb..8a1aee6f2b9 100644
--- a/dev/deps/spark-deps-hadoop-2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -235,7 +235,7 @@ parquet-encoding/1.12.3//parquet-encoding-1.12.3.jar
 parquet-format-structures/1.12.3//parquet-format-structures-1.12.3.jar
 parquet-hadoop/1.12.3//parquet-hadoop-1.12.3.jar
 parquet-jackson/1.12.3//parquet-jackson-1.12.3.jar
-pickle/1.2//pickle-1.2.jar
+pickle/1.3//pickle-1.3.jar
 protobuf-java/2.5.0//protobuf-java-2.5.0.jar
 py4j/0.10.9.7//py4j-0.10.9.7.jar
 remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 55e5e005436..89729481eac 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -222,7 +222,7 @@ parquet-encoding/1.12.3//parquet-encoding-1.12.3.jar
 parquet-format-structures/1.12.3//parquet-format-structures-1.12.3.jar
 parquet-hadoop/1.12.3//parquet-hadoop-1.12.3.jar
 parquet-jackson/1.12.3//parquet-jackson-1.12.3.jar
-pickle/1.2//pickle-1.2.jar
+pickle/1.3//pickle-1.3.jar
 protobuf-java/2.5.0//protobuf-java-2.5.0.jar
 py4j/0.10.9.7//py4j-0.10.9.7.jar
 remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar
[spark-docker] branch master updated: [SPARK-41269][INFRA] Move image matrix into version's workflow
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark-docker.git


The following commit(s) were added to refs/heads/master by this push:
     new d58e178 [SPARK-41269][INFRA] Move image matrix into version's workflow
d58e178 is described below

commit d58e17890f07b4c8c8d212775a53c48dc3a6ce42
Author: Yikun Jiang
AuthorDate: Mon Nov 28 09:36:54 2022 +0800

    [SPARK-41269][INFRA] Move image matrix into version's workflow

    ### What changes were proposed in this pull request?
    This patch refactors the main workflow:
    - Move the image matrix into each version's workflow to make the main workflow clearer. This will also help downstream repos validate only a specified image type.
    - Move the build steps into the same section.

    ### Why are the changes needed?
    This will help downstream repos validate only a specified image type. After this patch, we will add a test that reuses the spark-docker workflow, as in https://github.com/yikun/spark-docker/commit/45044cee2e8919de7e7353e74f8ca612ad16629a, to help developers/users test their self-built images.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    CI passed

    Closes #25 from Yikun/matrix-refactor.

    Authored-by: Yikun Jiang
    Signed-off-by: Yikun Jiang
---
 .github/workflows/build_3.3.0.yaml |  4 ++
 .github/workflows/build_3.3.1.yaml |  4 ++
 .github/workflows/main.yml         | 76 ++++++++++++++++++------------------
 .github/workflows/publish.yml      |  2 +
 4 files changed, 51 insertions(+), 35 deletions(-)

diff --git a/.github/workflows/build_3.3.0.yaml b/.github/workflows/build_3.3.0.yaml
index 7e7ce39..a4f8224 100644
--- a/.github/workflows/build_3.3.0.yaml
+++ b/.github/workflows/build_3.3.0.yaml
@@ -30,6 +30,9 @@ on:
 
 jobs:
   run-build:
+    strategy:
+      matrix:
+        image-type: ["all", "python", "scala", "r"]
     name: Run
     secrets: inherit
     uses: ./.github/workflows/main.yml
@@ -37,3 +40,4 @@ jobs:
       spark: 3.3.0
       scala: 2.12
       java: 11
+      image-type: ${{ matrix.image-type }}

diff --git a/.github/workflows/build_3.3.1.yaml b/.github/workflows/build_3.3.1.yaml
index f6a4b7d..9e5c082 100644
--- a/.github/workflows/build_3.3.1.yaml
+++ b/.github/workflows/build_3.3.1.yaml
@@ -30,6 +30,9 @@ on:
 
 jobs:
   run-build:
+    strategy:
+      matrix:
+        image-type: ["all", "python", "scala", "r"]
     name: Run
     secrets: inherit
     uses: ./.github/workflows/main.yml
@@ -37,3 +40,4 @@ jobs:
       spark: 3.3.1
       scala: 2.12
       java: 11
+      image-type: ${{ matrix.image-type }}

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 024b853..ebafcdc 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -47,6 +47,11 @@ on:
         required: false
         type: string
         default: ghcr.io/apache/spark-docker
+      image-type:
+        description: The image type of the image (all, python, scala, r).
+        required: false
+        type: string
+        default: python
 
 jobs:
   main:
@@ -60,41 +65,33 @@ jobs:
         image: registry:2
         ports:
           - 5000:5000
-    strategy:
-      matrix:
-        spark_version:
-          - ${{ inputs.spark }}
-        scala_version:
-          - ${{ inputs.scala }}
-        java_version:
-          - ${{ inputs.java }}
-        image_suffix: [python3-ubuntu, ubuntu, r-ubuntu, python3-r-ubuntu]
     steps:
       - name: Checkout Spark Docker repository
         uses: actions/checkout@v3
 
-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@v2
-
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v2
-        with:
-          # This required by local registry
-          driver-opts: network=host
-
-      - name: Generate tags
+      - name: Prepare - Generate tags
         run: |
-          TAG=scala${{ matrix.scala_version }}-java${{ matrix.java_version }}-${{ matrix.image_suffix }}
+          case "${{ inputs.image-type }}" in
+            all) SUFFIX=python3-r-ubuntu
+            ;;
+            python) SUFFIX=python3-ubuntu
+            ;;
+            r) SUFFIX=r-ubuntu
+            ;;
+            scala) SUFFIX=ubuntu
+            ;;
+          esac
+          TAG=scala${{ inputs.scala }}-java${{ inputs.java }}-$SUFFIX
           REPO_OWNER=$(echo "${{ github.repository_owner }}" | tr '[:upper:]' '[:lower:]')
           TEST_REPO=localhost:5000/$REPO_OWNER/spark-docker
          IMAGE_NAME=spark
-          IMAGE_PATH=${{ matrix.spark_version }}/$TAG
-          UNIQUE_IMAGE_TAG=${{ matrix.spark_version }}-$TAG
+          IMAGE_PATH=${{ inputs.spark }}/$TAG
+          UNIQUE_IMAGE_TAG=${{ inputs.spark }}-$TAG
           IMAGE_URL=$TEST_REPO/$IMAGE_NAME:$UNIQUE_IMAGE_TAG
           PUBLISH_REPO=${{ inputs.reposito
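The `case` block added in `main.yml` maps the single `image-type` input to a Dockerfile suffix and builds the image tag from it. A quick Python model of that mapping makes the tag scheme easy to check; the suffix values are taken from the diff, but the helper name `image_tag` is made up for this sketch:

```python
def image_tag(image_type: str, spark: str, scala: str, java: str) -> str:
    """Mirror of the workflow's case statement: image-type -> tag suffix."""
    suffixes = {
        "all": "python3-r-ubuntu",
        "python": "python3-ubuntu",
        "r": "r-ubuntu",
        "scala": "ubuntu",
    }
    # TAG in the workflow: scala<ver>-java<ver>-<suffix>
    tag = f"scala{scala}-java{java}-{suffixes[image_type]}"
    # UNIQUE_IMAGE_TAG in the workflow prefixes the Spark version.
    return f"{spark}-{tag}"


assert image_tag("python", "3.3.1", "2.12", "11") == "3.3.1-scala2.12-java11-python3-ubuntu"
```

The design point of the refactor is visible here: the old workflow enumerated `image_suffix` values in a matrix inside `main.yml`, while the new one takes one `image-type` per invocation, so a downstream repo calling the reusable workflow can build and validate just the image type it cares about.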
[spark] branch master updated (ed3775704bb -> e4b5eec6e27)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from ed3775704bb [MINOR][SQL][TESTS] Restore the code style check of `QueryExecutionErrorsSuite`
     add e4b5eec6e27 [SPARK-38728][SQL] Test the error class: FAILED_RENAME_PATH

No new revisions were added by this update.

Summary of changes:
 .../sql/errors/QueryExecutionErrorsSuite.scala | 35 ++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
[spark] branch master updated: [MINOR][SQL][TESTS] Restore the code style check of `QueryExecutionErrorsSuite`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new ed3775704bb [MINOR][SQL][TESTS] Restore the code style check of `QueryExecutionErrorsSuite`
ed3775704bb is described below

commit ed3775704bbdc9a9c479dc06565c8bf8c4d9640c
Author: yangjie01
AuthorDate: Sun Nov 27 15:03:35 2022 +0300

    [MINOR][SQL][TESTS] Restore the code style check of `QueryExecutionErrorsSuite`

    ### What changes were proposed in this pull request?
    https://github.com/apache/spark/blob/9af216d7ac26f0ec916833c2e80a01aef8933529/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala#L451-L454

    As the code above shows, line 451 in `QueryExecutionErrorsSuite.scala` turns off all scalastyle checks, while line 454 turns only the `throwerror` check back on, so the code after line 454 of `QueryExecutionErrorsSuite.scala` is not checked for code style at all, except for `throwerror`. This pr restores the code style check and fixes an existing `File line length exceeds 100 characters.` case.

    ### Why are the changes needed?
    Restore the code style check of `QueryExecutionErrorsSuite`

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Pass GitHub Actions

    Closes #38812 from LuciferYang/minor-checkstyle.

    Authored-by: yangjie01
    Signed-off-by: Max Gekk
---
 .../org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala
index aa0f720d4de..807188bee3a 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala
@@ -448,7 +448,7 @@ class QueryExecutionErrorsSuite
       override def getResources(name: String): java.util.Enumeration[URL] = {
         if (name.equals("META-INF/services/org.apache.spark.sql.sources.DataSourceRegister")) {
-          // scalastyle:off
+          // scalastyle:off throwerror
           throw new ServiceConfigurationError(s"Illegal configuration-file syntax: $name",
             new NoClassDefFoundError("org.apache.spark.sql.sources.HadoopFsRelationProvider"))
           // scalastyle:on throwerror
@@ -632,7 +632,8 @@ class QueryExecutionErrorsSuite
       },
       errorClass = "UNSUPPORTED_DATATYPE",
       parameters = Map(
-        "typeName" -> "StructType()[1.1] failure: 'TimestampType' expected but 'S' found\n\nStructType()\n^"
+        "typeName" ->
+          "StructType()[1.1] failure: 'TimestampType' expected but 'S' found\n\nStructType()\n^"
       ),
       sqlState = "0A000")