[spark] branch master updated: [SPARK-42657][CONNECT][FOLLOWUP] Correct the API version in scaladoc
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new fc75dab0876 [SPARK-42657][CONNECT][FOLLOWUP] Correct the API version in scaladoc fc75dab0876 is described below commit fc75dab087696bdc9001a10bd053d52bee8f0ef4 Author: Cheng Pan AuthorDate: Tue Apr 18 15:34:12 2023 +0900 [SPARK-42657][CONNECT][FOLLOWUP] Correct the API version in scaladoc ### What changes were proposed in this pull request? SPARK-42657 is for Spark 3.5.0. ### Why are the changes needed? Fix the wrong API version ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Passing GA. Closes #40832 from pan3793/SPARK-42657-followup. Authored-by: Cheng Pan Signed-off-by: Hyukjin Kwon --- .../client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala index e285db39e80..d68988cd435 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala @@ -498,7 +498,7 @@ class SparkSession private[sql] ( /** * Register a [[ClassFinder]] for dynamically generated classes. * - * @since 3.4.0 + * @since 3.5.0 */ @Experimental def registerClassFinder(finder: ClassFinder): Unit = client.registerClassFinder(finder) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-43111][PS][CONNECT][PYTHON] Merge nested `if` statements into single `if` statements
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 462d4565cd2 [SPARK-43111][PS][CONNECT][PYTHON] Merge nested `if` statements into single `if` statements 462d4565cd2 is described below commit 462d4565cd2782fe805c9871eeab2d969c79369f Author: Bjørn Jørgensen AuthorDate: Tue Apr 18 13:13:10 2023 +0900 [SPARK-43111][PS][CONNECT][PYTHON] Merge nested `if` statements into single `if` statements ### What changes were proposed in this pull request? This PR aims to simplify the code by merging nested `if` statements into single `if` statements using the `and` operator. There are 7 of these according to [Sonarcloud](https://sonarcloud.io/project/issues?languages=py&resolved=false&rules=python%3AS1066&id=spark-python&open=AYQdnXXBRrJbVxW9ZDpw). And this PR fix them all. ### Why are the changes needed? The changes do not affect the functionality of the code, but they improve readability and maintainability. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #40759 from bjornjorgensen/Merge-if-with-the-enclosing-one. Lead-authored-by: Bjørn Jørgensen Co-authored-by: bjornjorgensen Signed-off-by: Hyukjin Kwon --- python/pyspark/accumulators.py | 5 ++--- python/pyspark/pandas/frame.py | 17 ++--- python/pyspark/pandas/groupby.py | 17 - python/pyspark/pandas/indexes/base.py | 9 - python/pyspark/pandas/namespace.py | 5 ++--- python/pyspark/sql/connect/streaming/readwriter.py | 11 +-- 6 files changed, 31 insertions(+), 33 deletions(-) diff --git a/python/pyspark/accumulators.py b/python/pyspark/accumulators.py index fe775a37ed8..ce4bb561814 100644 --- a/python/pyspark/accumulators.py +++ b/python/pyspark/accumulators.py @@ -249,9 +249,8 @@ class _UpdateRequestHandler(SocketServer.StreamRequestHandler): while not self.server.server_shutdown: # type: ignore[attr-defined] # Poll every 1 second for new data -- don't block in case of shutdown. 
r, _, _ = select.select([self.rfile], [], [], 1) -if self.rfile in r: -if func(): -break +if self.rfile in r and func(): +break def accum_updates() -> bool: num_updates = read_int(self.rfile) diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py index 8bddcb6bae8..d1c10223432 100644 --- a/python/pyspark/pandas/frame.py +++ b/python/pyspark/pandas/frame.py @@ -8915,15 +8915,19 @@ defaultdict(, {'col..., 'col...})] if len(index_scols) != other._internal.index_level: raise ValueError("Both DataFrames have to have the same number of index levels") -if verify_integrity and len(index_scols) > 0: -if ( +if ( +verify_integrity +and len(index_scols) > 0 +and ( self._internal.spark_frame.select(index_scols) .intersect( other._internal.spark_frame.select(other._internal.index_spark_columns) ) .count() -) > 0: -raise ValueError("Indices have overlapping values") +) +> 0 +): +raise ValueError("Indices have overlapping values") # Lazy import to avoid circular dependency issues from pyspark.pandas.namespace import concat @@ -11581,9 +11585,8 @@ defaultdict(, {'col..., 'col...})] index_columns = psdf._internal.index_spark_column_names num_indices = len(index_columns) -if level: -if level < 0 or level >= num_indices: -raise ValueError("level should be an integer between [0, %s)" % num_indices) +if level is not None and (level < 0 or level >= num_indices): +raise ValueError("level should be an integer between [0, %s)" % num_indices) @pandas_udf(returnType=index_mapper_ret_stype) # type: ignore[call-overload] def index_mapper_udf(s: pd.Series) -> pd.Series: diff --git a/python/pyspark/pandas/groupby.py b/python/pyspark/pandas/groupby.py index 01687c3fd16..01bc72cd809 100644 --- a/python/pyspark/pandas/groupby.py +++ b/python/pyspark/pandas/groupby.py @@ -3550,15 +3550,14 @@ class GroupBy(Generic[FrameLike], metaclass=ABCMeta): if isinstance(self, SeriesGroupBy): raise TypeError("Only numeric aggregation column is accepted.")
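For illustration, the refactoring applied throughout this change can be reduced to a small standalone example. The snippet below is not taken from the Spark code base; it is a minimal sketch of collapsing a nested `if` into a single `if` using `and`, relying on short-circuit evaluation to preserve behaviour:

```python
# Hypothetical example illustrating the pattern applied in this change.

def poll_ready_nested(readable: bool, has_update) -> bool:
    # Before: two nested `if` statements.
    if readable:
        if has_update():
            return True
    return False


def poll_ready_merged(readable: bool, has_update) -> bool:
    # After: one `if` joined with `and`. Short-circuiting means has_update()
    # still only runs when `readable` is True, so behaviour is unchanged.
    if readable and has_update():
        return True
    return False


for readable in (True, False):
    for result in (True, False):
        assert poll_ready_nested(readable, lambda: result) == \
            poll_ready_merged(readable, lambda: result)
```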
[spark] branch branch-3.4 updated: [SPARK-43113][SQL] Evaluate stream-side variables when generating code for a bound condition
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 55e152acceb [SPARK-43113][SQL] Evaluate stream-side variables when generating code for a bound condition 55e152acceb is described below commit 55e152accebb1100e18a0d51d44bb6552953139c Author: Bruce Robbins AuthorDate: Tue Apr 18 13:09:41 2023 +0900 [SPARK-43113][SQL] Evaluate stream-side variables when generating code for a bound condition ### What changes were proposed in this pull request? In `JoinCodegenSupport#getJoinCondition`, evaluate any referenced stream-side variables before using them in the generated code. This patch doesn't evaluate the passed stream-side variables directly, but instead evaluates a copy (`streamVars2`). This is because `SortMergeJoin#codegenFullOuter` will want to evaluate the stream-side vars within a different scope than the condition check, so we mustn't delete the initialization code from the original `ExprCode` instances. ### Why are the changes needed? When a bound condition of a full outer join references the same stream-side column more than once, wholestage codegen generates bad code. For example, the following query fails with a compilation error: ``` create or replace temp view v1 as select * from values (1, 1), (2, 2), (3, 1) as v1(key, value); create or replace temp view v2 as select * from values (1, 22, 22), (3, -1, -1), (7, null, null) as v2(a, b, c); select * from v1 full outer join v2 on key = a and value > b and value > c; ``` The error is: ``` org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 277, Column 9: Redefinition of local variable "smj_isNull_7" ``` The same error occurs with code generated from ShuffleHashJoinExec: ``` select /*+ SHUFFLE_HASH(v2) */ * from v1 full outer join v2 on key = a and value > b and value > c; ``` In this case, the error is: ``` org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 174, Column 5: Redefinition of local variable "shj_value_1" ``` Neither `SortMergeJoin#codegenFullOuter` nor `ShuffledHashJoinExec#doProduce` evaluate the stream-side variables before calling `consumeFullOuterJoinRow#getJoinCondition`. As a result, `getJoinCondition` generates definition/initialization code for each referenced stream-side variable at the point of use. If a stream-side variable is used more than once in the bound condition, the definition/initialization code is generated more than once, resulting in the "Redefinition of local varia [...] In the end, the query succeeds, since Spark disables wholestage codegen and tries again. (In the case other join-type/strategy pairs, either the implementations don't call `JoinCodegenSupport#getJoinCondition`, or the stream-side variables are pre-evaluated before the call is made, so no error happens in those cases). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New unit tests. Closes #40766 from bersprockets/full_join_codegen_issue. 
Authored-by: Bruce Robbins Signed-off-by: Hyukjin Kwon (cherry picked from commit 119ec5b2ea86b73afaeabcb1d52136029326cac7) Signed-off-by: Hyukjin Kwon --- .../sql/execution/joins/JoinCodegenSupport.scala | 8 +++-- .../scala/org/apache/spark/sql/JoinSuite.scala | 35 ++ 2 files changed, 40 insertions(+), 3 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/JoinCodegenSupport.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/JoinCodegenSupport.scala index 75f0a359a79..ae91615da0f 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/JoinCodegenSupport.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/JoinCodegenSupport.scala @@ -42,13 +42,15 @@ trait JoinCodegenSupport extends CodegenSupport with BaseJoinExec { buildRow: Option[String] = None): (String, String, Seq[ExprCode]) = { val buildSideRow = buildRow.getOrElse(ctx.freshName("buildRow")) val buildVars = genOneSideJoinVars(ctx, buildSideRow, buildPlan, setDefaultValue = false) +val streamVars2 = streamVars.map(_.copy()) val checkCondition = if (condition.isDefined) { val expr = condition.get - // evaluate the variables from build side that used by condition - val eval = evaluateRequiredVariables(buildPlan.output, buildVars, expr.references) + // evaluate the variables that are used by the condition + val eval = evaluateRequiredVariables(streamPlan.output ++
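For readers who prefer to reproduce the scenario described above from Python rather than a SQL script, the sketch below builds the same two temporary views and runs the failing full outer join. It is illustrative only (local master, inferred schemas); before this fix the query still succeeded, but only after whole-stage codegen was disabled and retried.

```python
# Hypothetical PySpark reproduction of the query described in the commit
# message; before the fix it triggered a codegen compile error and fell
# back to non-codegen execution.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").getOrCreate()

spark.createDataFrame(
    [(1, 1), (2, 2), (3, 1)], ["key", "value"]
).createOrReplaceTempView("v1")

spark.createDataFrame(
    [(1, 22, 22), (3, -1, -1), (7, None, None)], ["a", "b", "c"]
).createOrReplaceTempView("v2")

# The bound condition references the stream-side column `value` twice.
spark.sql(
    "SELECT * FROM v1 FULL OUTER JOIN v2 ON key = a AND value > b AND value > c"
).show()
```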
[spark] branch master updated (dc84e529ba9 -> 119ec5b2ea8)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from dc84e529ba9 [SPARK-43122][CONNECT][PYTHON][ML][TESTS] Reenable TorchDistributorLocalUnitTestsOnConnect and TorchDistributorLocalUnitTestsIIOnConnect add 119ec5b2ea8 [SPARK-43113][SQL] Evaluate stream-side variables when generating code for a bound condition No new revisions were added by this update. Summary of changes: .../sql/execution/joins/JoinCodegenSupport.scala | 8 +++-- .../scala/org/apache/spark/sql/JoinSuite.scala | 35 ++ 2 files changed, 40 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-43122][CONNECT][PYTHON][ML][TESTS] Reenable TorchDistributorLocalUnitTestsOnConnect and TorchDistributorLocalUnitTestsIIOnConnect
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new dc84e529ba9 [SPARK-43122][CONNECT][PYTHON][ML][TESTS] Reenable TorchDistributorLocalUnitTestsOnConnect and TorchDistributorLocalUnitTestsIIOnConnect dc84e529ba9 is described below commit dc84e529ba96ba8afc24216e5fc28d95ce8ce290 Author: Ruifeng Zheng AuthorDate: Tue Apr 18 11:22:58 2023 +0800 [SPARK-43122][CONNECT][PYTHON][ML][TESTS] Reenable TorchDistributorLocalUnitTestsOnConnect and TorchDistributorLocalUnitTestsIIOnConnect ### What changes were proposed in this pull request? `TorchDistributorLocalUnitTestsOnConnect` and `TorchDistributorLocalUnitTestsIIOnConnect` were not stable and occasionally got stuck. However, I can not reproduce the issue locally. The two UTs were disabled, and this PR is to reenable them. I found that the all the tests for PyTorch set up the regular sessions or connect sessions in `setUp` and close them in `tearDown`, however such session operations are very expensive and should be placed into `setUpClass` and `tearDownClass` instead. After this change, the related tests seems much stable. So I think the root cause is still related to the resources, since TorchDistributor works on barrier mode, when there is n [...] ### Why are the changes needed? for test coverage ### Does this PR introduce _any_ user-facing change? No, test-only ### How was this patch tested? CI Closes #40793 from zhengruifeng/torch_reenable. Lead-authored-by: Ruifeng Zheng Co-authored-by: Ruifeng Zheng Signed-off-by: Ruifeng Zheng --- .../tests/connect/test_parity_torch_distributor.py | 111 +++-- python/pyspark/ml/torch/tests/test_distributor.py | 177 +++-- 2 files changed, 158 insertions(+), 130 deletions(-) diff --git a/python/pyspark/ml/tests/connect/test_parity_torch_distributor.py b/python/pyspark/ml/tests/connect/test_parity_torch_distributor.py index 8f5699afdf2..55ea99a6540 100644 --- a/python/pyspark/ml/tests/connect/test_parity_torch_distributor.py +++ b/python/pyspark/ml/tests/connect/test_parity_torch_distributor.py @@ -17,7 +17,6 @@ import os import shutil -import tempfile import unittest have_torch = True @@ -33,6 +32,9 @@ from pyspark.ml.torch.tests.test_distributor import ( TorchDistributorLocalUnitTestsMixin, TorchDistributorDistributedUnitTestsMixin, TorchWrapperUnitTestsMixin, +set_up_test_dirs, +get_local_mode_conf, +get_distributed_mode_conf, ) @@ -40,31 +42,35 @@ from pyspark.ml.torch.tests.test_distributor import ( class TorchDistributorBaselineUnitTestsOnConnect( TorchDistributorBaselineUnitTestsMixin, unittest.TestCase ): -def setUp(self) -> None: -self.spark = SparkSession.builder.remote("local[4]").getOrCreate() +@classmethod +def setUpClass(cls): +cls.spark = SparkSession.builder.remote("local[4]").getOrCreate() -def tearDown(self) -> None: -self.spark.stop() +@classmethod +def tearDownClass(cls): +cls.spark.stop() -@unittest.skip("unstable, ignore for now") +@unittest.skipIf(not have_torch, "torch is required") class TorchDistributorLocalUnitTestsOnConnect( TorchDistributorLocalUnitTestsMixin, unittest.TestCase ): -def setUp(self) -> None: -class_name = self.__class__.__name__ -conf = self._get_spark_conf() -builder = SparkSession.builder.appName(class_name) -for k, v in conf.getAll(): -if k not in ["spark.master", "spark.remote", "spark.app.name"]: -builder = builder.config(k, v) -self.spark = 
builder.remote("local-cluster[2,2,1024]").getOrCreate() -self.mnist_dir_path = tempfile.mkdtemp() - -def tearDown(self) -> None: -shutil.rmtree(self.mnist_dir_path) -os.unlink(self.gpu_discovery_script_file.name) -self.spark.stop() +@classmethod +def setUpClass(cls): +(cls.gpu_discovery_script_file_name, cls.mnist_dir_path) = set_up_test_dirs() +builder = SparkSession.builder.appName(cls.__name__) +for k, v in get_local_mode_conf().items(): +builder = builder.config(k, v) +builder = builder.config( +"spark.driver.resource.gpu.discoveryScript", cls.gpu_discovery_script_file_name +) +cls.spark = builder.remote("local-cluster[2,2,1024]").getOrCreate() + +@classmethod +def tearDownClass(cls): +shutil.rmtree(cls.mnist_dir_path) +os.unlink(cls.gpu_discovery_script_file_name) +cls.spark.stop() def _get_inputs_for_test_local_training_succeeds(self): return [ @@ -75,24 +81,27 @@ class TorchDistributorLocalUnitTestsOnConnect( ] -@unittest.skip("unstable, ign
[spark-docker] branch master updated: [SPARK-43148] Add Apache Spark 3.4.0 Dockerfiles
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new fe05e38 [SPARK-43148] Add Apache Spark 3.4.0 Dockerfiles fe05e38 is described below commit fe05e38f0ffad271edccd6ae40a77d5f14f3eef7 Author: Yikun Jiang AuthorDate: Tue Apr 18 10:58:59 2023 +0800 [SPARK-43148] Add Apache Spark 3.4.0 Dockerfiles ### What changes were proposed in this pull request? Add Apache Spark 3.4.0 Dockerfiles. - Add 3.4.0 GPG key - Add .github/workflows/build_3.4.0.yaml - ./add-dockerfiles.sh 3.4.0 ### Why are the changes needed? Apache Spark 3.4.0 released: https://spark.apache.org/releases/spark-release-3-4-0.html ### Does this PR introduce _any_ user-facing change? Yes in future, new image will publised in future (after DOI reviewed) ### How was this patch tested? Add workflow and CI passed Closes #33 from Yikun/3.4.0. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/workflows/build_3.4.0.yaml | 43 .github/workflows/publish.yml | 6 +- .github/workflows/test.yml | 6 +- 3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 86 .../entrypoint.sh | 114 + 3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile | 83 +++ .../scala2.12-java11-python3-ubuntu/entrypoint.sh | 114 + 3.4.0/scala2.12-java11-r-ubuntu/Dockerfile | 82 +++ 3.4.0/scala2.12-java11-r-ubuntu/entrypoint.sh | 107 +++ 3.4.0/scala2.12-java11-ubuntu/Dockerfile | 79 ++ 3.4.0/scala2.12-java11-ubuntu/entrypoint.sh| 107 +++ tools/template.py | 2 + versions.json | 42 ++-- 13 files changed, 860 insertions(+), 11 deletions(-) diff --git a/.github/workflows/build_3.4.0.yaml b/.github/workflows/build_3.4.0.yaml new file mode 100644 index 000..8dd4e1e --- /dev/null +++ b/.github/workflows/build_3.4.0.yaml @@ -0,0 +1,43 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +name: "Build and Test (3.4.0)" + +on: + pull_request: +branches: + - 'master' +paths: + - '3.4.0/**' + - '.github/workflows/build_3.4.0.yaml' + - '.github/workflows/main.yml' + +jobs: + run-build: +strategy: + matrix: +image-type: ["all", "python", "scala", "r"] +name: Run +secrets: inherit +uses: ./.github/workflows/main.yml +with: + spark: 3.4.0 + scala: 2.12 + java: 11 + image-type: ${{ matrix.image-type }} diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml index 2941cfb..70b88b8 100644 --- a/.github/workflows/publish.yml +++ b/.github/workflows/publish.yml @@ -25,11 +25,13 @@ on: spark: description: 'The Spark version of Spark image.' required: true -default: '3.3.0' +default: '3.4.0' type: choice options: -- 3.3.0 +- 3.4.0 +- 3.3.2 - 3.3.1 +- 3.3.0 publish: description: 'Publish the image or not.' 
default: false diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index efb401b..06e2321 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -25,11 +25,13 @@ on: spark: description: 'The Spark version of Spark image.' required: true -default: '3.3.1' +default: '3.4.0' type: choice options: -- 3.3.0 +- 3.4.0 +- 3.3.2 - 3.3.1 +- 3.3.0 java: description: 'The Java version of Spark image.' default: 11 diff --git a/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile b/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile new file mode 100644 index 000..4f62e8d --- /dev/null +++ b/3.4.0/scala2.12-java11-python3-r-ubuntu/Doc
[spark] branch master updated: [SPARK-43168][SQL] Remove get PhysicalDataType method from Datatype class
This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new db2625c70a8 [SPARK-43168][SQL] Remove get PhysicalDataType method from Datatype class db2625c70a8 is described below commit db2625c70a8c3aff64e6a9466981c8dd49a4ca51 Author: Rui Wang AuthorDate: Mon Apr 17 22:16:25 2023 -0400 [SPARK-43168][SQL] Remove get PhysicalDataType method from Datatype class ### What changes were proposed in this pull request? DataType is public API while we can leave PhysicalDataType as internal API/implementation thus we can remove PhysicalDataType from DataType. So DataType does not need to have a class dependency on PhysicalDataType. ### Why are the changes needed? Simplify DataType. ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? UT Closes #40826 from amaliujia/catalyst_datatype_refactor_8. Authored-by: Rui Wang Signed-off-by: Herman van Hovell --- .../spark/sql/catalyst/expressions/SpecializedGettersReader.java | 2 +- .../java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java| 4 ++-- .../main/java/org/apache/spark/sql/vectorized/ColumnarBatchRow.java | 2 +- .../src/main/java/org/apache/spark/sql/vectorized/ColumnarRow.java | 2 +- .../src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala | 2 +- .../spark/sql/catalyst/expressions/InterpretedUnsafeProjection.scala | 2 +- .../spark/sql/catalyst/expressions/codegen/CodeGenerator.scala | 4 ++-- .../scala/org/apache/spark/sql/catalyst/expressions/literals.scala | 2 +- .../org/apache/spark/sql/catalyst/expressions/namedExpressions.scala | 2 +- .../scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala | 5 - .../src/main/scala/org/apache/spark/sql/types/ArrayType.scala| 4 .../src/main/scala/org/apache/spark/sql/types/BinaryType.scala | 3 --- .../src/main/scala/org/apache/spark/sql/types/BooleanType.scala | 3 --- .../src/main/scala/org/apache/spark/sql/types/ByteType.scala | 3 --- .../main/scala/org/apache/spark/sql/types/CalendarIntervalType.scala | 3 --- .../src/main/scala/org/apache/spark/sql/types/CharType.scala | 2 -- .../src/main/scala/org/apache/spark/sql/types/DataType.scala | 4 +--- .../src/main/scala/org/apache/spark/sql/types/DateType.scala | 3 --- .../main/scala/org/apache/spark/sql/types/DayTimeIntervalType.scala | 3 --- .../src/main/scala/org/apache/spark/sql/types/DecimalType.scala | 4 .../src/main/scala/org/apache/spark/sql/types/DoubleType.scala | 3 --- .../src/main/scala/org/apache/spark/sql/types/FloatType.scala| 3 --- .../src/main/scala/org/apache/spark/sql/types/IntegerType.scala | 3 --- .../src/main/scala/org/apache/spark/sql/types/LongType.scala | 3 --- sql/catalyst/src/main/scala/org/apache/spark/sql/types/MapType.scala | 4 .../src/main/scala/org/apache/spark/sql/types/NullType.scala | 3 --- .../src/main/scala/org/apache/spark/sql/types/ShortType.scala| 3 --- .../src/main/scala/org/apache/spark/sql/types/StringType.scala | 3 --- .../src/main/scala/org/apache/spark/sql/types/StructType.scala | 4 +--- .../src/main/scala/org/apache/spark/sql/types/TimestampNTZType.scala | 3 --- .../src/main/scala/org/apache/spark/sql/types/TimestampType.scala| 3 --- .../src/main/scala/org/apache/spark/sql/types/VarcharType.scala | 3 --- .../scala/org/apache/spark/sql/types/YearMonthIntervalType.scala | 3 --- .../org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java | 2 +- 34 files changed, 18 
insertions(+), 84 deletions(-) diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/SpecializedGettersReader.java b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/SpecializedGettersReader.java index c5a7d34281f..be50350b106 100644 --- a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/SpecializedGettersReader.java +++ b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/SpecializedGettersReader.java @@ -29,7 +29,7 @@ public final class SpecializedGettersReader { DataType dataType, boolean handleNull, boolean handleUserDefinedType) { -PhysicalDataType physicalDataType = dataType.physicalDataType(); +PhysicalDataType physicalDataType = PhysicalDataType.apply(dataType); if (handleNull && (obj.isNullAt(ordinal) || physicalDataType instanceof PhysicalNullType)) { return null; } diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java index a3097cd6770..d2433292fc7 1
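The refactoring direction, removing the internal lookup from the public type and resolving it through an internal factory instead, can be sketched outside of Scala. The Python example below is purely illustrative: the class and function names are invented and do not mirror the actual Spark classes.

```python
# Hypothetical sketch of the dependency direction after this change:
# the public type no longer references the internal physical type.

class IntegerType:            # public API surface
    pass

class StringType:             # public API surface
    pass

class PhysicalIntegerType:    # internal implementation detail
    pass

class PhysicalStringType:     # internal implementation detail
    pass

def physical_data_type(dt):
    # Internal factory (analogous in spirit to PhysicalDataType.apply(dataType)):
    # the mapping lives with the internal types, not on the public class.
    mapping = {IntegerType: PhysicalIntegerType, StringType: PhysicalStringType}
    return mapping[type(dt)]()

print(type(physical_data_type(IntegerType())).__name__)  # PhysicalIntegerType
```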
[spark] branch master updated: [SPARK-42984][CONNECT][PYTHON][TESTS] Enable test_createDataFrame_with_single_data_type
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 61e8c5b1fe4 [SPARK-42984][CONNECT][PYTHON][TESTS] Enable test_createDataFrame_with_single_data_type 61e8c5b1fe4 is described below commit 61e8c5b1fe46173464e2e04acf77806086c89a8d Author: Takuya UESHIN AuthorDate: Tue Apr 18 10:13:56 2023 +0800 [SPARK-42984][CONNECT][PYTHON][TESTS] Enable test_createDataFrame_with_single_data_type ### What changes were proposed in this pull request? Enables `ArrowParityTests.test_createDataFrame_with_single_data_type`. ### Why are the changes needed? The test is already fixed by previous commits. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Enabled/updated the related tests. Closes #40828 from ueshin/issues/SPARK-42984/test. Authored-by: Takuya UESHIN Signed-off-by: Ruifeng Zheng --- python/pyspark/sql/tests/connect/test_parity_arrow.py | 2 -- python/pyspark/sql/tests/test_arrow.py| 6 -- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/python/pyspark/sql/tests/connect/test_parity_arrow.py b/python/pyspark/sql/tests/connect/test_parity_arrow.py index ec33bb22a4b..fd05821f052 100644 --- a/python/pyspark/sql/tests/connect/test_parity_arrow.py +++ b/python/pyspark/sql/tests/connect/test_parity_arrow.py @@ -43,8 +43,6 @@ class ArrowParityTests(ArrowTestsMixin, ReusedConnectTestCase): def test_createDataFrame_with_ndarray(self): self.check_createDataFrame_with_ndarray(True) -# TODO(SPARK-42984): ValueError not raised -@unittest.skip("Fails in Spark Connect, should enable.") def test_createDataFrame_with_single_data_type(self): self.check_createDataFrame_with_single_data_type() diff --git a/python/pyspark/sql/tests/test_arrow.py b/python/pyspark/sql/tests/test_arrow.py index 1c96273a22f..cf28d32c903 100644 --- a/python/pyspark/sql/tests/test_arrow.py +++ b/python/pyspark/sql/tests/test_arrow.py @@ -533,8 +533,10 @@ class ArrowTestsMixin: self.check_createDataFrame_with_single_data_type() def check_createDataFrame_with_single_data_type(self): -with self.assertRaisesRegex(ValueError, ".*IntegerType.*not supported.*"): -self.spark.createDataFrame(pd.DataFrame({"a": [1]}), schema="int").collect() +for schema in ["int", IntegerType()]: +with self.subTest(schema=schema): +with self.assertRaisesRegex(ValueError, ".*IntegerType.*not supported.*"): +self.spark.createDataFrame(pd.DataFrame({"a": [1]}), schema=schema).collect() def test_createDataFrame_does_not_modify_input(self): # Some series get converted for Spark to consume, this makes sure input is unchanged - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
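The updated test exercises both the DDL string (`"int"`) and the `IntegerType()` form of the schema through `unittest`'s `subTest`, so a failure reports which variant broke. Below is a standalone sketch of that pattern; the `create_dataframe` stub is hypothetical and merely stands in for the real API:

```python
import unittest


def create_dataframe(data, schema):
    # Hypothetical stand-in for the real API: single non-struct schemas
    # are rejected with a descriptive error.
    if schema in ("int", "integer"):
        raise ValueError(f"Schema {schema!r} is not supported")
    return data


class SingleDataTypeTests(unittest.TestCase):
    def test_single_data_type_rejected(self):
        for schema in ["int", "integer"]:
            # subTest keeps iterating and reports each failing variant
            # separately instead of stopping at the first failure.
            with self.subTest(schema=schema):
                with self.assertRaisesRegex(ValueError, "not supported"):
                    create_dataframe([{"a": 1}], schema)


if __name__ == "__main__":
    unittest.main()
```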
[spark] branch master updated: [SPARK-41210][K8S] Port executor failure tracker from Spark on YARN to K8s
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 40872e9a094 [SPARK-41210][K8S] Port executor failure tracker from Spark on YARN to K8s 40872e9a094 is described below commit 40872e9a094f8459b0b6f626937ced48a8d98efb Author: Cheng Pan AuthorDate: Tue Apr 18 09:51:28 2023 +0800 [SPARK-41210][K8S] Port executor failure tracker from Spark on YARN to K8s ### What changes were proposed in this pull request? Fail Spark Application when the number of executor failures reaches the threshold. ### Why are the changes needed? Sometimes, the executors can not launch successfully because of the wrong configuration, but in K8s, Driver does not know that, and just keep requesting new executors. This PR ports the window-based executor failure tracking mechanism to K8s(only takes effect when `spark.kubernetes.allocation.pods.allocator` is set to 'direct'), to reduce functionality gap between YARN and K8s. Note that, YARN mode also supports host-based executor allocation failure tracking and application terminating mechanism[2], this PR does not port such functionalities to Kubernetes since it's kind of an independent and big feature, and relies on some YARN features which I'm not sure if K8s has similar one. [1] [SPARK-6735](https://issues.apache.org/jira/browse/SPARK-6735) [2] [SPARK-17675](https://issues.apache.org/jira/browse/SPARK-17675) ### Does this PR introduce _any_ user-facing change? Yes, this PR provides two new configurations - `spark.executor.maxNumFailures` - `spark.executor.failuresValidityInterval` which takes effect on YARN, or on Kubernetes when `spark.kubernetes.allocation.pods.allocator` is set to 'direct'. ### How was this patch tested? New UT added, and manually tested in internal K8s cluster. Closes #40774 from pan3793/SPARK-41210. 
Authored-by: Cheng Pan Signed-off-by: Kent Yao --- .../main/scala/org/apache/spark/SparkConf.scala| 6 +- .../spark/deploy/ExecutorFailureTracker.scala | 102 + .../org/apache/spark/internal/config/package.scala | 18 .../org/apache/spark/util/SparkExitCode.scala | 3 + .../spark/deploy/ExecutorFailureTrackerSuite.scala | 10 +- .../cluster/k8s/ExecutorPodsAllocator.scala| 36 +++- .../cluster/k8s/ExecutorPodsAllocatorSuite.scala | 44 - .../k8s/KubernetesClusterManagerSuite.scala| 6 +- .../spark/deploy/yarn/ApplicationMaster.scala | 27 +- .../apache/spark/deploy/yarn/YarnAllocator.scala | 3 +- .../yarn/YarnAllocatorNodeHealthTracker.scala | 61 +--- .../org/apache/spark/deploy/yarn/config.scala | 13 --- .../yarn/YarnAllocatorHealthTrackerSuite.scala | 3 +- .../deploy/yarn/YarnShuffleIntegrationSuite.scala | 1 - 14 files changed, 224 insertions(+), 109 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/SparkConf.scala b/core/src/main/scala/org/apache/spark/SparkConf.scala index 08344d8e547..813a14acd19 100644 --- a/core/src/main/scala/org/apache/spark/SparkConf.scala +++ b/core/src/main/scala/org/apache/spark/SparkConf.scala @@ -709,7 +709,11 @@ private[spark] object SparkConf extends Logging { AlternateConfig("spark.yarn.access.namenodes", "2.2"), AlternateConfig("spark.yarn.access.hadoopFileSystems", "3.0")), "spark.kafka.consumer.cache.capacity" -> Seq( - AlternateConfig("spark.sql.kafkaConsumerCache.capacity", "3.0")) + AlternateConfig("spark.sql.kafkaConsumerCache.capacity", "3.0")), +MAX_EXECUTOR_FAILURES.key -> Seq( + AlternateConfig("spark.yarn.max.executor.failures", "3.5")), +EXECUTOR_ATTEMPT_FAILURE_VALIDITY_INTERVAL_MS.key -> Seq( + AlternateConfig("spark.yarn.executor.failuresValidityInterval", "3.5")) ) /** diff --git a/core/src/main/scala/org/apache/spark/deploy/ExecutorFailureTracker.scala b/core/src/main/scala/org/apache/spark/deploy/ExecutorFailureTracker.scala new file mode 100644 index 000..7c7b9c60b47 --- /dev/null +++ b/core/src/main/scala/org/apache/spark/deploy/ExecutorFailureTracker.scala @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License i
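The two new configurations are set like any other Spark conf. The sketch below shows one way to wire them up from PySpark for a Kubernetes deployment; the master URL, container image, and threshold values are placeholders, and only the configuration keys come from this change.

```python
from pyspark.sql import SparkSession

# Illustrative values only; the K8s master URL, image, and thresholds are
# placeholders. On Kubernetes the failure tracker takes effect only when
# the 'direct' pods allocator is used.
spark = (
    SparkSession.builder
    .master("k8s://https://k8s.example.com:6443")
    .config("spark.kubernetes.container.image", "apache/spark:latest")
    .config("spark.kubernetes.allocation.pods.allocator", "direct")
    # Fail the application once 10 executor failures have accumulated...
    .config("spark.executor.maxNumFailures", "10")
    # ...within a sliding validity window of one hour.
    .config("spark.executor.failuresValidityInterval", "1h")
    .getOrCreate()
)
```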
[spark] branch master updated (3941369d13a -> cbe94a172ca)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 3941369d13a [SPARK-42657][CONNECT] Support to find and transfer client-side REPL classfiles to server as artifacts add cbe94a172ca [SPARK-43084][SS] Add applyInPandasWithState support for spark connect No new revisions were added by this update. Summary of changes: .../main/protobuf/spark/connect/relations.proto| 24 ++ .../sql/connect/planner/SparkConnectPlanner.scala | 23 ++ dev/sparktestsupport/modules.py| 1 + python/pyspark/sql/connect/_typing.py | 5 + python/pyspark/sql/connect/group.py| 46 +++- python/pyspark/sql/connect/plan.py | 39 python/pyspark/sql/connect/proto/relations_pb2.py | 258 +++-- python/pyspark/sql/connect/proto/relations_pb2.pyi | 80 +++ python/pyspark/sql/pandas/group_ops.py | 3 + .../sql/tests/connect/test_connect_basic.py| 7 - .../test_parity_pandas_grouped_map_with_state.py} | 19 +- .../pandas/test_pandas_grouped_map_with_state.py | 8 +- 12 files changed, 375 insertions(+), 138 deletions(-) copy python/pyspark/{pandas/tests/connect/test_parity_dataframe_spark_io.py => sql/tests/connect/test_parity_pandas_grouped_map_with_state.py} (59%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
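For context, `applyInPandasWithState` is PySpark's arbitrary stateful processing API for grouped streaming DataFrames, and this change makes it available through Spark Connect. The sketch below is a generic usage example, not the code added in the PR; the rate source, schemas, and counting logic are illustrative.

```python
from typing import Iterator, Tuple

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.streaming.state import GroupState, GroupStateTimeout

spark = SparkSession.builder.getOrCreate()

# Illustrative streaming source: the built-in rate source, bucketed by key.
events = spark.readStream.format("rate").load().selectExpr("value % 10 AS key", "value")


def count_per_key(
    key: Tuple[int], pdfs: Iterator[pd.DataFrame], state: GroupState
) -> Iterator[pd.DataFrame]:
    # Keep a running count for the key across micro-batches.
    (running,) = state.get if state.exists else (0,)
    running += sum(len(pdf) for pdf in pdfs)
    state.update((running,))
    yield pd.DataFrame({"key": [key[0]], "count": [running]})


counts = events.groupBy("key").applyInPandasWithState(
    count_per_key,
    outputStructType="key LONG, count LONG",
    stateStructType="count LONG",
    outputMode="update",
    timeoutConf=GroupStateTimeout.NoTimeout,
)

query = counts.writeStream.format("console").outputMode("update").start()
```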
[GitHub] [spark-website] viirya commented on a diff in pull request #454: Improve instructions for the release process
viirya commented on code in PR #454:
URL: https://github.com/apache/spark-website/pull/454#discussion_r1169138426

## release-process.md: ##

@@ -186,6 +186,18 @@ that looks something like `[VOTE][RESULT] ...`.

 Finalize the release

+Note that `dev/create-release/do-release-docker.sh` script (`finalize` step ) automates most of the following steps **except** for:
+- Publish to CRAN

Review Comment: Hm? Last time I did release I remember I still need to do `twine upload`, has it changed?

EDIT: it was changed by SPARK-35872.
[GitHub] [spark-website] viirya commented on a diff in pull request #454: Improve instructions for the release process
viirya commented on code in PR #454:
URL: https://github.com/apache/spark-website/pull/454#discussion_r1169148125

## release-process.md: ##

@@ -186,6 +186,18 @@ that looks something like `[VOTE][RESULT] ...`.

 Finalize the release

+Note that `dev/create-release/do-release-docker.sh` script (`finalize` step ) automates most of the following steps **except** for:
+- Publish to CRAN
+- Update the configuration of Algolia Crawler
+- Remove old releases from Mirror Network
+- Update the rest of the Spark website
+- Create and upload Spark Docker Images
+- Create an announcement
+
+Please manually verify the result after each step.
+
+Upload to Apache release directory

Review Comment: Oh, it was changed by SPARK-35872. 👍
[GitHub] [spark-website] xinrong-meng commented on a diff in pull request #454: Improve instructions for the release process
xinrong-meng commented on code in PR #454:
URL: https://github.com/apache/spark-website/pull/454#discussion_r1169034199

## release-process.md: ##

@@ -186,6 +186,18 @@ that looks something like `[VOTE][RESULT] ...`.

 Finalize the release

+Note that `dev/create-release/do-release-docker.sh` script (`finalize` step ) automates most of the following steps **except** for:
+- Publish to CRAN
+- Update the configuration of Algolia Crawler
+- Remove old releases from Mirror Network
+- Update the rest of the Spark website
+- Create and upload Spark Docker Images
+- Create an announcement
+
+Please manually verify the result after each step.
+
+Upload to Apache release directory

Review Comment: See more at https://github.com/xinrong-meng/spark/blob/master/dev/create-release/release-build.sh#L97. : )
[GitHub] [spark-website] xinrong-meng commented on a diff in pull request #454: Improve instructions for the release process
xinrong-meng commented on code in PR #454:
URL: https://github.com/apache/spark-website/pull/454#discussion_r1169032103

## release-process.md: ##

@@ -186,6 +186,18 @@ that looks something like `[VOTE][RESULT] ...`.

 Finalize the release

+Note that `dev/create-release/do-release-docker.sh` script (`finalize` step ) automates most of the following steps **except** for:
+- Publish to CRAN

Review Comment: "Uploading PySpark to PyPi" is automated.
[GitHub] [spark-website] xinrong-meng commented on pull request #454: Improve instructions for the release process
xinrong-meng commented on PR #454:
URL: https://github.com/apache/spark-website/pull/454#issuecomment-1511752634

Thanks @srowen !
[GitHub] [spark-website] xinrong-meng commented on a diff in pull request #454: Improve instructions for the release process
xinrong-meng commented on code in PR #454:
URL: https://github.com/apache/spark-website/pull/454#discussion_r1169032739

## release-process.md: ##

@@ -186,6 +186,18 @@ that looks something like `[VOTE][RESULT] ...`.

 Finalize the release

+Note that `dev/create-release/do-release-docker.sh` script (`finalize` step ) automates most of the following steps **except** for:
+- Publish to CRAN
+- Update the configuration of Algolia Crawler
+- Remove old releases from Mirror Network
+- Update the rest of the Spark website
+- Create and upload Spark Docker Images
+- Create an announcement
+
+Please manually verify the result after each step.
+
+Upload to Apache release directory

Review Comment: "Moving Spark binaries to the release directory" is automated.
[spark] branch master updated: [SPARK-42657][CONNECT] Support to find and transfer client-side REPL classfiles to server as artifacts
This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3941369d13a [SPARK-42657][CONNECT] Support to find and transfer client-side REPL classfiles to server as artifacts 3941369d13a is described below commit 3941369d13ad885eac21bd8ac1769aaf1a325c5a Author: vicennial AuthorDate: Mon Apr 17 09:18:00 2023 -0400 [SPARK-42657][CONNECT] Support to find and transfer client-side REPL classfiles to server as artifacts ### What changes were proposed in this pull request? This PR introduces the concept of a `ClassFinder` that is able to scrape the REPL output (either file-based or in-memory based) for generated class files. The `ClassFinder` is registered during initialization of the REPL and aids in uploading the generated class files as artifacts to the Spark Connect server. ### Why are the changes needed? To run UDFs which are defined on the client side REPL, we require a mechanism that can find the local REPL classfiles and then utilise the mechanism from https://issues.apache.org/jira/browse/SPARK-42653 to transfer them to the server as artifacts. ### Does this PR introduce _any_ user-facing change? Yes, users can now run UDFs on the default (ammonite) REPL with spark connect. Input (in REPL): ``` class A(x: Int) { def get = x * 5 + 19 } def dummyUdf(x: Int): Int = new A(x).get val myUdf = udf(dummyUdf _) spark.range(5).select(myUdf(col("id"))).as[Int].collect() ``` Output: ``` Array[Int] = Array(19, 24, 29, 34, 39) ``` ### How was this patch tested? Unit tests + E2E tests. Closes #40675 from vicennial/SPARK-42657. Lead-authored-by: vicennial Co-authored-by: Venkata Sai Akhil Gudesa Signed-off-by: Herman van Hovell --- .github/workflows/build_and_test.yml | 3 + .../scala/org/apache/spark/sql/SparkSession.scala | 10 +- .../apache/spark/sql/application/ConnectRepl.scala | 37 -- .../spark/sql/connect/client/ArtifactManager.scala | 32 +- .../spark/sql/connect/client/ClassFinder.scala | 80 + .../sql/connect/client/SparkConnectClient.scala| 10 +- .../org/apache/spark/sql/ClientE2ETestSuite.scala | 11 -- .../spark/sql/application/ReplE2ESuite.scala | 128 + .../sql/connect/client/ClassFinderSuite.scala | 57 + .../connect/client/util/RemoteSparkSession.scala | 6 +- .../sql/connect/SimpleSparkConnectService.scala| 2 +- 11 files changed, 349 insertions(+), 27 deletions(-) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 311274c9203..630956a9e73 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -249,7 +249,10 @@ jobs: # Run the tests. - name: Run tests env: ${{ fromJSON(inputs.envs) }} + shell: 'script -q -e -c "bash {0}"' run: | +# Fix for TTY related issues when launching the Ammonite REPL in tests. +export TERM=vt100 && script -qfc 'echo exit | amm -s' && rm typescript # Hive "other tests" test needs larger metaspace size based on experiment. 
if [[ "$MODULES_TO_TEST" == "hive" ]] && [[ "$EXCLUDED_TAGS" == "org.apache.spark.tags.SlowHiveTest" ]]; then export METASPACE_SIZE=2g; fi export SERIAL_SBT_TESTS=1 diff --git a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala index b1b779f0f08..e285db39e80 100644 --- a/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala +++ b/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala @@ -33,7 +33,7 @@ import org.apache.spark.sql.catalog.Catalog import org.apache.spark.sql.catalyst.{JavaTypeInference, ScalaReflection} import org.apache.spark.sql.catalyst.encoders.{AgnosticEncoder, RowEncoder} import org.apache.spark.sql.catalyst.encoders.AgnosticEncoders.{BoxedLongEncoder, UnboundRowEncoder} -import org.apache.spark.sql.connect.client.{SparkConnectClient, SparkResult} +import org.apache.spark.sql.connect.client.{ClassFinder, SparkConnectClient, SparkResult} import org.apache.spark.sql.connect.client.util.{Cleaner, ConvertToArrow} import org.apache.spark.sql.connect.common.LiteralValueProtoConverter.toLiteralProto import org.apache.spark.sql.internal.CatalogImpl @@ -495,6 +495,14 @@ class SparkSession private[sql] ( @scala.annotation.varargs def addArtifacts(uri: URI*): Unit = client.addArtifacts(uri) + /** + * Register a [[ClassFinder]] for dynamically generated classes. + * + * @since 3.4.0 + */ + @Experimental + def registerClassFi
[spark] branch master updated: [MINOR][CONNECT][PYTHON] Add missing `super().__init__()` in expressions
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7a5b6c837a4 [MINOR][CONNECT][PYTHON] Add missing `super().__init__()` in expressions 7a5b6c837a4 is described below commit 7a5b6c837a448f7ede4ef679cac6fd4a6f8babcd Author: Ruifeng Zheng AuthorDate: Mon Apr 17 16:13:00 2023 +0800 [MINOR][CONNECT][PYTHON] Add missing `super().__init__()` in expressions ### What changes were proposed in this pull request? Add missing `super().__init__()` in expressions ### Why are the changes needed? to make IDEA happy: https://user-images.githubusercontent.com/7322292/232402659-20e7f740-7816-495f-967f-d90c3ac7eedc.png";> ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? existing UT Closes #40818 from zhengruifeng/connect_super. Authored-by: Ruifeng Zheng Signed-off-by: Ruifeng Zheng --- python/pyspark/sql/connect/expressions.py | 3 +++ 1 file changed, 3 insertions(+) diff --git a/python/pyspark/sql/connect/expressions.py b/python/pyspark/sql/connect/expressions.py index 1c332e56226..8ed365091fc 100644 --- a/python/pyspark/sql/connect/expressions.py +++ b/python/pyspark/sql/connect/expressions.py @@ -106,6 +106,7 @@ class CaseWhen(Expression): def __init__( self, branches: Sequence[Tuple[Expression, Expression]], else_value: Optional[Expression] ): +super().__init__() assert isinstance(branches, list) for branch in branches: @@ -142,6 +143,7 @@ class CaseWhen(Expression): class ColumnAlias(Expression): def __init__(self, parent: Expression, alias: Sequence[str], metadata: Any): +super().__init__() self._alias = alias self._metadata = metadata @@ -649,6 +651,7 @@ class CommonInlineUserDefinedFunction(Expression): deterministic: bool = False, arguments: Sequence[Expression] = [], ): +super().__init__() self._function_name = function_name self._deterministic = deterministic self._arguments = arguments - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
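The fix itself is the standard Python idiom of chaining to the base-class initializer. In these Connect expression classes the practical motivation is quieting the IDE warning mentioned above, but skipping `super().__init__()` can also silently drop base-class state, as the hypothetical example below (not the actual Connect hierarchy) shows:

```python
class Expression:
    def __init__(self):
        # Base-class state that subclasses are expected to carry.
        self._origin = None


class BrokenAlias(Expression):
    def __init__(self, alias):
        # Missing super().__init__(): Expression.__init__ never runs,
        # so _origin is never set on instances of this class.
        self._alias = alias


class FixedAlias(Expression):
    def __init__(self, alias):
        super().__init__()  # the one-line fix applied in this commit
        self._alias = alias


print(hasattr(BrokenAlias("a"), "_origin"))  # False
print(hasattr(FixedAlias("a"), "_origin"))   # True
```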