[spark] branch master updated (d2f72a21677 -> 0ed48feab65)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from d2f72a21677 [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long add 0ed48feab65 [SPARK-43985][PROTOBUF] spark protobuf: fix enums as ints bug No new revisions were added by this update. Summary of changes: .../spark/sql/protobuf/ProtobufDeserializer.scala | 2 +- .../test/resources/protobuf/functions_suite.desc | Bin 11087 -> 11190 bytes .../test/resources/protobuf/functions_suite.proto | 1 + .../sql/protobuf/ProtobufFunctionsSuite.scala | 26 ++--- 4 files changed, 19 insertions(+), 10 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d2f72a21677 [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long d2f72a21677 is described below commit d2f72a21677748793f0bb329630d72bc91449587 Author: Siying Dong AuthorDate: Tue Jun 6 19:41:09 2023 -0700 [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long ### What changes were proposed in this pull request? Add a logical type "custom-decimal" in Avro, which can only be backed by physical type long, and will be convert into decimal type. ### Why are the changes needed? A user would like to represent currency (for money) after loading Avro into SQL type. However, there isn't a good way to represent it in Avro. This custom type will allow them to do that. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added several unit test cases to test the new "custom-decimal" to be loaded successfully and also exception cases. Closes #41409 from siying/customdecimal. Authored-by: Siying Dong Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/sql/avro/CustomDecimal.scala | 79 ++ .../apache/spark/sql/avro/AvroDeserializer.scala | 4 + .../org/apache/spark/sql/avro/AvroFileFormat.scala | 9 +++ .../apache/spark/sql/avro/SchemaConverters.scala | 2 + .../spark/sql/avro/AvroLogicalTypeSuite.scala | 94 +- 5 files changed, 187 insertions(+), 1 deletion(-) diff --git a/connector/avro/src/main/java/org/apache/spark/sql/avro/CustomDecimal.scala b/connector/avro/src/main/java/org/apache/spark/sql/avro/CustomDecimal.scala new file mode 100644 index 000..d76f40c7635 --- /dev/null +++ b/connector/avro/src/main/java/org/apache/spark/sql/avro/CustomDecimal.scala @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.avro + +import org.apache.avro.LogicalType +import org.apache.avro.Schema + +import org.apache.spark.sql.types.DecimalType + +object CustomDecimal { + val TYPE_NAME = "custom-decimal" +} + +// A customized logical type, which will be registered to Avro. This logical type is similar to +// Avro's builtin Decimal type, but is meant to be registered for long type. It indicates that +// the long type should be converted to Spark's Decimal type, with provided precision and scale. +private class CustomDecimal(schema: Schema) extends LogicalType(CustomDecimal.TYPE_NAME) { + val scale : Int = { +val obj = schema.getObjectProp("scale") +obj match { + case null => +throw new IllegalArgumentException(s"Invalid ${CustomDecimal.TYPE_NAME}: missing scale"); + case i : Integer => +i + case other => +throw new IllegalArgumentException(s"Expected int ${CustomDecimal.TYPE_NAME}:scale") +} + } + val precision : Int = { +val obj = schema.getObjectProp("precision") +obj match { + case null => +throw new IllegalArgumentException( + s"Invalid ${CustomDecimal.TYPE_NAME}: missing precision"); + case i: Integer => +i + case other => +throw new IllegalArgumentException(s"Expected int ${CustomDecimal.TYPE_NAME}:precision") +} + } + val className : String = schema.getProp("className") + + override def validate(schema: Schema): Unit = { +super.validate(schema) +if (schema.getType != Schema.Type.LONG) { + throw new IllegalArgumentException( +s"${CustomDecimal.TYPE_NAME} can only be used with an underlying long type") +} +if (precision <= 0) { + throw new IllegalArgumentException(s"Invalid decimal precision: $precision" + +" (must be positive)"); +} else if (precision > DecimalType.MAX_PRECISION) { + throw new IllegalArgumentException( +s"cannot store $precision digits (max ${DecimalType.MAX_PRECISION})") +} +if (scale < 0) { + throw new IllegalArgumentException(s"Invalid decimal scale:
[spark] branch master updated: [SPARK-42750][SQL] Support Insert By Name statement
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e6adc67d43d [SPARK-42750][SQL] Support Insert By Name statement e6adc67d43d is described below commit e6adc67d43d6beccf21013ee00aa274bed13107c Author: Jia Fan AuthorDate: Wed Jun 7 10:30:59 2023 +0800 [SPARK-42750][SQL] Support Insert By Name statement ### What changes were proposed in this pull request? In some use cases, users have incoming dataframes with fixed column names which might differ from the canonical order. Currently there's no way to handle this easily through the INSERT INTO API - the user has to make sure the columns are in the right order as they would when inserting a tuple. We should add an optional BY NAME clause, such that: `INSERT INTO tgt BY NAME ` takes each column of and inserts it into the column in `tgt` which has the same name according to the configured `resolver` logic. Some definitions need to be clarified: 1. `BY NAME` and specified column insertion (`INSERT INTO t1 (a,b)`... ) is a mutually exclusive operation 2. But it supports to define partition while using `BY NAME`: `INSERT INTO t PARTITION(a=1) BY NAME ` At now don't support `INSERT OVERWRITE BY NAME` (I will supported in follow up) ### Why are the changes needed? Add new feature `INSERT INTO BY NAME` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add new test. Closes #40908 from Hisoka-X/SPARK-42750_insert_into_by_name. Lead-authored-by: Jia Fan Co-authored-by: Hisoka Signed-off-by: Wenchen Fan --- docs/sql-ref-ansi-compliance.md| 3 +- .../spark/sql/catalyst/parser/SqlBaseLexer.g4 | 1 + .../spark/sql/catalyst/parser/SqlBaseParser.g4 | 4 +- .../spark/sql/catalyst/analysis/Analyzer.scala | 7 ++-- .../sql/catalyst/analysis/CheckAnalysis.scala | 2 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 19 ++ .../sql/catalyst/plans/logical/statements.scala| 7 +++- .../spark/sql/catalyst/parser/DDLParserSuite.scala | 34 + .../execution/datasources/DataSourceStrategy.scala | 9 +++-- .../datasources/FallBackFileSourceV2.scala | 2 +- .../spark/sql/execution/datasources/rules.scala| 14 --- .../sql-tests/analyzer-results/explain-aqe.sql.out | 2 +- .../sql-tests/analyzer-results/explain.sql.out | 2 +- .../sql-tests/results/ansi/keywords.sql.out| 1 + .../sql-tests/results/explain-aqe.sql.out | 2 +- .../resources/sql-tests/results/explain.sql.out| 2 +- .../resources/sql-tests/results/keywords.sql.out | 1 + .../org/apache/spark/sql/SQLInsertTestSuite.scala | 43 -- .../execution/command/PlanResolutionSuite.scala| 4 +- .../ThriftServerWithSparkContextSuite.scala| 2 +- .../org/apache/spark/sql/hive/HiveStrategies.scala | 8 ++-- 21 files changed, 129 insertions(+), 40 deletions(-) diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md index 76b5d5aef73..f9c6f5ea6aa 100644 --- a/docs/sql-ref-ansi-compliance.md +++ b/docs/sql-ref-ansi-compliance.md @@ -350,7 +350,7 @@ By default, both `spark.sql.ansi.enabled` and `spark.sql.ansi.enforceReservedKey Below is a list of all the keywords in Spark SQL. |Keyword|Spark SQLANSI Mode|Spark SQLDefault Mode|SQL-2016| -|---|--|-|| +|--|--|-|| |ADD|non-reserved|non-reserved|non-reserved| |AFTER|non-reserved|non-reserved|non-reserved| |ALL|reserved|non-reserved|reserved| @@ -527,6 +527,7 @@ Below is a list of all the keywords in Spark SQL. |MONTH|non-reserved|non-reserved|non-reserved| |MONTHS|non-reserved|non-reserved|non-reserved| |MSCK|non-reserved|non-reserved|non-reserved| +|NAME|non-reserved|non-reserved|non-reserved| |NAMESPACE|non-reserved|non-reserved|non-reserved| |NAMESPACES|non-reserved|non-reserved|non-reserved| |NANOSECOND|non-reserved|non-reserved|non-reserved| diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 index 6300221b542..ecd5f5912fd 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 @@ -264,6 +264,7 @@ MINUTES: 'MINUTES'; MONTH: 'MONTH'; MONTHS: 'MONTHS'; MSCK: 'MSCK'; +NAME: 'NAME'; NAMESPACE: 'NAMESPACE'; NAMESPACES: 'NAMESPACES'; NANOSECOND: 'NANOSECOND'; diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
[spark] branch master updated (747db6675da -> 54304493e0a)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 747db6675da [SPARK-43930][SQL][PYTHON][CONNECT] Add unix_* functions to Scala and Python add 54304493e0a [SPARK-43615][TESTS][PS][CONNECT] Enable unit test `test_eval` No new revisions were added by this update. Summary of changes: python/pyspark/pandas/tests/connect/computation/test_parity_eval.py | 4 1 file changed, 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (1e197b5bc22 -> 747db6675da)
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 1e197b5bc22 [SPARK-43356][K8S] Migrate deprecated createOrReplace to serverSideApply add 747db6675da [SPARK-43930][SQL][PYTHON][CONNECT] Add unix_* functions to Scala and Python No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/functions.scala | 34 +++ .../apache/spark/sql/PlanGenerationTestSuite.scala | 16 ++ .../explain-results/function_unix_date.explain | 2 + .../explain-results/function_unix_micros.explain | 2 + .../explain-results/function_unix_millis.explain | 2 + .../explain-results/function_unix_seconds.explain | 2 + .../query-tests/queries/function_unix_date.json| 34 +++ .../queries/function_unix_date.proto.bin | Bin 0 -> 153 bytes .../query-tests/queries/function_unix_micros.json | 34 +++ .../queries/function_unix_micros.proto.bin | Bin 0 -> 174 bytes .../query-tests/queries/function_unix_millis.json | 34 +++ .../queries/function_unix_millis.proto.bin | Bin 0 -> 174 bytes .../query-tests/queries/function_unix_seconds.json | 34 +++ .../queries/function_unix_seconds.proto.bin| Bin 0 -> 175 bytes .../source/reference/pyspark.sql/functions.rst | 4 ++ python/pyspark/sql/connect/functions.py| 28 ++ python/pyspark/sql/functions.py| 62 + .../scala/org/apache/spark/sql/functions.scala | 42 ++ .../org/apache/spark/sql/DateFunctionsSuite.scala | 44 +++ 19 files changed, 374 insertions(+) create mode 100644 connector/connect/common/src/test/resources/query-tests/explain-results/function_unix_date.explain create mode 100644 connector/connect/common/src/test/resources/query-tests/explain-results/function_unix_micros.explain create mode 100644 connector/connect/common/src/test/resources/query-tests/explain-results/function_unix_millis.explain create mode 100644 connector/connect/common/src/test/resources/query-tests/explain-results/function_unix_seconds.explain create mode 100644 connector/connect/common/src/test/resources/query-tests/queries/function_unix_date.json create mode 100644 connector/connect/common/src/test/resources/query-tests/queries/function_unix_date.proto.bin create mode 100644 connector/connect/common/src/test/resources/query-tests/queries/function_unix_micros.json create mode 100644 connector/connect/common/src/test/resources/query-tests/queries/function_unix_micros.proto.bin create mode 100644 connector/connect/common/src/test/resources/query-tests/queries/function_unix_millis.json create mode 100644 connector/connect/common/src/test/resources/query-tests/queries/function_unix_millis.proto.bin create mode 100644 connector/connect/common/src/test/resources/query-tests/queries/function_unix_seconds.json create mode 100644 connector/connect/common/src/test/resources/query-tests/queries/function_unix_seconds.proto.bin - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-43356][K8S] Migrate deprecated createOrReplace to serverSideApply
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1e197b5bc22 [SPARK-43356][K8S] Migrate deprecated createOrReplace to serverSideApply 1e197b5bc22 is described below commit 1e197b5bc2258f9c0657cf9d792005c540ccb7f4 Author: Cheng Pan AuthorDate: Tue Jun 6 19:06:49 2023 -0700 [SPARK-43356][K8S] Migrate deprecated createOrReplace to serverSideApply ### What changes were proposed in this pull request? The deprecation message of `createOrReplace` indicates that we should change `createOrReplace` to `serverSideApply` instead. ``` deprecated please use {link ServerSideApplicable#serverSideApply()} or attempt a create and edit/patch operation. ``` The change is not fully equivalent, but I believe it's reasonable. > With the caveat that the user may choose not to use forcing if they want to know when there are conflicting changes. > > Also unlike createOrReplace if the resourceVersion is set on the resource and a replace is attempted, it will be optimistically locked. See more details at https://github.com/fabric8io/kubernetes-client/pull/5073 ### Why are the changes needed? Remove usage of deprecated API. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #41136 from pan3793/SPARK-43356. Authored-by: Cheng Pan Signed-off-by: Dongjoon Hyun --- .../spark/deploy/k8s/submit/KubernetesClientApplication.scala | 6 +++--- .../test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala | 5 - 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala index 9f9b5655e26..2b2dad1cf13 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala @@ -137,7 +137,7 @@ private[spark] class Client( // setup resources before pod creation val preKubernetesResources = resolvedDriverSpec.driverPreKubernetesResources try { - kubernetesClient.resourceList(preKubernetesResources: _*).createOrReplace() + kubernetesClient.resourceList(preKubernetesResources: _*).forceConflicts().serverSideApply() } catch { case NonFatal(e) => logError("Please check \"kubectl auth can-i create [resource]\" first." + @@ -161,7 +161,7 @@ private[spark] class Client( // Refresh all pre-resources' owner references try { addOwnerReference(createdDriverPod, preKubernetesResources) - kubernetesClient.resourceList(preKubernetesResources: _*).createOrReplace() + kubernetesClient.resourceList(preKubernetesResources: _*).forceConflicts().serverSideApply() } catch { case NonFatal(e) => kubernetesClient.pods().resource(createdDriverPod).delete() @@ -173,7 +173,7 @@ private[spark] class Client( try { val otherKubernetesResources = resolvedDriverSpec.driverKubernetesResources ++ Seq(configMap) addOwnerReference(createdDriverPod, otherKubernetesResources) - kubernetesClient.resourceList(otherKubernetesResources: _*).createOrReplace() + kubernetesClient.resourceList(otherKubernetesResources: _*).forceConflicts().serverSideApply() } catch { case NonFatal(e) => kubernetesClient.pods().resource(createdDriverPod).delete() diff --git a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala index 8c2be6c142d..a813b3a876f 100644 --- a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala +++ b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala @@ -181,6 +181,8 @@ class ClientSuite extends SparkFunSuite with BeforeAndAfter { createdPodArgumentCaptor = ArgumentCaptor.forClass(classOf[Pod]) createdResourcesArgumentCaptor = ArgumentCaptor.forClass(classOf[HasMetadata]) when(podsWithNamespace.resource(fullExpectedPod())).thenReturn(namedPods) +when(resourceList.forceConflicts()).thenReturn(resourceList) +when(namedPods.serverSideApply()).thenReturn(podWithOwnerReference()) when(namedPods.create()).thenReturn(podWithOwnerReference())
[spark] branch master updated: [SPARK-43906][PYTHON][CONNECT] Implement the file support in SparkSession.addArtifacts
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6fd1c649c72 [SPARK-43906][PYTHON][CONNECT] Implement the file support in SparkSession.addArtifacts 6fd1c649c72 is described below commit 6fd1c649c72d4b53ecf83c1643d38002d80c9288 Author: Hyukjin Kwon AuthorDate: Wed Jun 7 10:54:24 2023 +0900 [SPARK-43906][PYTHON][CONNECT] Implement the file support in SparkSession.addArtifacts ### What changes were proposed in this pull request? This PR proposes to add the support of the regular files in `SparkSession.addArtifacts`. ### Why are the changes needed? So users can add the regular files in the worker nodes. ### Does this PR introduce _any_ user-facing change? Yes, it adds the support of arbitrary regular files in `SparkSession.addArtifacts`. ### How was this patch tested? Added a couple of unittests. Also manually tested in `local-cluster`: ```bash ./sbin/start-connect-server.sh --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar` --master "local-cluster[2,2,1024]" ./bin/pyspark --remote "sc://localhost:15002" ``` ```python import os import tempfile from pyspark.sql.functions import udf from pyspark import SparkFiles with tempfile.TemporaryDirectory() as d: file_path = os.path.join(d, "my_file.txt") with open(file_path, "w") as f: f.write("Hello world!!") udf("string") def func(x): with open( os.path.join(SparkFiles.getRootDirectory(), "my_file.txt"), "r" ) as my_file: return my_file.read().strip() spark.addArtifacts(file_path, file=True) spark.range(1).select(func("id")).show() ``` Closes #41415 from HyukjinKwon/addFile. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- .../artifact/SparkConnectArtifactManager.scala | 6 +++-- python/pyspark/sql/connect/client/artifact.py | 21 +++ python/pyspark/sql/connect/client/core.py | 4 +-- python/pyspark/sql/connect/session.py | 13 ++--- .../sql/tests/connect/client/test_artifact.py | 31 +++--- 5 files changed, 58 insertions(+), 17 deletions(-) diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala index 604108f68d2..47c48d8e083 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala @@ -97,6 +97,7 @@ class SparkConnectArtifactManager private[connect] { * @param session * @param remoteRelativePath * @param serverLocalStagingPath + * @param fragment */ private[connect] def addArtifact( sessionHolder: SessionHolder, @@ -135,8 +136,7 @@ class SparkConnectArtifactManager private[connect] { // previously added, if (Files.exists(target)) { throw new RuntimeException( - s"Duplicate Jar: $remoteRelativePath. " + -s"Jars cannot be overwritten.") + s"Duplicate file: $remoteRelativePath. Files cannot be overwritten.") } Files.move(serverLocalStagingPath, target) if (remoteRelativePath.startsWith(s"jars${File.separator}")) { @@ -154,6 +154,8 @@ class SparkConnectArtifactManager private[connect] { val canonicalUri = fragment.map(UriBuilder.fromUri(target.toUri).fragment).getOrElse(target.toUri) sessionHolder.session.sparkContext.addArchive(canonicalUri.toString) + } else if (remoteRelativePath.startsWith(s"files${File.separator}")) { +sessionHolder.session.sparkContext.addFile(target.toString) } } } diff --git a/python/pyspark/sql/connect/client/artifact.py b/python/pyspark/sql/connect/client/artifact.py index 64f89119e4f..9a848bd96b8 100644 --- a/python/pyspark/sql/connect/client/artifact.py +++ b/python/pyspark/sql/connect/client/artifact.py @@ -39,6 +39,7 @@ import pyspark.sql.connect.proto.base_pb2_grpc as grpc_lib JAR_PREFIX: str = "jars" PYFILE_PREFIX: str = "pyfiles" ARCHIVE_PREFIX: str = "archives" +FILE_PREFIX: str = "files" class LocalData(metaclass=abc.ABCMeta): @@ -107,6 +108,10 @@ def new_archive_artifact(file_name: str, storage: LocalData) -> Artifact: return _new_artifact(ARCHIVE_PREFIX, "", file_name, storage) +def new_file_artifact(file_name: str, storage: LocalData) -> Artifact: +return _new_artifact(FILE_PREFIX, "", file_name,
[spark] branch master updated: [SPARK-43970][PYTHON][CONNECT] Hide unsupported dataframe methods from auto-completion
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e3957ce8718 [SPARK-43970][PYTHON][CONNECT] Hide unsupported dataframe methods from auto-completion e3957ce8718 is described below commit e3957ce8718697be2fcb2a95ede439bb49ceadad Author: Ruifeng Zheng AuthorDate: Wed Jun 7 09:39:27 2023 +0800 [SPARK-43970][PYTHON][CONNECT] Hide unsupported dataframe methods from auto-completion ### What changes were proposed in this pull request? Hide unsupported dataframe methods from auto-completion ### Why are the changes needed? for better UX before https://github.com/apache/spark/assets/7322292/8f4f228c-e30e-4027-8d52-768f1657f19e;> after https://github.com/apache/spark/assets/7322292/1d308937-dd57-4ca6-b37a-29b09348bda5;> ### Does this PR introduce _any_ user-facing change? yes ### How was this patch tested? manually check in ipython Closes #41462 from zhengruifeng/connect_hide_unsupported_df_functions. Authored-by: Ruifeng Zheng Signed-off-by: Ruifeng Zheng --- python/pyspark/sql/connect/dataframe.py | 48 + 1 file changed, 12 insertions(+), 36 deletions(-) diff --git a/python/pyspark/sql/connect/dataframe.py b/python/pyspark/sql/connect/dataframe.py index a8a4612aec7..6429645f0e0 100644 --- a/python/pyspark/sql/connect/dataframe.py +++ b/python/pyspark/sql/connect/dataframe.py @@ -1572,6 +1572,18 @@ class DataFrame: raise PySparkAttributeError( error_class="JVM_ATTRIBUTE_NOT_SUPPORTED", message_parameters={"attr_name": name} ) +elif name in [ +"rdd", +"toJSON", +"foreach", +"foreachPartition", +"checkpoint", +"localCheckpoint", +]: +raise PySparkNotImplementedError( +error_class="NOT_IMPLEMENTED", +message_parameters={"feature": f"{name}()"}, +) return self[name] @overload @@ -1817,12 +1829,6 @@ class DataFrame: createOrReplaceGlobalTempView.__doc__ = PySparkDataFrame.createOrReplaceGlobalTempView.__doc__ -def rdd(self, *args: Any, **kwargs: Any) -> None: -raise PySparkNotImplementedError( -error_class="NOT_IMPLEMENTED", -message_parameters={"feature": "RDD Support for Spark Connect"}, -) - def cache(self) -> "DataFrame": if self._plan is None: raise Exception("Cannot cache on empty plan.") @@ -1870,18 +1876,6 @@ class DataFrame: def is_cached(self) -> bool: return self.storageLevel != StorageLevel.NONE -def foreach(self, *args: Any, **kwargs: Any) -> None: -raise PySparkNotImplementedError( -error_class="NOT_IMPLEMENTED", -message_parameters={"feature": "foreach()"}, -) - -def foreachPartition(self, *args: Any, **kwargs: Any) -> None: -raise PySparkNotImplementedError( -error_class="NOT_IMPLEMENTED", -message_parameters={"feature": "foreachPartition()"}, -) - def toLocalIterator(self, prefetchPartitions: bool = False) -> Iterator[Row]: from pyspark.sql.connect.conversion import ArrowTableToRowsConversion @@ -1905,18 +1899,6 @@ class DataFrame: toLocalIterator.__doc__ = PySparkDataFrame.toLocalIterator.__doc__ -def checkpoint(self, *args: Any, **kwargs: Any) -> None: -raise PySparkNotImplementedError( -error_class="NOT_IMPLEMENTED", -message_parameters={"feature": "checkpoint()"}, -) - -def localCheckpoint(self, *args: Any, **kwargs: Any) -> None: -raise PySparkNotImplementedError( -error_class="NOT_IMPLEMENTED", -message_parameters={"feature": "localCheckpoint()"}, -) - def to_pandas_on_spark( self, index_col: Optional[Union[str, List[str]]] = None ) -> "PandasOnSparkDataFrame": @@ -2001,12 +1983,6 @@ class DataFrame: writeStream.__doc__ = PySparkDataFrame.writeStream.__doc__ -def toJSON(self, *args: Any, **kwargs: Any) -> None: -raise PySparkNotImplementedError( -error_class="NOT_IMPLEMENTED", -message_parameters={"feature": "toJSON()"}, -) - def sameSemantics(self, other: "DataFrame") -> bool: assert self._plan is not None assert other._plan is not None - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-43893][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF
This is an automated email from the ASF dual-hosted git repository. xinrong pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 94098853592 [SPARK-43893][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF 94098853592 is described below commit 94098853592b524f52e9a340166b96ddeda4e898 Author: Xinrong Meng AuthorDate: Tue Jun 6 15:48:14 2023 -0700 [SPARK-43893][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF ### What changes were proposed in this pull request? Support non-atomic data types in input and output of Arrow-optimized Python UDF. Non-atomic data types refer to: ArrayType, MapType, and StructType. ### Why are the changes needed? Parity with pickled Python UDFs. ### Does this PR introduce _any_ user-facing change? Non-atomic data types are accepted as both input and output of Arrow-optimized Python UDF. For example, ```py >>> df = spark.range(1).selectExpr("struct(1, struct('John', 30, ('value', 10))) as nested_struct") >>> df.select(udf(lambda x: str(x))("nested_struct")).first() Row((nested_struct)="Row(col1=1, col2=Row(col1='John', col2=30, col3=Row(col1='value', col2=10)))") ``` ### How was this patch tested? Unit tests. Closes #41321 from xinrong-meng/arrow_udf_struct. Authored-by: Xinrong Meng Signed-off-by: Xinrong Meng --- python/pyspark/sql/pandas/serializers.py | 22 --- python/pyspark/sql/tests/test_arrow_python_udf.py | 17 - python/pyspark/sql/tests/test_udf.py | 45 +++ python/pyspark/sql/udf.py | 15 +--- python/pyspark/worker.py | 13 +-- 5 files changed, 79 insertions(+), 33 deletions(-) diff --git a/python/pyspark/sql/pandas/serializers.py b/python/pyspark/sql/pandas/serializers.py index 84471143367..12d0bee88ad 100644 --- a/python/pyspark/sql/pandas/serializers.py +++ b/python/pyspark/sql/pandas/serializers.py @@ -172,7 +172,7 @@ class ArrowStreamPandasSerializer(ArrowStreamSerializer): self._timezone = timezone self._safecheck = safecheck -def arrow_to_pandas(self, arrow_column): +def arrow_to_pandas(self, arrow_column, struct_in_pandas="dict"): # If the given column is a date type column, creates a series of datetime.date directly # instead of creating datetime64[ns] as intermediate data to avoid overflow caused by # datetime64[ns] type handling. @@ -184,7 +184,7 @@ class ArrowStreamPandasSerializer(ArrowStreamSerializer): data_type=from_arrow_type(arrow_column.type, prefer_timestamp_ntz=True), nullable=True, timezone=self._timezone, -struct_in_pandas="dict", +struct_in_pandas=struct_in_pandas, error_on_duplicated_field_names=True, ) return converter(s) @@ -310,10 +310,18 @@ class ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer): Serializer used by Python worker to evaluate Pandas UDFs """ -def __init__(self, timezone, safecheck, assign_cols_by_name, df_for_struct=False): +def __init__( +self, +timezone, +safecheck, +assign_cols_by_name, +df_for_struct=False, +struct_in_pandas="dict", +): super(ArrowStreamPandasUDFSerializer, self).__init__(timezone, safecheck) self._assign_cols_by_name = assign_cols_by_name self._df_for_struct = df_for_struct +self._struct_in_pandas = struct_in_pandas def arrow_to_pandas(self, arrow_column): import pyarrow.types as types @@ -323,13 +331,15 @@ class ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer): series = [ super(ArrowStreamPandasUDFSerializer, self) -.arrow_to_pandas(column) +.arrow_to_pandas(column, self._struct_in_pandas) .rename(field.name) for column, field in zip(arrow_column.flatten(), arrow_column.type) ] s = pd.concat(series, axis=1) else: -s = super(ArrowStreamPandasUDFSerializer, self).arrow_to_pandas(arrow_column) +s = super(ArrowStreamPandasUDFSerializer, self).arrow_to_pandas( +arrow_column, self._struct_in_pandas +) return s def _create_batch(self, series): @@ -360,7 +370,7 @@ class ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer): arrs = [] for s, t in series: -if t is not None and pa.types.is_struct(t): +if self._struct_in_pandas == "dict" and t is not None and pa.types.is_struct(t): if not isinstance(s, pd.DataFrame): raise
[spark] branch branch-3.4 updated: [SPARK-43973][SS][UI][TESTS][FOLLOWUP][3.4] Fix compilation by switching QueryTerminatedEvent constructor
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 77e077a96f7 [SPARK-43973][SS][UI][TESTS][FOLLOWUP][3.4] Fix compilation by switching QueryTerminatedEvent constructor 77e077a96f7 is described below commit 77e077a96f7dafb1f43af7e9f7b495dce6b1e18a Author: Dongjoon Hyun AuthorDate: Tue Jun 6 11:30:32 2023 -0700 [SPARK-43973][SS][UI][TESTS][FOLLOWUP][3.4] Fix compilation by switching QueryTerminatedEvent constructor ### What changes were proposed in this pull request? This is a follow-up of https://github.com/apache/spark/pull/41468 to fix `branch-3.4`'s compilation issue. ### Why are the changes needed? To recover the compilation. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. Closes #41484 from dongjoon-hyun/SPARK-43973. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../spark/sql/streaming/ui/StreamingQueryStatusListenerSuite.scala| 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListenerSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListenerSuite.scala index 4b6d2dc4562..b01abd8032b 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListenerSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListenerSuite.scala @@ -279,7 +279,7 @@ class StreamingQueryStatusListenerSuite extends StreamTest { val startEvent1 = new StreamingQueryListener.QueryStartedEvent( id1, runId1, "test1", "2023-01-01T20:50:00.800Z") listener.onQueryStarted(startEvent1) -val terminateEvent1 = new StreamingQueryListener.QueryTerminatedEvent(id1, runId1, None, None) +val terminateEvent1 = new StreamingQueryListener.QueryTerminatedEvent(id1, runId1, None) listener.onQueryTerminated(terminateEvent1) // failure (has exception) case @@ -289,7 +289,7 @@ class StreamingQueryStatusListenerSuite extends StreamTest { id2, runId2, "test2", "2023-01-02T20:54:20.827Z") listener.onQueryStarted(startEvent2) val terminateEvent2 = new StreamingQueryListener.QueryTerminatedEvent( - id2, runId2, Option("ExampleException"), Option("EXAMPLE_ERROR_CLASS")) + id2, runId2, Option("ExampleException")) listener.onQueryTerminated(terminateEvent2) // check results - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (b4ab34bf9b2 -> 957f0e4c9f2)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from b4ab34bf9b2 [SPARK-43976][CORE] Handle the case where modifiedConfigs doesn't exist in event logs add 957f0e4c9f2 [SPARK-43959][SQL][TESTS] Make RowLevelOperationSuiteBase and AlignAssignmentsSuite abstract No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/connector/RowLevelOperationSuiteBase.scala | 2 +- .../{AlignAssignmentsSuite.scala => AlignAssignmentsSuiteBase.scala}| 2 +- .../apache/spark/sql/execution/command/AlignMergeAssignmentsSuite.scala | 2 +- .../spark/sql/execution/command/AlignUpdateAssignmentsSuite.scala | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) rename sql/core/src/test/scala/org/apache/spark/sql/execution/command/{AlignAssignmentsSuite.scala => AlignAssignmentsSuiteBase.scala} (99%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.3 updated: [SPARK-43976][CORE] Handle the case where modifiedConfigs doesn't exist in event logs
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new ababc572b66 [SPARK-43976][CORE] Handle the case where modifiedConfigs doesn't exist in event logs ababc572b66 is described below commit ababc572b66e0ccffa5fefed3c87b18e0f60cc50 Author: Dongjoon Hyun AuthorDate: Tue Jun 6 09:34:40 2023 -0700 [SPARK-43976][CORE] Handle the case where modifiedConfigs doesn't exist in event logs ### What changes were proposed in this pull request? This prevents NPE by handling the case where `modifiedConfigs` doesn't exist in event logs. ### Why are the changes needed? Basically, this is the same solution for that case. - https://github.com/apache/spark/pull/34907 The new code was added here, but we missed the corner case. - https://github.com/apache/spark/pull/35972 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. Closes #41472 from dongjoon-hyun/SPARK-43976. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit b4ab34bf9b22d0f0ca4ab13f9b6106f38ccfaebe) Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/execution/ui/ExecutionPage.scala| 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/ExecutionPage.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/ExecutionPage.scala index 498bb2a6c1c..c0e6b65d634 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/ExecutionPage.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/ExecutionPage.scala @@ -84,15 +84,14 @@ class ExecutionPage(parent: SQLTab) extends WebUIPage("execution") with Logging val metrics = sqlStore.executionMetrics(executionId) val graph = sqlStore.planGraph(executionId) + val configs = Option(executionUIData.modifiedConfigs).getOrElse(Map.empty) summary ++ planVisualization(request, metrics, graph) ++ physicalPlanDescription(executionUIData.physicalPlanDescription) ++ -modifiedConfigs( - executionUIData.modifiedConfigs.filterKeys( -!_.startsWith(pandasOnSparkConfPrefix)).toMap) ++ + modifiedConfigs(configs.filterKeys(!_.startsWith(pandasOnSparkConfPrefix)).toMap) ++ modifiedPandasOnSparkConfigs( - executionUIData.modifiedConfigs.filterKeys(_.startsWith(pandasOnSparkConfPrefix)).toMap) + configs.filterKeys(_.startsWith(pandasOnSparkConfPrefix)).toMap) }.getOrElse { No information to display for query {executionId} } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (8c6a54d70a7 -> b4ab34bf9b2)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 8c6a54d70a7 [SPARK-43919][SQL] Extract JSON functionality out of Row add b4ab34bf9b2 [SPARK-43976][CORE] Handle the case where modifiedConfigs doesn't exist in event logs No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/execution/ui/ExecutionPage.scala| 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.4 updated: [SPARK-43976][CORE] Handle the case where modifiedConfigs doesn't exist in event logs
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 778beb3bc67 [SPARK-43976][CORE] Handle the case where modifiedConfigs doesn't exist in event logs 778beb3bc67 is described below commit 778beb3bc67ea784ef0c4314a1f998482eea2e5f Author: Dongjoon Hyun AuthorDate: Tue Jun 6 09:34:40 2023 -0700 [SPARK-43976][CORE] Handle the case where modifiedConfigs doesn't exist in event logs ### What changes were proposed in this pull request? This prevents NPE by handling the case where `modifiedConfigs` doesn't exist in event logs. ### Why are the changes needed? Basically, this is the same solution for that case. - https://github.com/apache/spark/pull/34907 The new code was added here, but we missed the corner case. - https://github.com/apache/spark/pull/35972 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. Closes #41472 from dongjoon-hyun/SPARK-43976. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit b4ab34bf9b22d0f0ca4ab13f9b6106f38ccfaebe) Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/execution/ui/ExecutionPage.scala| 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/ExecutionPage.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/ExecutionPage.scala index 498bb2a6c1c..c0e6b65d634 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/ExecutionPage.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/ExecutionPage.scala @@ -84,15 +84,14 @@ class ExecutionPage(parent: SQLTab) extends WebUIPage("execution") with Logging val metrics = sqlStore.executionMetrics(executionId) val graph = sqlStore.planGraph(executionId) + val configs = Option(executionUIData.modifiedConfigs).getOrElse(Map.empty) summary ++ planVisualization(request, metrics, graph) ++ physicalPlanDescription(executionUIData.physicalPlanDescription) ++ -modifiedConfigs( - executionUIData.modifiedConfigs.filterKeys( -!_.startsWith(pandasOnSparkConfPrefix)).toMap) ++ + modifiedConfigs(configs.filterKeys(!_.startsWith(pandasOnSparkConfPrefix)).toMap) ++ modifiedPandasOnSparkConfigs( - executionUIData.modifiedConfigs.filterKeys(_.startsWith(pandasOnSparkConfPrefix)).toMap) + configs.filterKeys(_.startsWith(pandasOnSparkConfPrefix)).toMap) }.getOrElse { No information to display for query {executionId} } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (ae8aaec9cde -> 8c6a54d70a7)
This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from ae8aaec9cde [SPARK-43977][CONNECT] Fix unexpected check result of `dev/connect-jvm-client-mima-check` add 8c6a54d70a7 [SPARK-43919][SQL] Extract JSON functionality out of Row No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/streaming/progress.scala | 5 +- docs/sql-migration-guide.md| 1 + project/MimaExcludes.scala | 6 + .../src/main/scala/org/apache/spark/sql/Row.scala | 106 + .../org/apache/spark/sql/util/ToJsonUtil.scala | 129 + .../scala/org/apache/spark/sql/RowJsonSuite.scala | 7 +- .../test/scala/org/apache/spark/sql/RowTest.scala | 3 +- .../org/apache/spark/sql/streaming/progress.scala | 3 +- 8 files changed, 149 insertions(+), 111 deletions(-) create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/util/ToJsonUtil.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (2a82b42bcb1 -> ae8aaec9cde)
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 2a82b42bcb1 [SPARK-43097][ML] New pyspark ML logistic regression estimator implemented on top of distributor add ae8aaec9cde [SPARK-43977][CONNECT] Fix unexpected check result of `dev/connect-jvm-client-mima-check` No new revisions were added by this update. Summary of changes: .../spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala | 2 +- dev/connect-jvm-client-mima-check | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-43097][ML] New pyspark ML logistic regression estimator implemented on top of distributor
This is an automated email from the ASF dual-hosted git repository. weichenxu123 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2a82b42bcb1 [SPARK-43097][ML] New pyspark ML logistic regression estimator implemented on top of distributor 2a82b42bcb1 is described below commit 2a82b42bcb100fade622d3d08ef5d316425d3e5a Author: Weichen Xu AuthorDate: Tue Jun 6 21:53:46 2023 +0800 [SPARK-43097][ML] New pyspark ML logistic regression estimator implemented on top of distributor ### What changes were proposed in this pull request? This PR takes over https://github.com/apache/spark/pull/40748 Closes https://github.com/apache/spark/pull/40748 ### Why are the changes needed? Distributed ML on spark connect project. ### Does this PR introduce _any_ user-facing change? Yes. ### How was this patch tested? Unit tests. Manually testing code: Start local pyspark shell by: ``` bin/pyspark --remote "local[2]"# spark connect mode ``` or ``` bin/pyspark --master "local[2]" # legacy mode ``` to launch pyspark shell, Then, paste following code to pyspark shell: ``` from pyspark.mlv2.classification import LogisticRegression as LORV2 lorv2 = LORV2(maxIter=2, numTrainWorkers=2) from pyspark.ml.linalg import Vectors df = spark.createDataFrame( [ (1.0, Vectors.dense(0.0, 5.0)), (0.0, Vectors.dense(1.0, 2.0)), (1.0, Vectors.dense(2.0, 1.0)), (0.0, Vectors.dense(3.0, 3.0)), ] * 100, ["label", "features"], ) model.transform(df).show(truncate=False) model.transform(df.toPandas()) model.set(model.probabilityCol, "") model.transform(df).show(truncate=False) model.transform(df.toPandas()) ``` Closes #41383 from WeichenXu123/lor-torch-2. Authored-by: Weichen Xu Signed-off-by: Weichen Xu --- dev/sparktestsupport/modules.py| 2 + python/pyspark/ml/param/_shared_params_code_gen.py | 24 ++ python/pyspark/ml/param/shared.py | 89 ++ python/pyspark/ml/torch/data.py| 13 +- python/pyspark/ml/torch/distributor.py | 13 +- python/pyspark/mlv2/base.py| 75 - python/pyspark/mlv2/classification.py | 306 + .../tests/connect/test_parity_classification.py| 41 +++ python/pyspark/mlv2/tests/test_classification.py | 139 ++ 9 files changed, 696 insertions(+), 6 deletions(-) diff --git a/dev/sparktestsupport/modules.py b/dev/sparktestsupport/modules.py index aa755033aa4..ecc471fd700 100644 --- a/dev/sparktestsupport/modules.py +++ b/dev/sparktestsupport/modules.py @@ -609,6 +609,7 @@ pyspark_ml = Module( "pyspark.mlv2.tests.test_summarizer", "pyspark.mlv2.tests.test_evaluation", "pyspark.mlv2.tests.test_feature", +"pyspark.mlv2.tests.test_classification", ], excluded_python_implementations=[ "PyPy" # Skip these tests under PyPy since they require numpy and it isn't available there @@ -823,6 +824,7 @@ pyspark_connect = Module( "pyspark.mlv2.tests.connect.test_parity_summarizer", "pyspark.mlv2.tests.connect.test_parity_evaluation", "pyspark.mlv2.tests.connect.test_parity_feature", +"pyspark.mlv2.tests.connect.test_parity_classification", ], excluded_python_implementations=[ "PyPy" # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and diff --git a/python/pyspark/ml/param/_shared_params_code_gen.py b/python/pyspark/ml/param/_shared_params_code_gen.py index 5df1782084a..2bec3a5053f 100644 --- a/python/pyspark/ml/param/_shared_params_code_gen.py +++ b/python/pyspark/ml/param/_shared_params_code_gen.py @@ -332,6 +332,30 @@ if __name__ == "__main__": "0.0", "TypeConverters.toFloat", ), +( +"numTrainWorkers", +"number of training workers", +"1", +"TypeConverters.toInt", +), +( +"batchSize", +"number of training batch size", +None, +"TypeConverters.toInt", +), +( +"learningRate", +"learning rate for training", +None, +"TypeConverters.toFloat", +), +( +"momentum", +"momentum for training optimizer", +None, +"TypeConverters.toFloat", +), ] code = [] diff --git a/python/pyspark/ml/param/shared.py b/python/pyspark/ml/param/shared.py index fcfced2e566..d61d206d219 100644 --- a/python/pyspark/ml/param/shared.py +++
[spark] branch master updated (89d44d092af -> 0a4f226280c)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 89d44d092af [SPARK-43510][YARN] Fix YarnAllocator internal state when adding running executor after processing completed containers add 0a4f226280c [SPARK-43205][SQL][FOLLOWUP] add ExpressionWithUnresolvedIdentifier to simplify code No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/Analyzer.scala | 20 +-- ...useUtil.scala => ResolveIdentifierClause.scala} | 37 ++-- .../spark/sql/catalyst/analysis/unresolved.scala | 83 +++-- .../spark/sql/catalyst/parser/AstBuilder.scala | 192 ++--- .../spark/sql/catalyst/trees/TreePatterns.scala| 2 +- .../double-quoted-identifiers-disabled.sql.out | 4 +- .../ansi/double-quoted-identifiers-enabled.sql.out | 8 +- .../double-quoted-identifiers.sql.out | 4 +- .../analyzer-results/identifier-clause.sql.out | 6 +- .../postgreSQL/create_view.sql.out | 4 +- .../double-quoted-identifiers-disabled.sql.out | 4 +- .../ansi/double-quoted-identifiers-enabled.sql.out | 8 +- .../results/double-quoted-identifiers.sql.out | 4 +- .../sql-tests/results/identifier-clause.sql.out| 6 +- .../results/postgreSQL/create_view.sql.out | 4 +- .../org/apache/spark/sql/hive/InsertSuite.scala| 2 +- 16 files changed, 178 insertions(+), 210 deletions(-) rename sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/{IdentifierClauseUtil.scala => ResolveIdentifierClause.scala} (56%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.4 updated: [SPARK-43510][YARN] Fix YarnAllocator internal state when adding running executor after processing completed containers
This is an automated email from the ASF dual-hosted git repository. tgraves pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 63d59956024 [SPARK-43510][YARN] Fix YarnAllocator internal state when adding running executor after processing completed containers 63d59956024 is described below commit 63d59956024781b062791dda9990a6043b6a10c1 Author: manuzhang AuthorDate: Tue Jun 6 08:28:52 2023 -0500 [SPARK-43510][YARN] Fix YarnAllocator internal state when adding running executor after processing completed containers ### What changes were proposed in this pull request? Keep track of completed container ids in YarnAllocator and don't update internal state of a container if it's already completed. ### Why are the changes needed? YarnAllocator updates internal state adding running executors after executor launch in a separate thread. That can happen after the containers are already completed (e.g. preempted) and processed by YarnAllocator. Then YarnAllocator mistakenly thinks there are still running executors which are already lost. As a result, application hangs without any running executors. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added UT. Closes #41173 from manuzhang/spark-43510. Authored-by: manuzhang Signed-off-by: Thomas Graves (cherry picked from commit 89d44d092af4ae53fec296ca6569e240ad4c2bc5) Signed-off-by: Thomas Graves --- .../apache/spark/deploy/yarn/YarnAllocator.scala | 42 ++ .../spark/deploy/yarn/YarnAllocatorSuite.scala | 28 ++- 2 files changed, 54 insertions(+), 16 deletions(-) diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala index 313b19f919d..dede5501a39 100644 --- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala +++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala @@ -90,6 +90,9 @@ private[yarn] class YarnAllocator( @GuardedBy("this") private val releasedContainers = collection.mutable.HashSet[ContainerId]() + @GuardedBy("this") + private val launchingExecutorContainerIds = collection.mutable.HashSet[ContainerId]() + @GuardedBy("this") private val runningExecutorsPerResourceProfileId = new HashMap[Int, mutable.Set[String]]() @@ -742,19 +745,6 @@ private[yarn] class YarnAllocator( logInfo(s"Launching container $containerId on host $executorHostname " + s"for executor with ID $executorId for ResourceProfile Id $rpId") - def updateInternalState(): Unit = synchronized { -getOrUpdateRunningExecutorForRPId(rpId).add(executorId) -getOrUpdateNumExecutorsStartingForRPId(rpId).decrementAndGet() -executorIdToContainer(executorId) = container -containerIdToExecutorIdAndResourceProfileId(container.getId) = (executorId, rpId) - -val localallocatedHostToContainersMap = getOrUpdateAllocatedHostToContainersMapForRPId(rpId) -val containerSet = localallocatedHostToContainersMap.getOrElseUpdate(executorHostname, - new HashSet[ContainerId]) -containerSet += containerId -allocatedContainerToHostMap.put(containerId, executorHostname) - } - val rp = rpIdToResourceProfile(rpId) val defaultResources = ResourceProfile.getDefaultProfileExecutorResources(sparkConf) val containerMem = rp.executorResources.get(ResourceProfile.MEMORY). @@ -767,6 +757,7 @@ private[yarn] class YarnAllocator( val rpRunningExecs = getOrUpdateRunningExecutorForRPId(rpId).size if (rpRunningExecs < getOrUpdateTargetNumExecutorsForRPId(rpId)) { getOrUpdateNumExecutorsStartingForRPId(rpId).incrementAndGet() +launchingExecutorContainerIds.add(containerId) if (launchContainers) { launcherPool.execute(() => { try { @@ -784,10 +775,11 @@ private[yarn] class YarnAllocator( localResources, rp.id ).run() - updateInternalState() + updateInternalState(rpId, executorId, container) } catch { case e: Throwable => getOrUpdateNumExecutorsStartingForRPId(rpId).decrementAndGet() +launchingExecutorContainerIds.remove(containerId) if (NonFatal(e)) { logError(s"Failed to launch executor $executorId on container $containerId", e) // Assigned container should be released immediately @@ -800,7 +792,7 @@ private[yarn] class YarnAllocator( }) } else { // For test only - updateInternalState() +
[spark] branch master updated: [SPARK-43510][YARN] Fix YarnAllocator internal state when adding running executor after processing completed containers
This is an automated email from the ASF dual-hosted git repository. tgraves pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 89d44d092af [SPARK-43510][YARN] Fix YarnAllocator internal state when adding running executor after processing completed containers 89d44d092af is described below commit 89d44d092af4ae53fec296ca6569e240ad4c2bc5 Author: manuzhang AuthorDate: Tue Jun 6 08:28:52 2023 -0500 [SPARK-43510][YARN] Fix YarnAllocator internal state when adding running executor after processing completed containers ### What changes were proposed in this pull request? Keep track of completed container ids in YarnAllocator and don't update internal state of a container if it's already completed. ### Why are the changes needed? YarnAllocator updates internal state adding running executors after executor launch in a separate thread. That can happen after the containers are already completed (e.g. preempted) and processed by YarnAllocator. Then YarnAllocator mistakenly thinks there are still running executors which are already lost. As a result, application hangs without any running executors. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added UT. Closes #41173 from manuzhang/spark-43510. Authored-by: manuzhang Signed-off-by: Thomas Graves --- .../apache/spark/deploy/yarn/YarnAllocator.scala | 42 ++ .../spark/deploy/yarn/YarnAllocatorSuite.scala | 28 ++- 2 files changed, 54 insertions(+), 16 deletions(-) diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala index b6ee21ed817..19c06f95731 100644 --- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala +++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala @@ -91,6 +91,9 @@ private[yarn] class YarnAllocator( @GuardedBy("this") private val releasedContainers = collection.mutable.HashSet[ContainerId]() + @GuardedBy("this") + private val launchingExecutorContainerIds = collection.mutable.HashSet[ContainerId]() + @GuardedBy("this") private val runningExecutorsPerResourceProfileId = new HashMap[Int, mutable.Set[String]]() @@ -738,19 +741,6 @@ private[yarn] class YarnAllocator( logInfo(s"Launching container $containerId on host $executorHostname " + s"for executor with ID $executorId for ResourceProfile Id $rpId") - def updateInternalState(): Unit = synchronized { -getOrUpdateRunningExecutorForRPId(rpId).add(executorId) -getOrUpdateNumExecutorsStartingForRPId(rpId).decrementAndGet() -executorIdToContainer(executorId) = container -containerIdToExecutorIdAndResourceProfileId(container.getId) = (executorId, rpId) - -val localallocatedHostToContainersMap = getOrUpdateAllocatedHostToContainersMapForRPId(rpId) -val containerSet = localallocatedHostToContainersMap.getOrElseUpdate(executorHostname, - new HashSet[ContainerId]) -containerSet += containerId -allocatedContainerToHostMap.put(containerId, executorHostname) - } - val rp = rpIdToResourceProfile(rpId) val defaultResources = ResourceProfile.getDefaultProfileExecutorResources(sparkConf) val containerMem = rp.executorResources.get(ResourceProfile.MEMORY). @@ -763,6 +753,7 @@ private[yarn] class YarnAllocator( val rpRunningExecs = getOrUpdateRunningExecutorForRPId(rpId).size if (rpRunningExecs < getOrUpdateTargetNumExecutorsForRPId(rpId)) { getOrUpdateNumExecutorsStartingForRPId(rpId).incrementAndGet() +launchingExecutorContainerIds.add(containerId) if (launchContainers) { launcherPool.execute(() => { try { @@ -780,10 +771,11 @@ private[yarn] class YarnAllocator( localResources, rp.id ).run() - updateInternalState() + updateInternalState(rpId, executorId, container) } catch { case e: Throwable => getOrUpdateNumExecutorsStartingForRPId(rpId).decrementAndGet() +launchingExecutorContainerIds.remove(containerId) if (NonFatal(e)) { logError(s"Failed to launch executor $executorId on container $containerId", e) // Assigned container should be released immediately @@ -796,7 +788,7 @@ private[yarn] class YarnAllocator( }) } else { // For test only - updateInternalState() + updateInternalState(rpId, executorId, container) } } else { logInfo(("Skip launching
[spark] branch master updated: [SPARK-43913][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2426-2432]
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0cd5ca5a7b3 [SPARK-43913][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2426-2432] 0cd5ca5a7b3 is described below commit 0cd5ca5a7b31f65a005c8ee2e90a6b4a29623ba7 Author: Jiaan Geng AuthorDate: Tue Jun 6 10:28:48 2023 +0300 [SPARK-43913][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2426-2432] ### What changes were proposed in this pull request? The pr aims to assign names to the error class `_LEGACY_ERROR_TEMP_[2426-2432]`. ### Why are the changes needed? Improve the error framework. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? Exists test cases. Closes #41424 from beliefer/SPARK-43913. Authored-by: Jiaan Geng Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 58 -- .../sql/catalyst/analysis/CheckAnalysis.scala | 51 +++ .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 20 .../CreateTablePartitioningValidationSuite.scala | 22 .../negative-cases/invalid-correlation.sql.out | 6 ++- .../negative-cases/invalid-correlation.sql.out | 6 ++- 6 files changed, 93 insertions(+), 70 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index de80415d85b..8c3c076ce74 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -660,6 +660,11 @@ "The event time has the invalid type , but expected \"TIMESTAMP\"." ] }, + "EXPRESSION_TYPE_IS_NOT_ORDERABLE" : { +"message" : [ + "Column expression cannot be sorted because its type is not orderable." +] + }, "FAILED_EXECUTE_UDF" : { "message" : [ "Failed to execute user defined function (: () => )." @@ -1541,6 +1546,24 @@ ], "sqlState" : "42803" }, + "MISSING_ATTRIBUTES" : { +"message" : [ + "Resolved attribute(s) missing from in operator ." +], +"subClass" : { + "RESOLVED_ATTRIBUTE_APPEAR_IN_OPERATION" : { +"message" : [ + "Attribute(s) with the same name appear in the operation: .", + "Please check if the right attribute(s) are used." +] + }, + "RESOLVED_ATTRIBUTE_MISSING_FROM_INPUT" : { +"message" : [ + "" +] + } +} + }, "MISSING_GROUP_BY" : { "message" : [ "The query does not include a GROUP BY clause. Add GROUP BY or turn it into the window functions using OVER clauses." @@ -1945,6 +1968,11 @@ "Query [id = , runId = ] terminated with exception: " ] }, + "SUM_OF_LIMIT_AND_OFFSET_EXCEEDS_MAX_INT" : { +"message" : [ + "The sum of the LIMIT clause and the OFFSET clause must not be greater than the maximum 32-bit integer value (2,147,483,647) but found limit = , offset = ." +] + }, "TABLE_OR_VIEW_ALREADY_EXISTS" : { "message" : [ "Cannot create table or view because it already exists.", @@ -2310,6 +2338,11 @@ "Parameter markers in unexpected statement: . Parameter markers must only be used in a query, or DML statement." ] }, + "PARTITION_WITH_NESTED_COLUMN_IS_UNSUPPORTED" : { +"message" : [ + "Invalid partitioning: is missing or is in a map or array." +] + }, "PIVOT_AFTER_GROUP_BY" : { "message" : [ "PIVOT clause following a GROUP BY clause. Consider pushing the GROUP BY into a subquery." @@ -5525,31 +5558,6 @@ "failed to evaluate expression : " ] }, - "_LEGACY_ERROR_TEMP_2426" : { -"message" : [ - "nondeterministic expression should not appear in grouping expression." -] - }, - "_LEGACY_ERROR_TEMP_2427" : { -"message" : [ - "sorting is not supported for columns of type ." -] - }, - "_LEGACY_ERROR_TEMP_2428" : { -"message" : [ - "The sum of the LIMIT clause and the OFFSET clause must not be greater than the maximum 32-bit integer value (2,147,483,647) but found limit = , offset = ." -] - }, - "_LEGACY_ERROR_TEMP_2431" : { -"message" : [ - "Invalid partitioning: is missing or is in a map or array." -] - }, - "_LEGACY_ERROR_TEMP_2432" : { -"message" : [ - "" -] - }, "_LEGACY_ERROR_TEMP_2433" : { "message" : [ "Only a single table generating function is allowed in a SELECT clause, found:", diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 594c0b666e8..9124890d4af 100644
[spark] branch master updated: [SPARK-43962][SQL] Improve error messages: `CANNOT_DECODE_URL`, `CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE`, `CANNOT_PARSE_DECIMAL`, `CANNOT_READ_FILE_FOOTER`, `CANNOT_RECOGNI
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 61e6227fb62 [SPARK-43962][SQL] Improve error messages: `CANNOT_DECODE_URL`, `CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE`, `CANNOT_PARSE_DECIMAL`, `CANNOT_READ_FILE_FOOTER`, `CANNOT_RECOGNIZE_HIVE_TYPE` 61e6227fb62 is described below commit 61e6227fb62c2452b01ac595c2bc43d4492686a0 Author: itholic AuthorDate: Tue Jun 6 10:25:24 2023 +0300 [SPARK-43962][SQL] Improve error messages: `CANNOT_DECODE_URL`, `CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE`, `CANNOT_PARSE_DECIMAL`, `CANNOT_READ_FILE_FOOTER`, `CANNOT_RECOGNIZE_HIVE_TYPE` ### What changes were proposed in this pull request? This PR proposes to improve error messages for `CANNOT_DECODE_URL`, `CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE`, `CANNOT_PARSE_DECIMAL`, `CANNOT_READ_FILE_FOOTER`, `CANNOT_RECOGNIZE_HIVE_TYPE`. **NOTE:** This PR is an experimental work that utilizes LLM to enhance error messages. The script was created using the `openai` Python library from OpenAI, and minimal review was conducted by author after executing the script. The five improved error messages were selected by the author. ### Why are the changes needed? For improving errors to make them more actionable and usable. ### Does this PR introduce _any_ user-facing change? No API changes, only error message improvement. ### How was this patch tested? The existing CI should pass. Closes #41455 from itholic/emi_1-5. Authored-by: itholic Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index bceea072e92..de80415d85b 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -114,7 +114,7 @@ }, "CANNOT_DECODE_URL" : { "message" : [ - "Cannot decode url : ." + "The provided URL cannot be decoded: . Please ensure that the URL is properly formatted and try again." ], "sqlState" : "22546" }, @@ -130,7 +130,7 @@ }, "CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE" : { "message" : [ - "Failed to merge incompatible data types and ." + "Failed to merge incompatible data types and . Please check the data types of the columns being merged and ensure that they are compatible. If necessary, consider casting the columns to compatible data types before attempting the merge." ], "sqlState" : "42825" }, @@ -153,7 +153,7 @@ }, "CANNOT_PARSE_DECIMAL" : { "message" : [ - "Cannot parse decimal." + "Cannot parse decimal. Please ensure that the input is a valid number with optional decimal point or comma separators." ], "sqlState" : "22018" }, @@ -176,12 +176,12 @@ }, "CANNOT_READ_FILE_FOOTER" : { "message" : [ - "Could not read footer for file: ." + "Could not read footer for file: . Please ensure that the file is in either ORC or Parquet format. If not, please convert it to a valid format. If the file is in the valid format, please check if it is corrupt. If it is, you can choose to either ignore it or fix the corruption." ] }, "CANNOT_RECOGNIZE_HIVE_TYPE" : { "message" : [ - "Cannot recognize hive type string: , column: ." + "Cannot recognize hive type string: , column: . The specified data type for the field cannot be recognized by Spark SQL. Please check the data type of the specified field and ensure that it is a valid Spark SQL data type. Refer to the Spark SQL documentation for a list of valid data types and their format. If the data type is correct, please ensure that you are using a supported version of Spark SQL." ], "sqlState" : "429BB" }, - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (1df1d7661a3 -> d0fe6d4b796)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 1df1d7661a3 [SPARK-43516][ML][PYTHON] Update MLv2 Transformer interfaces add d0fe6d4b796 [SPARK-43948][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[0050|0057|0058|0059] No new revisions were added by this update. Summary of changes: core/src/main/resources/error/error-classes.json | 47 +- .../spark/sql/errors/QueryParsingErrors.scala | 15 --- .../spark/sql/catalyst/parser/DDLParserSuite.scala | 2 +- .../spark/sql/execution/SparkSqlParser.scala | 2 +- .../command/v2/AlterTableReplaceColumnsSuite.scala | 17 +++- .../org/apache/spark/sql/sources/InsertSuite.scala | 12 +++--- 6 files changed, 61 insertions(+), 34 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (51a919ea8d6 -> 1df1d7661a3)
This is an automated email from the ASF dual-hosted git repository. weichenxu123 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 51a919ea8d6 [SPARK-43973][SS][UI] Structured Streaming UI should display failed queries correctly add 1df1d7661a3 [SPARK-43516][ML][PYTHON] Update MLv2 Transformer interfaces No new revisions were added by this update. Summary of changes: python/pyspark/mlv2/base.py | 24 - python/pyspark/mlv2/feature.py| 16 +++--- python/pyspark/mlv2/tests/test_feature.py | 6 ++ python/pyspark/mlv2/util.py | 35 --- 4 files changed, 51 insertions(+), 30 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org