[spark] branch master updated: [SPARK-41206][SQL][FOLLOWUP] Make result of `checkColumnNameDuplication` stable to fix `COLUMN_ALREADY_EXISTS` check failed with Scala 2.13
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new e42d3836af9 [SPARK-41206][SQL][FOLLOWUP] Make result of `checkColumnNameDuplication` stable to fix `COLUMN_ALREADY_EXISTS` check failed with Scala 2.13
e42d3836af9 is described below

commit e42d3836af9eea881868c80f3c2cbc29e1d7b4f1
Author: yangjie01
AuthorDate: Wed Nov 23 09:13:56 2022 +0300

    [SPARK-41206][SQL][FOLLOWUP] Make result of `checkColumnNameDuplication` stable to fix `COLUMN_ALREADY_EXISTS` check failed with Scala 2.13

    ### What changes were proposed in this pull request?
    This PR adds a sort before `columnAlreadyExistsError` is thrown, to make the result of `SchemaUtils#checkColumnNameDuplication` stable.

    ### Why are the changes needed?
    Fix the `COLUMN_ALREADY_EXISTS` check failing with Scala 2.13.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    - Pass GA
    - Manual test:

    ```
    dev/change-scala-version.sh 2.13
    build/sbt clean "sql/testOnly org.apache.spark.sql.DataFrameSuite" -Pscala-2.13
    build/sbt "sql/testOnly org.apache.spark.sql.execution.datasources.json.JsonV1Suite" -Pscala-2.13
    build/sbt "sql/testOnly org.apache.spark.sql.execution.datasources.json.JsonV2Suite" -Pscala-2.13
    build/sbt "sql/testOnly org.apache.spark.sql.execution.datasources.json.JsonLegacyTimeParserSuite" -Pscala-2.13
    ```

    All tests passed.

    Closes #38764 from LuciferYang/SPARK-41206.

    Authored-by: yangjie01
    Signed-off-by: Max Gekk
---
 sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala
index aac96a9b56c..d202900381a 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala
@@ -107,7 +107,7 @@ private[spark] object SchemaUtils {
     val names = if (caseSensitiveAnalysis) columnNames else columnNames.map(_.toLowerCase)
     // scalastyle:on caselocale
     if (names.distinct.length != names.length) {
-      val columnName = names.groupBy(identity).collectFirst {
+      val columnName = names.groupBy(identity).toSeq.sortBy(_._1).collectFirst {
         case (x, ys) if ys.length > 1 => x
       }.get
       throw QueryCompilationErrors.columnAlreadyExistsError(columnName)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
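The essence of the fix is picking the reported duplicate deterministically (here, the lexicographically smallest) instead of relying on hash-map iteration order, which differs between Scala 2.12 and 2.13. A hypothetical Python analogue of the same idea, not the actual Spark code:

```python
from collections import Counter


def first_duplicate_column(column_names, case_sensitive=False):
    """Return the lexicographically smallest duplicated name, or None.

    Sorting before picking makes the result independent of dict/hash
    iteration order, mirroring the `.toSeq.sortBy(_._1)` added in the PR.
    """
    names = column_names if case_sensitive else [n.lower() for n in column_names]
    duplicates = sorted(name for name, count in Counter(names).items() if count > 1)
    return duplicates[0] if duplicates else None
```

Without the sort, either duplicated name could be reported depending on the runtime's hash-map ordering, which is exactly what made the `COLUMN_ALREADY_EXISTS` test assertions flaky across Scala versions.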
[spark] branch master updated: [SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 17816170316 [SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND
17816170316 is described below

commit 178161703161ccf49b37baf9a667630865367950
Author: itholic
AuthorDate: Wed Nov 23 08:38:20 2022 +0300

    [SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

    ### What changes were proposed in this pull request?
    The original PR introducing the error class `PATH_NOT_FOUND` was reverted because it broke tests in a different test environment. This PR restores it.

    ### Why are the changes needed?
    Restore the reverted changes with a proper fix.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    The existing CI should pass.

    Closes #38575 from itholic/SPARK-40948-followup.

    Authored-by: itholic
    Signed-off-by: Max Gekk
---
 R/pkg/tests/fulltests/test_sparkSQL.R              | 14 +---
 core/src/main/resources/error/error-classes.json   | 10 +++---
 .../spark/sql/errors/QueryCompilationErrors.scala  |  2 +-
 .../org/apache/spark/sql/DataFrameSuite.scala      | 37 --
 .../execution/datasources/DataSourceSuite.scala    | 28 +---
 5 files changed, 52 insertions(+), 39 deletions(-)

diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R b/R/pkg/tests/fulltests/test_sparkSQL.R
index 534ec07abac..d2b6220b2e7 100644
--- a/R/pkg/tests/fulltests/test_sparkSQL.R
+++ b/R/pkg/tests/fulltests/test_sparkSQL.R
@@ -3990,12 +3990,16 @@ test_that("Call DataFrameWriter.load() API in Java without path and check argume
   expect_error(read.df(source = "json"),
                paste("Error in load : analysis error - Unable to infer schema for JSON.",
                      "It must be specified manually"))
-  expect_error(read.df("arbitrary_path"), "Error in load : analysis error - Path does not exist")
-  expect_error(read.json("arbitrary_path"), "Error in json : analysis error - Path does not exist")
-  expect_error(read.text("arbitrary_path"), "Error in text : analysis error - Path does not exist")
-  expect_error(read.orc("arbitrary_path"), "Error in orc : analysis error - Path does not exist")
+  expect_error(read.df("arbitrary_path"),
+               "Error in load : analysis error - \\[PATH_NOT_FOUND\\].*")
+  expect_error(read.json("arbitrary_path"),
+               "Error in json : analysis error - \\[PATH_NOT_FOUND\\].*")
+  expect_error(read.text("arbitrary_path"),
+               "Error in text : analysis error - \\[PATH_NOT_FOUND\\].*")
+  expect_error(read.orc("arbitrary_path"),
+               "Error in orc : analysis error - \\[PATH_NOT_FOUND\\].*")
   expect_error(read.parquet("arbitrary_path"),
-               "Error in parquet : analysis error - Path does not exist")
+               "Error in parquet : analysis error - \\[PATH_NOT_FOUND\\].*")

   # Arguments checking in R side.
   expect_error(read.df(path = c(3)),

diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json
index 77d155bfc21..12c97c2108a 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -912,6 +912,11 @@
     ],
     "sqlState" : "42000"
   },
+  "PATH_NOT_FOUND" : {
+    "message" : [
+      "Path does not exist: <path>."
+    ]
+  },
   "PIVOT_VALUE_DATA_TYPE_MISMATCH" : {
     "message" : [
       "Invalid pivot value '': value data type does not match pivot column data type "
     ]
   },
@@ -2332,11 +2337,6 @@
       "Unable to infer schema for . It must be specified manually."
     ]
   },
-  "_LEGACY_ERROR_TEMP_1130" : {
-    "message" : [
-      "Path does not exist: <path>."
-    ]
-  },
   "_LEGACY_ERROR_TEMP_1131" : {
     "message" : [
       "Data source does not support output mode."
     ]
   },

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
index 63c912c15a1..0f245597efd 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
@@ -1378,7 +1378,7 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase {

   def dataPathNotExistError(path: String): Throwable = {
     new AnalysisException(
-      errorClass = "_LEGACY_ERROR_TEMP_1130",
+      errorClass = "PATH_NOT_FOUND",
       messageParameters = Map("path" -> path))
   }

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
index aab68065319..589ee1bea27 100644
---
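The `error-classes.json` entries above are message templates whose `<...>` tokens are filled from the `messageParameters` map passed to the exception. A rough Python illustration of that lookup-and-substitute mechanism, with hypothetical helper names (the real rendering lives in Spark's `SparkThrowable` machinery):

```python
import re

# A hypothetical slice of error-classes.json.
ERROR_CLASSES = {
    "PATH_NOT_FOUND": {"message": ["Path does not exist: <path>."]},
}


def render_error(error_class, params):
    """Join the message lines and substitute <name> placeholders from params."""
    template = "".join(ERROR_CLASSES[error_class]["message"])
    body = re.sub(r"<(\w+)>", lambda m: str(params.get(m.group(1), m.group(0))), template)
    return f"[{error_class}] {body}"
```

This is why the R tests above can match on `\\[PATH_NOT_FOUND\\].*`: every error rendered from a named error class is prefixed with its bracketed class name, unlike the ad-hoc `_LEGACY_ERROR_TEMP_*` messages being migrated away from.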
[spark] branch master updated (2c8da56d6f3 -> d275a83c582)
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 2c8da56d6f3 [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client
     add d275a83c582 [SPARK-41201][CONNECT][PYTHON][TEST][FOLLOWUP] Reenable test_fill_na

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/tests/connect/test_connect_basic.py | 1 -
 1 file changed, 1 deletion(-)
[spark] branch master updated (04f026caa8a -> 2c8da56d6f3)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 04f026caa8a [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI
     add 2c8da56d6f3 [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/connect/column.py                   | 16 +
 python/pyspark/sql/connect/dataframe.py                | 24 +++
 .../sql/tests/connect/test_connect_basic.py            | 28 --
 3 files changed, 66 insertions(+), 2 deletions(-)
[spark] branch master updated (25133684ef9 -> 04f026caa8a)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 25133684ef9 [SPARK-35531][SQL] Update hive table stats without unnecessary convert
     add 04f026caa8a [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

No new revisions were added by this update.

Summary of changes:
 .../spark/deploy/history/FsHistoryProvider.scala   | 41 +++---
 .../org/apache/spark/internal/config/Status.scala  |  7 +++
 .../org/apache/spark/status/AppStatusStore.scala   | 12 +-
 .../scala/org/apache/spark/status/KVUtils.scala    | 50 +-
 .../apache/spark/status/AppStatusStoreSuite.scala  | 19 ++--
 docs/configuration.md                              |  9
 6 files changed, 96 insertions(+), 42 deletions(-)
[spark] branch master updated (3bff4f6339f -> 25133684ef9)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 3bff4f6339f [SPARK-41135][SQL] Rename `UNSUPPORTED_EMPTY_LOCATION` to `INVALID_EMPTY_LOCATION`
     add 25133684ef9 [SPARK-35531][SQL] Update hive table stats without unnecessary convert

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/hive/HiveExternalCatalog.scala  | 15 ++-
 .../org/apache/spark/sql/hive/client/HiveClient.scala    |  6 ++
 .../apache/spark/sql/hive/client/HiveClientImpl.scala    | 16 +++-
 3 files changed, 27 insertions(+), 10 deletions(-)
[spark] branch master updated: [SPARK-41135][SQL] Rename `UNSUPPORTED_EMPTY_LOCATION` to `INVALID_EMPTY_LOCATION`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 3bff4f6339f [SPARK-41135][SQL] Rename `UNSUPPORTED_EMPTY_LOCATION` to `INVALID_EMPTY_LOCATION`
3bff4f6339f is described below

commit 3bff4f6339f54d19362a0c03ef2b396e47881fd8
Author: itholic
AuthorDate: Tue Nov 22 13:14:13 2022 +0300

    [SPARK-41135][SQL] Rename `UNSUPPORTED_EMPTY_LOCATION` to `INVALID_EMPTY_LOCATION`

    ### What changes were proposed in this pull request?
    This PR proposes to rename `UNSUPPORTED_EMPTY_LOCATION` to `INVALID_EMPTY_LOCATION`.

    ### Why are the changes needed?
    An error class and its message should be clear and brief, and should not be misleadingly specific about things that may be supported in the future.

    ### Does this PR introduce _any_ user-facing change?
    Error message changes.

    From:
    ```
    "Unsupported empty location."
    ```
    To:
    ```
    "The location name cannot be empty string, but `...` was given."
    ```

    ### How was this patch tested?
    ```
    $ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite*"
    $ build/sbt "core/testOnly *SparkThrowableSuite"
    ```

    Closes #38650 from itholic/SPARK-41135.

    Authored-by: itholic
    Signed-off-by: Max Gekk
---
 core/src/main/resources/error/error-classes.json               | 10 +-
 .../org/apache/spark/sql/errors/QueryExecutionErrors.scala     |  6 +++---
 .../spark/sql/catalyst/analysis/ResolveSessionCatalog.scala    |  4 ++--
 .../sql/execution/datasources/v2/DataSourceV2Strategy.scala    |  4 ++--
 .../execution/command/AlterNamespaceSetLocationSuiteBase.scala |  4 ++--
 .../spark/sql/execution/command/CreateNamespaceSuiteBase.scala |  4 ++--
 6 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json
index ae76a52e40f..77d155bfc21 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -676,6 +676,11 @@
     ],
     "sqlState" : "42000"
   },
+  "INVALID_EMPTY_LOCATION" : {
+    "message" : [
+      "The location name cannot be empty string, but `<location>` was given."
+    ]
+  },
   "INVALID_FIELD_NAME" : {
     "message" : [
       "Field name is invalid: is not a struct."
     ]
   },
@@ -1181,11 +1186,6 @@
       }
     }
   },
-  "UNSUPPORTED_EMPTY_LOCATION" : {
-    "message" : [
-      "Unsupported empty location."
-    ]
-  },
   "UNSUPPORTED_FEATURE" : {
     "message" : [
       "The feature is not supported:"

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index 6081d9f32a5..5db54f7f4cf 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -2806,10 +2806,10 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase {
         "size" -> elementSize.toString))
   }

-  def unsupportedEmptyLocationError(): SparkIllegalArgumentException = {
+  def invalidEmptyLocationError(location: String): SparkIllegalArgumentException = {
     new SparkIllegalArgumentException(
-      errorClass = "UNSUPPORTED_EMPTY_LOCATION",
-      messageParameters = Map.empty)
+      errorClass = "INVALID_EMPTY_LOCATION",
+      messageParameters = Map("location" -> location))
   }

   def malformedProtobufMessageDetectedInMessageParsingError(e: Throwable): Throwable = {

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
index d00d07150b0..d7e26b04ce4 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
@@ -134,7 +134,7 @@ class ResolveSessionCatalog(val catalogManager: CatalogManager)
     case SetNamespaceLocation(DatabaseInSessionCatalog(db), location) if conf.useV1Command =>
       if (StringUtils.isEmpty(location)) {
-        throw QueryExecutionErrors.unsupportedEmptyLocationError()
+        throw QueryExecutionErrors.invalidEmptyLocationError(location)
       }
       AlterDatabaseSetLocationCommand(db, location)

@@ -243,7 +243,7 @@ class ResolveSessionCatalog(val catalogManager: CatalogManager)
       val location = c.properties.get(SupportsNamespaces.PROP_LOCATION)
       val newProperties = c.properties -- CatalogV2Util.NAMESPACE_RESERVED_PROPERTIES
       if (location.isDefined && location.get.isEmpty) {
-
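The renamed error now carries the offending value instead of an empty parameter map. A minimal Python sketch of that validate-then-act pattern (illustrative only; names are hypothetical and this is not Spark's Scala code):

```python
class InvalidEmptyLocationError(ValueError):
    """Mirrors the INVALID_EMPTY_LOCATION error class: message includes the value."""

    def __init__(self, location):
        super().__init__(
            "[INVALID_EMPTY_LOCATION] The location name cannot be "
            f"empty string, but `{location}` was given."
        )


def set_namespace_location(namespace, location):
    """Reject an empty location before building the command."""
    if location == "":
        raise InvalidEmptyLocationError(location)
    return f"ALTER NAMESPACE {namespace} SET LOCATION '{location}'"
```

Passing the rejected value into the error is what makes the rename more than cosmetic: the message template gains a `location` parameter, so callers see exactly what was given.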
[spark] branch master updated: [SPARK-41212][CONNECT][PYTHON] Implement `DataFrame.isEmpty`
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new e16dd7c0cfe [SPARK-41212][CONNECT][PYTHON] Implement `DataFrame.isEmpty`
e16dd7c0cfe is described below

commit e16dd7c0cfed8745a49bd46c30c05fc82ac292d5
Author: Ruifeng Zheng
AuthorDate: Tue Nov 22 17:15:42 2022 +0800

    [SPARK-41212][CONNECT][PYTHON] Implement `DataFrame.isEmpty`

    ### What changes were proposed in this pull request?
    Implement `DataFrame.isEmpty`.

    ### Why are the changes needed?
    API coverage.

    ### Does this PR introduce _any_ user-facing change?
    Yes, new API.

    ### How was this patch tested?
    Added UT.

    Closes #38734 from zhengruifeng/connect_df_is_empty.

    Authored-by: Ruifeng Zheng
    Signed-off-by: Ruifeng Zheng
---
 python/pyspark/sql/connect/dataframe.py                | 12 ++++++++++++
 python/pyspark/sql/tests/connect/test_connect_basic.py |  5 +++++
 2 files changed, 17 insertions(+)

diff --git a/python/pyspark/sql/connect/dataframe.py b/python/pyspark/sql/connect/dataframe.py
index b8fa640a42f..579403299fe 100644
--- a/python/pyspark/sql/connect/dataframe.py
+++ b/python/pyspark/sql/connect/dataframe.py
@@ -125,6 +125,18 @@ class DataFrame(object):
         new_frame._plan = plan
         return new_frame

+    def isEmpty(self) -> bool:
+        """Returns ``True`` if this :class:`DataFrame` is empty.
+
+        .. versionadded:: 3.4.0
+
+        Returns
+        -------
+        bool
+            Whether it's empty DataFrame or not.
+        """
+        return len(self.take(1)) == 0
+
     def select(self, *cols: "ExpressionOrString") -> "DataFrame":
         return DataFrame.withPlan(plan.Project(self._plan, *cols), session=self._session)

diff --git a/python/pyspark/sql/tests/connect/test_connect_basic.py b/python/pyspark/sql/tests/connect/test_connect_basic.py
index 8bf28bf8a75..49973ba70c3 100644
--- a/python/pyspark/sql/tests/connect/test_connect_basic.py
+++ b/python/pyspark/sql/tests/connect/test_connect_basic.py
@@ -346,6 +346,11 @@ class SparkConnectTests(SparkConnectSQLTestCase):
         self.assertEqual(1, len(pdf.columns))  # one column
         self.assertEqual("X", pdf.columns[0])

+    def test_is_empty(self):
+        # SPARK-41212: Test is empty
+        self.assertFalse(self.connect.sql("SELECT 1 AS X").isEmpty())
+        self.assertTrue(self.connect.sql("SELECT 1 AS X LIMIT 0").isEmpty())
+
     def test_session(self):
         self.assertEqual(self.connect, self.connect.sql("SELECT 1").sparkSession())
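The `len(self.take(1)) == 0` implementation generalizes: to test whether a lazy or expensive source is empty, fetch at most one element instead of counting everything. A standalone sketch of that pattern in plain Python (not Spark code):

```python
from itertools import islice


def is_empty(source):
    """True iff the iterable yields no elements.

    Pulls at most one element, so an expensive (or even infinite)
    source is never fully consumed just to check emptiness.
    """
    return len(list(islice(iter(source), 1))) == 0
```

Materializing a one-element slice also stays correct when elements are falsy (`0`, `None`, `""`), which a naive `next(iter(source), None) is None` sentinel check would not be if the source can legitimately yield `None`.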