[spark] branch master updated: [SPARK-41206][SQL][FOLLOWUP] Make result of `checkColumnNameDuplication` stable to fix `COLUMN_ALREADY_EXISTS` check failed with Scala 2.13
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new e42d3836af9 [SPARK-41206][SQL][FOLLOWUP] Make result of `checkColumnNameDuplication` stable to fix `COLUMN_ALREADY_EXISTS` check failed with Scala 2.13
e42d3836af9 is described below

commit e42d3836af9eea881868c80f3c2cbc29e1d7b4f1
Author: yangjie01
AuthorDate: Wed Nov 23 09:13:56 2022 +0300

    [SPARK-41206][SQL][FOLLOWUP] Make result of `checkColumnNameDuplication` stable to fix `COLUMN_ALREADY_EXISTS` check failed with Scala 2.13

    ### What changes were proposed in this pull request?
    This PR adds a sort before `columnAlreadyExistsError` is thrown, to make the result of `SchemaUtils#checkColumnNameDuplication` stable.

    ### Why are the changes needed?
    Fix the `COLUMN_ALREADY_EXISTS` check failing with Scala 2.13.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    - Pass GA
    - Manual test:

    ```
    dev/change-scala-version.sh 2.13
    build/sbt clean "sql/testOnly org.apache.spark.sql.DataFrameSuite" -Pscala-2.13
    build/sbt "sql/testOnly org.apache.spark.sql.execution.datasources.json.JsonV1Suite" -Pscala-2.13
    build/sbt "sql/testOnly org.apache.spark.sql.execution.datasources.json.JsonV2Suite" -Pscala-2.13
    build/sbt "sql/testOnly org.apache.spark.sql.execution.datasources.json.JsonLegacyTimeParserSuite" -Pscala-2.13
    ```

    All tests passed.

    Closes #38764 from LuciferYang/SPARK-41206.

    Authored-by: yangjie01
    Signed-off-by: Max Gekk
---
 sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala
index aac96a9b56c..d202900381a 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala
@@ -107,7 +107,7 @@ private[spark] object SchemaUtils {
     val names = if (caseSensitiveAnalysis) columnNames else columnNames.map(_.toLowerCase)
     // scalastyle:on caselocale
     if (names.distinct.length != names.length) {
-      val columnName = names.groupBy(identity).collectFirst {
+      val columnName = names.groupBy(identity).toSeq.sortBy(_._1).collectFirst {
         case (x, ys) if ys.length > 1 => x
       }.get
       throw QueryCompilationErrors.columnAlreadyExistsError(columnName)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
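The essence of the fix is picking the reported duplicate deterministically (here, the lexicographically smallest) instead of relying on hash-map iteration order, which differs between Scala 2.12 and 2.13. A hypothetical Python analogue of the same idea, not the actual Spark code:

```python
from collections import Counter


def first_duplicate_column(column_names, case_sensitive=False):
    """Return the lexicographically smallest duplicated name, or None.

    Sorting before picking makes the result independent of dict/hash
    iteration order, mirroring the `.toSeq.sortBy(_._1)` added in the PR.
    """
    names = column_names if case_sensitive else [n.lower() for n in column_names]
    duplicates = sorted(name for name, count in Counter(names).items() if count > 1)
    return duplicates[0] if duplicates else None
```

Without the sort, either duplicated name could be reported depending on the runtime's hash-map ordering, which is exactly what made the `COLUMN_ALREADY_EXISTS` test assertions flaky across Scala versions.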
[spark] branch master updated: [SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 17816170316 [SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND
17816170316 is described below

commit 178161703161ccf49b37baf9a667630865367950
Author: itholic
AuthorDate: Wed Nov 23 08:38:20 2022 +0300

    [SPARK-40948][SQL][FOLLOWUP] Restore PATH_NOT_FOUND

    ### What changes were proposed in this pull request?
    The original PR introducing the error class `PATH_NOT_FOUND` was reverted because it broke tests in a different test environment. This PR restores it.

    ### Why are the changes needed?
    Restore the reverted changes with a proper fix.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    The existing CI should pass.

    Closes #38575 from itholic/SPARK-40948-followup.

    Authored-by: itholic
    Signed-off-by: Max Gekk
---
 R/pkg/tests/fulltests/test_sparkSQL.R              | 14 +---
 core/src/main/resources/error/error-classes.json   | 10 +++---
 .../spark/sql/errors/QueryCompilationErrors.scala  |  2 +-
 .../org/apache/spark/sql/DataFrameSuite.scala      | 37 --
 .../execution/datasources/DataSourceSuite.scala    | 28 +---
 5 files changed, 52 insertions(+), 39 deletions(-)

diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R b/R/pkg/tests/fulltests/test_sparkSQL.R
index 534ec07abac..d2b6220b2e7 100644
--- a/R/pkg/tests/fulltests/test_sparkSQL.R
+++ b/R/pkg/tests/fulltests/test_sparkSQL.R
@@ -3990,12 +3990,16 @@ test_that("Call DataFrameWriter.load() API in Java without path and check argume
   expect_error(read.df(source = "json"),
                paste("Error in load : analysis error - Unable to infer schema for JSON.",
                      "It must be specified manually"))
-  expect_error(read.df("arbitrary_path"), "Error in load : analysis error - Path does not exist")
-  expect_error(read.json("arbitrary_path"), "Error in json : analysis error - Path does not exist")
-  expect_error(read.text("arbitrary_path"), "Error in text : analysis error - Path does not exist")
-  expect_error(read.orc("arbitrary_path"), "Error in orc : analysis error - Path does not exist")
+  expect_error(read.df("arbitrary_path"),
+               "Error in load : analysis error - \\[PATH_NOT_FOUND\\].*")
+  expect_error(read.json("arbitrary_path"),
+               "Error in json : analysis error - \\[PATH_NOT_FOUND\\].*")
+  expect_error(read.text("arbitrary_path"),
+               "Error in text : analysis error - \\[PATH_NOT_FOUND\\].*")
+  expect_error(read.orc("arbitrary_path"),
+               "Error in orc : analysis error - \\[PATH_NOT_FOUND\\].*")
   expect_error(read.parquet("arbitrary_path"),
-               "Error in parquet : analysis error - Path does not exist")
+               "Error in parquet : analysis error - \\[PATH_NOT_FOUND\\].*")

   # Arguments checking in R side.
   expect_error(read.df(path = c(3)),

diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json
index 77d155bfc21..12c97c2108a 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -912,6 +912,11 @@
     ],
     "sqlState" : "42000"
   },
+  "PATH_NOT_FOUND" : {
+    "message" : [
+      "Path does not exist: <path>."
+    ]
+  },
   "PIVOT_VALUE_DATA_TYPE_MISMATCH" : {
     "message" : [
       "Invalid pivot value '': value data type does not match pivot column data type "
     ]
   },
@@ -2332,11 +2337,6 @@
       "Unable to infer schema for . It must be specified manually."
     ]
   },
-  "_LEGACY_ERROR_TEMP_1130" : {
-    "message" : [
-      "Path does not exist: <path>."
-    ]
-  },
   "_LEGACY_ERROR_TEMP_1131" : {
     "message" : [
       "Data source does not support output mode."
     ]
   },

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
index 63c912c15a1..0f245597efd 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
@@ -1378,7 +1378,7 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase {

   def dataPathNotExistError(path: String): Throwable = {
     new AnalysisException(
-      errorClass = "_LEGACY_ERROR_TEMP_1130",
+      errorClass = "PATH_NOT_FOUND",
       messageParameters = Map("path" -> path))
   }

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
index aab68065319..589ee1bea27 100644
---
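The `error-classes.json` entries above are message templates whose `<...>` tokens are filled from the `messageParameters` map passed to the exception. A rough Python illustration of that lookup-and-substitute mechanism, with hypothetical helper names (the real rendering lives in Spark's `SparkThrowable` machinery):

```python
import re

# A hypothetical slice of error-classes.json.
ERROR_CLASSES = {
    "PATH_NOT_FOUND": {"message": ["Path does not exist: <path>."]},
}


def render_error(error_class, params):
    """Join the message lines and substitute <name> placeholders from params."""
    template = "".join(ERROR_CLASSES[error_class]["message"])
    body = re.sub(r"<(\w+)>", lambda m: str(params.get(m.group(1), m.group(0))), template)
    return f"[{error_class}] {body}"
```

This is why the R tests above can match on `\\[PATH_NOT_FOUND\\].*`: every error rendered from a named error class is prefixed with its bracketed class name, unlike the ad-hoc `_LEGACY_ERROR_TEMP_*` messages being migrated away from.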
[spark] branch master updated (2c8da56d6f3 -> d275a83c582)
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 2c8da56d6f3 [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client
     add d275a83c582 [SPARK-41201][CONNECT][PYTHON][TEST][FOLLOWUP] Reenable test_fill_na

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/tests/connect/test_connect_basic.py | 1 -
 1 file changed, 1 deletion(-)
[spark] branch master updated (04f026caa8a -> 2c8da56d6f3)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 04f026caa8a [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI
     add 2c8da56d6f3 [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/connect/column.py                   | 16 +
 python/pyspark/sql/connect/dataframe.py                | 24 +++
 .../sql/tests/connect/test_connect_basic.py            | 28 --
 3 files changed, 66 insertions(+), 2 deletions(-)
[spark] branch master updated (25133684ef9 -> 04f026caa8a)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 25133684ef9 [SPARK-35531][SQL] Update hive table stats without unnecessary convert
     add 04f026caa8a [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

No new revisions were added by this update.

Summary of changes:
 .../spark/deploy/history/FsHistoryProvider.scala   | 41 +++---
 .../org/apache/spark/internal/config/Status.scala  |  7 +++
 .../org/apache/spark/status/AppStatusStore.scala   | 12 +-
 .../scala/org/apache/spark/status/KVUtils.scala    | 50 +-
 .../apache/spark/status/AppStatusStoreSuite.scala  | 19 ++--
 docs/configuration.md                              |  9
 6 files changed, 96 insertions(+), 42 deletions(-)
[spark] branch master updated (3bff4f6339f -> 25133684ef9)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 3bff4f6339f [SPARK-41135][SQL] Rename `UNSUPPORTED_EMPTY_LOCATION` to `INVALID_EMPTY_LOCATION`
     add 25133684ef9 [SPARK-35531][SQL] Update hive table stats without unnecessary convert

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/hive/HiveExternalCatalog.scala  | 15 ++-
 .../org/apache/spark/sql/hive/client/HiveClient.scala    |  6 ++
 .../apache/spark/sql/hive/client/HiveClientImpl.scala    | 16 +++-
 3 files changed, 27 insertions(+), 10 deletions(-)
[spark] branch master updated: [SPARK-41135][SQL] Rename `UNSUPPORTED_EMPTY_LOCATION` to `INVALID_EMPTY_LOCATION`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 3bff4f6339f [SPARK-41135][SQL] Rename `UNSUPPORTED_EMPTY_LOCATION` to `INVALID_EMPTY_LOCATION`
3bff4f6339f is described below

commit 3bff4f6339f54d19362a0c03ef2b396e47881fd8
Author: itholic
AuthorDate: Tue Nov 22 13:14:13 2022 +0300

    [SPARK-41135][SQL] Rename `UNSUPPORTED_EMPTY_LOCATION` to `INVALID_EMPTY_LOCATION`

    ### What changes were proposed in this pull request?
    This PR proposes to rename `UNSUPPORTED_EMPTY_LOCATION` to `INVALID_EMPTY_LOCATION`.

    ### Why are the changes needed?
    An error class and its message should be clear and brief, and should not be misleadingly specific about things that may be supported in the future.

    ### Does this PR introduce _any_ user-facing change?
    Error message changes.

    From:
    ```
    "Unsupported empty location."
    ```
    To:
    ```
    "The location name cannot be empty string, but `...` was given."
    ```

    ### How was this patch tested?
    ```
    $ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite*"
    $ build/sbt "core/testOnly *SparkThrowableSuite"
    ```

    Closes #38650 from itholic/SPARK-41135.

    Authored-by: itholic
    Signed-off-by: Max Gekk
---
 core/src/main/resources/error/error-classes.json               | 10 +-
 .../org/apache/spark/sql/errors/QueryExecutionErrors.scala     |  6 +++---
 .../spark/sql/catalyst/analysis/ResolveSessionCatalog.scala    |  4 ++--
 .../sql/execution/datasources/v2/DataSourceV2Strategy.scala    |  4 ++--
 .../execution/command/AlterNamespaceSetLocationSuiteBase.scala |  4 ++--
 .../spark/sql/execution/command/CreateNamespaceSuiteBase.scala |  4 ++--
 6 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json
index ae76a52e40f..77d155bfc21 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -676,6 +676,11 @@
     ],
     "sqlState" : "42000"
   },
+  "INVALID_EMPTY_LOCATION" : {
+    "message" : [
+      "The location name cannot be empty string, but `<location>` was given."
+    ]
+  },
   "INVALID_FIELD_NAME" : {
     "message" : [
       "Field name is invalid: is not a struct."
     ]
   },
@@ -1181,11 +1186,6 @@
       }
     }
   },
-  "UNSUPPORTED_EMPTY_LOCATION" : {
-    "message" : [
-      "Unsupported empty location."
-    ]
-  },
   "UNSUPPORTED_FEATURE" : {
     "message" : [
       "The feature is not supported:"

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
index 6081d9f32a5..5db54f7f4cf 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
@@ -2806,10 +2806,10 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase {
         "size" -> elementSize.toString))
   }

-  def unsupportedEmptyLocationError(): SparkIllegalArgumentException = {
+  def invalidEmptyLocationError(location: String): SparkIllegalArgumentException = {
     new SparkIllegalArgumentException(
-      errorClass = "UNSUPPORTED_EMPTY_LOCATION",
-      messageParameters = Map.empty)
+      errorClass = "INVALID_EMPTY_LOCATION",
+      messageParameters = Map("location" -> location))
   }

   def malformedProtobufMessageDetectedInMessageParsingError(e: Throwable): Throwable = {

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
index d00d07150b0..d7e26b04ce4 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
@@ -134,7 +134,7 @@ class ResolveSessionCatalog(val catalogManager: CatalogManager)
     case SetNamespaceLocation(DatabaseInSessionCatalog(db), location) if conf.useV1Command =>
       if (StringUtils.isEmpty(location)) {
-        throw QueryExecutionErrors.unsupportedEmptyLocationError()
+        throw QueryExecutionErrors.invalidEmptyLocationError(location)
       }
       AlterDatabaseSetLocationCommand(db, location)

@@ -243,7 +243,7 @@ class ResolveSessionCatalog(val catalogManager: CatalogManager)
       val location = c.properties.get(SupportsNamespaces.PROP_LOCATION)
       val newProperties = c.properties -- CatalogV2Util.NAMESPACE_RESERVED_PROPERTIES
       if (location.isDefined && location.get.isEmpty) {
-
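The renamed error now carries the offending value instead of an empty parameter map. A minimal Python sketch of that validate-then-act pattern (illustrative only; names are hypothetical and this is not Spark's Scala code):

```python
class InvalidEmptyLocationError(ValueError):
    """Mirrors the INVALID_EMPTY_LOCATION error class: message includes the value."""

    def __init__(self, location):
        super().__init__(
            "[INVALID_EMPTY_LOCATION] The location name cannot be "
            f"empty string, but `{location}` was given."
        )


def set_namespace_location(namespace, location):
    """Reject an empty location before building the command."""
    if location == "":
        raise InvalidEmptyLocationError(location)
    return f"ALTER NAMESPACE {namespace} SET LOCATION '{location}'"
```

Passing the rejected value into the error is what makes the rename more than cosmetic: the message template gains a `location` parameter, so callers see exactly what was given.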
[spark] branch master updated: [SPARK-41212][CONNECT][PYTHON] Implement `DataFrame.isEmpty`
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new e16dd7c0cfe [SPARK-41212][CONNECT][PYTHON] Implement `DataFrame.isEmpty`
e16dd7c0cfe is described below

commit e16dd7c0cfed8745a49bd46c30c05fc82ac292d5
Author: Ruifeng Zheng
AuthorDate: Tue Nov 22 17:15:42 2022 +0800

    [SPARK-41212][CONNECT][PYTHON] Implement `DataFrame.isEmpty`

    ### What changes were proposed in this pull request?
    Implement `DataFrame.isEmpty`.

    ### Why are the changes needed?
    API coverage.

    ### Does this PR introduce _any_ user-facing change?
    Yes, new API.

    ### How was this patch tested?
    Added UT.

    Closes #38734 from zhengruifeng/connect_df_is_empty.

    Authored-by: Ruifeng Zheng
    Signed-off-by: Ruifeng Zheng
---
 python/pyspark/sql/connect/dataframe.py                | 12 ++++++++++++
 python/pyspark/sql/tests/connect/test_connect_basic.py |  5 +++++
 2 files changed, 17 insertions(+)

diff --git a/python/pyspark/sql/connect/dataframe.py b/python/pyspark/sql/connect/dataframe.py
index b8fa640a42f..579403299fe 100644
--- a/python/pyspark/sql/connect/dataframe.py
+++ b/python/pyspark/sql/connect/dataframe.py
@@ -125,6 +125,18 @@ class DataFrame(object):
         new_frame._plan = plan
         return new_frame

+    def isEmpty(self) -> bool:
+        """Returns ``True`` if this :class:`DataFrame` is empty.
+
+        .. versionadded:: 3.4.0
+
+        Returns
+        -------
+        bool
+            Whether it's empty DataFrame or not.
+        """
+        return len(self.take(1)) == 0
+
     def select(self, *cols: "ExpressionOrString") -> "DataFrame":
         return DataFrame.withPlan(plan.Project(self._plan, *cols), session=self._session)

diff --git a/python/pyspark/sql/tests/connect/test_connect_basic.py b/python/pyspark/sql/tests/connect/test_connect_basic.py
index 8bf28bf8a75..49973ba70c3 100644
--- a/python/pyspark/sql/tests/connect/test_connect_basic.py
+++ b/python/pyspark/sql/tests/connect/test_connect_basic.py
@@ -346,6 +346,11 @@ class SparkConnectTests(SparkConnectSQLTestCase):
         self.assertEqual(1, len(pdf.columns))  # one column
         self.assertEqual("X", pdf.columns[0])

+    def test_is_empty(self):
+        # SPARK-41212: Test is empty
+        self.assertFalse(self.connect.sql("SELECT 1 AS X").isEmpty())
+        self.assertTrue(self.connect.sql("SELECT 1 AS X LIMIT 0").isEmpty())
+
     def test_session(self):
         self.assertEqual(self.connect, self.connect.sql("SELECT 1").sparkSession())
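The `len(self.take(1)) == 0` implementation generalizes: to test whether a lazy or expensive source is empty, fetch at most one element instead of counting everything. A standalone sketch of that pattern in plain Python (not Spark code):

```python
from itertools import islice


def is_empty(source):
    """True iff the iterable yields no elements.

    Pulls at most one element, so an expensive (or even infinite)
    source is never fully consumed just to check emptiness.
    """
    return len(list(islice(iter(source), 1))) == 0
```

Materializing a one-element slice also stays correct when elements are falsy (`0`, `None`, `""`), which a naive `next(iter(source), None) is None` sentinel check would not be if the source can legitimately yield `None`.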