[spark] branch master updated (f692771444d -> 9cd55052cce)

2022-11-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from f692771444d [SPARK-39883][SQL][TESTS] Add DataFrame function parity 
check
 add 9cd55052cce Revert "[SPARK-40948][SQL] Introduce new error class: 
PATH_NOT_FOUND"

No new revisions were added by this update.

Summary of changes:
 R/pkg/tests/fulltests/test_sparkSQL.R  | 19 +++---
 core/src/main/resources/error/error-classes.json   | 10 ++---
 .../spark/sql/errors/QueryCompilationErrors.scala  |  2 +-
 .../org/apache/spark/sql/DataFrameSuite.scala  | 44 --
 .../execution/datasources/DataSourceSuite.scala| 28 ++
 5 files changed, 39 insertions(+), 64 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (e9503c84c4d -> f692771444d)

2022-11-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from e9503c84c4d [SPARK-41031][BUILD] Upgrade `xz` to 1.9 for `avro` 1.11.1
 add f692771444d [SPARK-39883][SQL][TESTS] Add DataFrame function parity 
check

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/DataFrameFunctionsSuite.scala | 82 +-
 1 file changed, 81 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.3 updated: [SPARK-41031][BUILD] Upgrade `xz` to 1.9 for `avro` 1.11.1

2022-11-07 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new b01dd4c7519 [SPARK-41031][BUILD] Upgrade `xz` to 1.9 for `avro` 1.11.1
b01dd4c7519 is described below

commit b01dd4c7519b5ca40969822109453cff7cdf3eff
Author: yangjie01 
AuthorDate: Mon Nov 7 19:36:29 2022 -0600

[SPARK-41031][BUILD] Upgrade `xz` to 1.9 for `avro` 1.11.1

This pr aims to upgrade `xz` to 1.9 for `avro` 1.11.1.

Spark depends on `avro` 1.11.1, and `avro` 1.11.1 uses `xz` as an optional
dependency, so we need to manually check the `xz` version when upgrading `avro`.


https://github.com/apache/avro/blob/3a9e5a789b5165e0c8c4da799c387fdf84bfb75e/lang/java/pom.xml#L59


https://github.com/apache/avro/blob/3a9e5a789b5165e0c8c4da799c387fdf84bfb75e/lang/java/avro/pom.xml#L238-L242

The release notes are as follows:

- https://git.tukaani.org/?p=xz-java.git;a=blob;f=NEWS;hb=HEAD

No

Pass Github Actions

Closes #38538 from LuciferYang/SPARK-41031.

Authored-by: yangjie01 
Signed-off-by: Sean Owen 
(cherry picked from commit e9503c84c4d8d4b51844a195523ebf064bdf185e)
Signed-off-by: Sean Owen 
---
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 6 +-
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 
b/dev/deps/spark-deps-hadoop-2-hive-2.3
index d517d556feb..6bcd447dc64 100644
--- a/dev/deps/spark-deps-hadoop-2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -263,7 +263,7 @@ xbean-asm9-shaded/4.20//xbean-asm9-shaded-4.20.jar
 xercesImpl/2.12.2//xercesImpl-2.12.2.jar
 xml-apis/1.4.01//xml-apis-1.4.01.jar
 xmlenc/0.52//xmlenc-0.52.jar
-xz/1.8//xz-1.8.jar
+xz/1.9//xz-1.9.jar
 zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar
 zookeeper-jute/3.6.2//zookeeper-jute-3.6.2.jar
 zookeeper/3.6.2//zookeeper-3.6.2.jar
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 54e7fe23e5b..7429ecab6b9 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -250,7 +250,7 @@ univocity-parsers/2.9.1//univocity-parsers-2.9.1.jar
 velocity/1.5//velocity-1.5.jar
 wildfly-openssl/1.0.7.Final//wildfly-openssl-1.0.7.Final.jar
 xbean-asm9-shaded/4.20//xbean-asm9-shaded-4.20.jar
-xz/1.8//xz-1.8.jar
+xz/1.9//xz-1.9.jar
 zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar
 zookeeper-jute/3.6.2//zookeeper-jute-3.6.2.jar
 zookeeper/3.6.2//zookeeper-3.6.2.jar
diff --git a/pom.xml b/pom.xml
index d6b20512f6d..34043d43758 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1440,10 +1440,14 @@
   
 
   
+  
   
 <groupId>org.tukaani</groupId>
 <artifactId>xz</artifactId>
-<version>1.8</version>
+<version>1.9</version>
   
   

[spark] branch master updated: [SPARK-41031][BUILD] Upgrade `xz` to 1.9 for `avro` 1.11.1

2022-11-07 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e9503c84c4d [SPARK-41031][BUILD] Upgrade `xz` to 1.9 for `avro` 1.11.1
e9503c84c4d is described below

commit e9503c84c4d8d4b51844a195523ebf064bdf185e
Author: yangjie01 
AuthorDate: Mon Nov 7 19:36:29 2022 -0600

[SPARK-41031][BUILD] Upgrade `xz` to 1.9 for `avro` 1.11.1

### What changes were proposed in this pull request?
This pr aims to upgrade `xz` to 1.9 for `avro` 1.11.1.

### Why are the changes needed?
Spark depends on `avro` 1.11.1, and `avro` 1.11.1 uses `xz` as an optional
dependency, so we need to manually check the `xz` version when upgrading `avro`.


https://github.com/apache/avro/blob/3a9e5a789b5165e0c8c4da799c387fdf84bfb75e/lang/java/pom.xml#L59


https://github.com/apache/avro/blob/3a9e5a789b5165e0c8c4da799c387fdf84bfb75e/lang/java/avro/pom.xml#L238-L242

The release notes are as follows:

- https://git.tukaani.org/?p=xz-java.git;a=blob;f=NEWS;hb=HEAD
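
To manually confirm which xz-java version actually ends up on the Spark classpath after such an upgrade, a quick hedged check is sketched below (it assumes the xz-java jar's manifest carries an `Implementation-Version` entry; otherwise it reports "unknown"):

```
import org.tukaani.xz.XZ

object XzVersionCheck {
  def main(args: Array[String]): Unit = {
    // Read the Implementation-Version from the xz-java jar manifest, if present.
    val version = Option(classOf[XZ].getPackage)
      .flatMap(p => Option(p.getImplementationVersion))
      .getOrElse("unknown")
    println(s"xz-java version on classpath: $version")
  }
}
```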

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass Github Actions

Closes #38538 from LuciferYang/SPARK-41031.

Authored-by: yangjie01 
Signed-off-by: Sean Owen 
---
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 6 +-
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 
b/dev/deps/spark-deps-hadoop-2-hive-2.3
index 283d93a4e60..6b87c27d4bd 100644
--- a/dev/deps/spark-deps-hadoop-2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -267,7 +267,7 @@ xbean-asm9-shaded/4.22//xbean-asm9-shaded-4.22.jar
 xercesImpl/2.12.2//xercesImpl-2.12.2.jar
 xml-apis/1.4.01//xml-apis-1.4.01.jar
 xmlenc/0.52//xmlenc-0.52.jar
-xz/1.8//xz-1.8.jar
+xz/1.9//xz-1.9.jar
 zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar
 zookeeper-jute/3.6.2//zookeeper-jute-3.6.2.jar
 zookeeper/3.6.2//zookeeper-3.6.2.jar
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index fb9beebeaa0..db5af8881c2 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -252,7 +252,7 @@ univocity-parsers/2.9.1//univocity-parsers-2.9.1.jar
 velocity/1.5//velocity-1.5.jar
 wildfly-openssl/1.0.7.Final//wildfly-openssl-1.0.7.Final.jar
 xbean-asm9-shaded/4.22//xbean-asm9-shaded-4.22.jar
-xz/1.8//xz-1.8.jar
+xz/1.9//xz-1.9.jar
 zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar
 zookeeper-jute/3.6.2//zookeeper-jute-3.6.2.jar
 zookeeper/3.6.2//zookeeper-3.6.2.jar
diff --git a/pom.xml b/pom.xml
index 1c494669455..38ba2b14008 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1482,10 +1482,14 @@
   
 
   
+  
   
 <groupId>org.tukaani</groupId>
 <artifactId>xz</artifactId>
-<version>1.8</version>
+<version>1.9</version>
   
   

[spark] branch master updated: [SPARK-41007][SQL] Add missing serializer for java.math.BigInteger

2022-11-07 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0087a2a19dd [SPARK-41007][SQL] Add missing serializer for 
java.math.BigInteger
0087a2a19dd is described below

commit 0087a2a19dd081b524e96d6a407d3940cab1f2c0
Author: Daniel Fiterman 
AuthorDate: Mon Nov 7 19:33:21 2022 -0600

[SPARK-41007][SQL] Add missing serializer for java.math.BigInteger

### What changes were proposed in this pull request?

The JavaTypeInference class used by the [Bean
Encoder](https://spark.apache.org/docs/3.2.0/api/java/org/apache/spark/sql/Encoders.html#bean-java.lang.Class-)
to create serializers/deserializers for a Java Bean was missing a case
statement for serializing java.math.BigInteger. This PR adds the missing case
statement.
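
As a rough usage sketch (not part of this commit; the bean and session setup are
hypothetical), a bean with a `java.math.BigInteger` field can now go through
`Encoders.bean`, which is the code path the new case statement covers:

```
import java.math.BigInteger
import java.util.Arrays

import scala.beans.BeanProperty

import org.apache.spark.sql.{Encoders, SparkSession}

// Hypothetical bean used only for illustration.
class AccountBean {
  @BeanProperty var balance: BigInteger = _
}

object BigIntegerBeanExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("demo").getOrCreate()
    val bean = new AccountBean
    bean.setBalance(new BigInteger("12345678901234567890"))
    // Before this fix, JavaTypeInference.serializerFor had no case for BigInteger and the
    // bean encoder could not serialize this field; it now maps to DecimalType(38, 0).
    val ds = spark.createDataset(Arrays.asList(bean))(Encoders.bean(classOf[AccountBean]))
    ds.show(truncate = false)
    spark.stop()
  }
}
```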

### Why are the changes needed?

This fixes the bug mentioned in the description

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- Unit Test
- Manually tested creating a new dataset with a Java Bean containing a 
java.math.BigInteger field

Closes #38500 from dfit99/SPARK-41007.

Authored-by: Daniel Fiterman 
Signed-off-by: Sean Owen 
---
 .../spark/sql/catalyst/JavaTypeInference.scala |  3 ++
 .../sql/catalyst/JavaTypeInferenceSuite.scala  | 42 ++
 2 files changed, 45 insertions(+)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
index 903072ae29d..dccaf1c4835 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
@@ -424,6 +424,9 @@ object JavaTypeInference {
 
 case c if c == classOf[java.time.Period] => 
createSerializerForJavaPeriod(inputObject)
 
+case c if c == classOf[java.math.BigInteger] =>
+  createSerializerForJavaBigInteger(inputObject)
+
 case c if c == classOf[java.math.BigDecimal] =>
   createSerializerForJavaBigDecimal(inputObject)
 
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/JavaTypeInferenceSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/JavaTypeInferenceSuite.scala
new file mode 100644
index 000..9c1d0c1
--- /dev/null
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/JavaTypeInferenceSuite.scala
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst
+
+import java.math.BigInteger
+
+import scala.beans.BeanProperty
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.expressions.{CheckOverflow, Expression, 
Literal}
+import org.apache.spark.sql.types.DecimalType
+
+class DummyBean() {
+  @BeanProperty var bigInteger = null: BigInteger
+}
+
+class JavaTypeInferenceSuite extends SparkFunSuite {
+
+  test("SPARK-41007: JavaTypeInference returns the correct serializer for 
BigInteger") {
+var serializer = JavaTypeInference.serializerFor(classOf[DummyBean])
+var bigIntegerFieldName: Expression = serializer.children(0)
+assert(bigIntegerFieldName.asInstanceOf[Literal].value.toString == 
"bigInteger")
+var bigIntegerFieldExpression: Expression = serializer.children(1)
+assert(bigIntegerFieldExpression.asInstanceOf[CheckOverflow].dataType ==
+  DecimalType.BigIntDecimal)
+  }
+}


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-41026][CONNECT] Support Repartition in Connect Proto

2022-11-07 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 9695b2cb59b [SPARK-41026][CONNECT] Support Repartition in Connect Proto
9695b2cb59b is described below

commit 9695b2cb59b497709ca0050d754491d935742530
Author: Rui Wang 
AuthorDate: Tue Nov 8 09:00:12 2022 +0800

[SPARK-41026][CONNECT] Support Repartition in Connect Proto

### What changes were proposed in this pull request?

Support `Repartition` in the Connect proto, which in turn supports two APIs:
`repartition` (shuffle=true) and `coalesce` (shuffle=false).
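
A minimal sketch (assuming the generated proto classes under
`org.apache.spark.connect.proto`; this is not the committed DSL code itself) of
building the new relation, mirroring what the `repartition`/`coalesce` DSL
helpers in the diff below do:

```
import org.apache.spark.connect.proto.{Relation, Repartition}

// shuffle = true corresponds to Dataset.repartition, shuffle = false to Dataset.coalesce.
def repartitionRelation(input: Relation, numPartitions: Int, shuffle: Boolean): Relation =
  Relation
    .newBuilder()
    .setRepartition(
      Repartition
        .newBuilder()
        .setInput(input)
        .setNumPartitions(numPartitions) // must be positive, per the proto comment
        .setShuffle(shuffle))
    .build()
```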

### Why are the changes needed?

Improve API coverage.

### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

UT

Closes #38529 from amaliujia/support_repartition_in_proto_connect.

Authored-by: Rui Wang 
Signed-off-by: Wenchen Fan 
---
 .../main/protobuf/spark/connect/relations.proto|  13 +++
 .../org/apache/spark/sql/connect/dsl/package.scala |  18 
 .../sql/connect/planner/SparkConnectPlanner.scala  |   5 +
 .../connect/planner/SparkConnectProtoSuite.scala   |  10 ++
 python/pyspark/sql/connect/proto/relations_pb2.py  | 114 +++--
 python/pyspark/sql/connect/proto/relations_pb2.pyi |  43 
 6 files changed, 147 insertions(+), 56 deletions(-)

diff --git a/connector/connect/src/main/protobuf/spark/connect/relations.proto 
b/connector/connect/src/main/protobuf/spark/connect/relations.proto
index 8edd8911242..36113e2a30c 100644
--- a/connector/connect/src/main/protobuf/spark/connect/relations.proto
+++ b/connector/connect/src/main/protobuf/spark/connect/relations.proto
@@ -46,6 +46,7 @@ message Relation {
 Deduplicate deduplicate = 14;
 Range range = 15;
 SubqueryAlias subquery_alias = 16;
+Repartition repartition = 17;
 
 Unknown unknown = 999;
   }
@@ -241,3 +242,15 @@ message SubqueryAlias {
   // Optional. Qualifier of the alias.
   repeated string qualifier = 3;
 }
+
+// Relation repartition.
+message Repartition {
+  // Required. The input relation.
+  Relation input = 1;
+
+  // Required. Must be positive.
+  int32 num_partitions = 2;
+
+  // Optional. Default value is false.
+  bool shuffle = 3;
+}
diff --git 
a/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
 
b/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
index c40a9eed753..2755727de11 100644
--- 
a/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
+++ 
b/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
@@ -423,6 +423,24 @@ package object dsl {
   byName))
   .build()
 
+  def coalesce(num: Integer): Relation =
+Relation
+  .newBuilder()
+  .setRepartition(
+Repartition
+  .newBuilder()
+  .setInput(logicalPlan)
+  .setNumPartitions(num)
+  .setShuffle(false))
+  .build()
+
+  def repartition(num: Integer): Relation =
+Relation
+  .newBuilder()
+  .setRepartition(
+
Repartition.newBuilder().setInput(logicalPlan).setNumPartitions(num).setShuffle(true))
+  .build()
+
   private def createSetOperation(
   left: Relation,
   right: Relation,
diff --git 
a/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
 
b/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
index d2b474711ab..1615fc56ab6 100644
--- 
a/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
+++ 
b/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
@@ -72,6 +72,7 @@ class SparkConnectPlanner(plan: proto.Relation, session: 
SparkSession) {
   case proto.Relation.RelTypeCase.RANGE => transformRange(rel.getRange)
   case proto.Relation.RelTypeCase.SUBQUERY_ALIAS =>
 transformSubqueryAlias(rel.getSubqueryAlias)
+  case proto.Relation.RelTypeCase.REPARTITION => 
transformRepartition(rel.getRepartition)
   case proto.Relation.RelTypeCase.RELTYPE_NOT_SET =>
 throw new IndexOutOfBoundsException("Expected Relation to be set, but 
is empty.")
   case _ => throw InvalidPlanInput(s"${rel.getUnknown} not supported.")
@@ -107,6 +108,10 @@ class SparkConnectPlanner(plan: proto.Relation, session: 
SparkSession) {
   transformRelation(rel.getInput))
   }
 
+  private def transformRepartition(rel: proto.Repartition): LogicalPlan = {
+logical.Repartition(rel.getNumPartitions, rel.getShuffle, 
transformRelation(rel.getInput))
+  }
+
   private def transformRange(rel: proto.Range): LogicalPlan = {
 val start 

[spark] branch master updated: [SPARK-41002][CONNECT][PYTHON] Compatible `take`, `head` and `first` API in Python client

2022-11-07 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2a361b9ddfa [SPARK-41002][CONNECT][PYTHON] Compatible `take`, `head` 
and `first` API in Python client
2a361b9ddfa is described below

commit 2a361b9ddfa766c719399b35c38f4dafe68353ee
Author: Rui Wang 
AuthorDate: Tue Nov 8 08:30:49 2022 +0800

[SPARK-41002][CONNECT][PYTHON] Compatible `take`, `head` and `first` API in 
Python client

### What changes were proposed in this pull request?

1. Add `take(n)` API.
2. Change `head(n)` API to return `Union[Optional[Row], List[Row]]`.
3. Update `first()` to return `Optional[Row]`.

### Why are the changes needed?

Improve API coverage.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT

Closes #38488 from amaliujia/SPARK-41002.

Authored-by: Rui Wang 
Signed-off-by: Ruifeng Zheng 
---
 python/pyspark/sql/connect/dataframe.py| 61 --
 .../sql/tests/connect/test_connect_basic.py| 36 +++--
 2 files changed, 90 insertions(+), 7 deletions(-)

diff --git a/python/pyspark/sql/connect/dataframe.py 
b/python/pyspark/sql/connect/dataframe.py
index b9ba4b99ba0..9eecdbb7145 100644
--- a/python/pyspark/sql/connect/dataframe.py
+++ b/python/pyspark/sql/connect/dataframe.py
@@ -24,6 +24,7 @@ from typing import (
     Tuple,
     Union,
     TYPE_CHECKING,
+    overload,
 )
 
 import pandas
@@ -211,14 +212,66 @@ class DataFrame(object):
             plan.Filter(child=self._plan, filter=condition), session=self._session
         )
 
-    def first(self) -> Optional["pandas.DataFrame"]:
-        return self.head(1)
+    def first(self) -> Optional[Row]:
+        """Returns the first row as a :class:`Row`.
+
+        .. versionadded:: 3.4.0
+
+        Returns
+        -------
+        :class:`Row`
+            First row if :class:`DataFrame` is not empty, otherwise ``None``.
+        """
+        return self.head()
 
     def groupBy(self, *cols: "ColumnOrString") -> GroupingFrame:
         return GroupingFrame(self, *cols)
 
-    def head(self, n: int) -> Optional["pandas.DataFrame"]:
-        return self.limit(n).toPandas()
+    @overload
+    def head(self) -> Optional[Row]:
+        ...
+
+    @overload
+    def head(self, n: int) -> List[Row]:
+        ...
+
+    def head(self, n: Optional[int] = None) -> Union[Optional[Row], List[Row]]:
+        """Returns the first ``n`` rows.
+
+        .. versionadded:: 3.4.0
+
+        Parameters
+        ----------
+        n : int, optional
+            default 1. Number of rows to return.
+
+        Returns
+        -------
+        If n is greater than 1, return a list of :class:`Row`.
+        If n is 1, return a single Row.
+        """
+        if n is None:
+            rs = self.head(1)
+            return rs[0] if rs else None
+        return self.take(n)
+
+    def take(self, num: int) -> List[Row]:
+        """Returns the first ``num`` rows as a :class:`list` of :class:`Row`.
+
+        .. versionadded:: 3.4.0
+
+        Parameters
+        ----------
+        num : int
+            Number of records to return. Will return this number of records
+            or whatever number is available.
+
+        Returns
+        -------
+        list
+            List of rows
+        """
+        return self.limit(num).collect()
 
 # TODO: extend `on` to also be type List[ColumnRef].
 def join(
diff --git a/python/pyspark/sql/tests/connect/test_connect_basic.py 
b/python/pyspark/sql/tests/connect/test_connect_basic.py
index 18a752ee19d..a0f046907f7 100644
--- a/python/pyspark/sql/tests/connect/test_connect_basic.py
+++ b/python/pyspark/sql/tests/connect/test_connect_basic.py
@@ -46,6 +46,7 @@ class SparkConnectSQLTestCase(ReusedPySparkTestCase):
     if have_pandas:
         connect: RemoteSparkSession
         tbl_name: str
+        tbl_name_empty: str
         df_text: "DataFrame"
 
 @classmethod
@@ -61,6 +62,7 @@ class SparkConnectSQLTestCase(ReusedPySparkTestCase):
         cls.df_text = cls.sc.parallelize(cls.testDataStr).toDF()
 
         cls.tbl_name = "test_connect_basic_table_1"
+        cls.tbl_name_empty = "test_connect_basic_table_empty"
 
         # Cleanup test data
         cls.spark_connect_clean_up_test_data()
@@ -80,10 +82,21 @@ class SparkConnectSQLTestCase(ReusedPySparkTestCase):
         # Since we might create multiple Spark sessions, we need to create global temporary view
         # that is specifically maintained in the "global_temp" schema.
         df.write.saveAsTable(cls.tbl_name)
+        empty_table_schema = StructType(
+            [
+                StructField("firstname", StringType(), True),
+                StructField("middlename", StringType(), True),
+

[spark] branch master updated (8c7a8466e8e -> 4bbdca60049)

2022-11-07 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 8c7a8466e8e [MINOR][SQL] Replace `new SparkException(errorClass = 
"INTERNAL_ERROR", ...)` by `SparkException.internalError`
 add 4bbdca60049 [SPARK-41030][BUILD] Upgrade `Apache Ivy` to 2.5.1

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch dependabot/maven/org.apache.ivy-ivy-2.5.1 created (now 22051105d51)

2022-11-07 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch dependabot/maven/org.apache.ivy-ivy-2.5.1
in repository https://gitbox.apache.org/repos/asf/spark.git


  at 22051105d51 Bump ivy from 2.5.0 to 2.5.1

No new revisions were added by this update.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [MINOR][SQL] Replace `new SparkException(errorClass = "INTERNAL_ERROR", ...)` by `SparkException.internalError`

2022-11-07 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8c7a8466e8e [MINOR][SQL] Replace `new SparkException(errorClass = 
"INTERNAL_ERROR", ...)` by `SparkException.internalError`
8c7a8466e8e is described below

commit 8c7a8466e8ecd99ffc517b8cacb17d10cbc763a2
Author: panbingkun 
AuthorDate: Mon Nov 7 21:00:06 2022 +0300

[MINOR][SQL] Replace `new SparkException(errorClass = "INTERNAL_ERROR", 
...)` by `SparkException.internalError`

### What changes were proposed in this pull request?
This PR aims to replace `new SparkException(errorClass = "INTERNAL_ERROR",
...)` with `SparkException.internalError`.
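
A condensed before/after sketch of the pattern being applied (the two-argument
`SparkException.internalError(msg, cause)` overload is the one used in the diff
below):

```
import org.apache.spark.SparkException

// Before: spell out the INTERNAL_ERROR class and its message parameter by hand.
def internalErrorOld(msg: String, cause: Throwable): Throwable =
  new SparkException(
    errorClass = "INTERNAL_ERROR",
    messageParameters = Map("message" -> msg),
    cause = cause)

// After: the helper fills in the error class and message parameters.
def internalErrorNew(msg: String, cause: Throwable): Throwable =
  SparkException.internalError(msg, cause)
```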

### Why are the changes needed?
The changes improve the error framework.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

Closes #38532 from panbingkun/minor_internal_error.

Authored-by: panbingkun 
Signed-off-by: Max Gekk 
---
 .../scala/org/apache/spark/sql/execution/QueryExecution.scala | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
index 8bf5d3d317b..798a219d243 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
@@ -495,11 +495,9 @@ object QueryExecution {
*/
   private[sql] def toInternalError(msg: String, e: Throwable): Throwable = e 
match {
 case e @ (_: java.lang.NullPointerException | _: java.lang.AssertionError) 
=>
-  new SparkException(
-errorClass = "INTERNAL_ERROR",
-messageParameters = Map("message" -> (msg +
-  " Please, fill a bug report in, and provide the full stack trace.")),
-cause = e)
+  SparkException.internalError(
+msg + " Please, fill a bug report in, and provide the full stack 
trace.",
+e)
 case e: Throwable =>
   e
   }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (57d49255676 -> eb6d1980fa8)

2022-11-07 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 57d49255676 [SPARK-40948][SQL] Introduce new error class: 
PATH_NOT_FOUND
 add eb6d1980fa8 [SPARK-41023][BUILD] Upgrade Jackson to 2.14.0

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 16 
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 16 
 pom.xml   |  4 ++--
 3 files changed, 18 insertions(+), 18 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-40948][SQL] Introduce new error class: PATH_NOT_FOUND

2022-11-07 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 57d49255676 [SPARK-40948][SQL] Introduce new error class: 
PATH_NOT_FOUND
57d49255676 is described below

commit 57d492556768eb341f525ce7eb5c934089fa9e7e
Author: itholic 
AuthorDate: Mon Nov 7 14:13:13 2022 +0300

[SPARK-40948][SQL] Introduce new error class: PATH_NOT_FOUND

### What changes were proposed in this pull request?

This PR proposes to introduce the new error class `PATH_NOT_FOUND` by
updating the existing legacy temp error class `_LEGACY_ERROR_TEMP_1130`.
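
A hedged illustration (hypothetical path and local session; not part of this
change) of how the renamed error class surfaces to users:

```
import org.apache.spark.sql.{AnalysisException, SparkSession}

object PathNotFoundExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    try {
      spark.read.parquet("/no/such/path") // hypothetical missing path
    } catch {
      case e: AnalysisException =>
        // With this change the error class is PATH_NOT_FOUND instead of
        // _LEGACY_ERROR_TEMP_1130, and the message is prefixed accordingly:
        // "[PATH_NOT_FOUND] Path does not exist: ..."
        println(e.getErrorClass)
        println(e.getMessage)
    }
    spark.stop()
  }
}
```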

### Why are the changes needed?

We should use an appropriate error class name that matches the error message.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The existing CI should pass.

Closes #38422 from itholic/LEGACY_MIGRATE.

Authored-by: itholic 
Signed-off-by: Max Gekk 
---
 R/pkg/tests/fulltests/test_sparkSQL.R  | 19 +++---
 core/src/main/resources/error/error-classes.json   | 10 ++---
 .../spark/sql/errors/QueryCompilationErrors.scala  |  2 +-
 .../org/apache/spark/sql/DataFrameSuite.scala  | 44 ++
 .../execution/datasources/DataSourceSuite.scala| 28 --
 5 files changed, 64 insertions(+), 39 deletions(-)

diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R 
b/R/pkg/tests/fulltests/test_sparkSQL.R
index 534ec07abac..91a2c51660b 100644
--- a/R/pkg/tests/fulltests/test_sparkSQL.R
+++ b/R/pkg/tests/fulltests/test_sparkSQL.R
@@ -3990,12 +3990,21 @@ test_that("Call DataFrameWriter.load() API in Java 
without path and check argume
   expect_error(read.df(source = "json"),
paste("Error in load : analysis error - Unable to infer schema 
for JSON.",
  "It must be specified manually"))
-  expect_error(read.df("arbitrary_path"), "Error in load : analysis error - 
Path does not exist")
-  expect_error(read.json("arbitrary_path"), "Error in json : analysis error - 
Path does not exist")
-  expect_error(read.text("arbitrary_path"), "Error in text : analysis error - 
Path does not exist")
-  expect_error(read.orc("arbitrary_path"), "Error in orc : analysis error - 
Path does not exist")
+  expect_error(read.df("arbitrary_path"),
+   paste("Error in load : analysis error - [PATH_NOT_FOUND] Path 
does not exist:",
+ "file:/__w/spark/spark/arbitrary_path."), fixed = TRUE)
+  expect_error(read.json("arbitrary_path"),
+   paste("Error in json : analysis error - [PATH_NOT_FOUND] Path 
does not exist:",
+ "file:/__w/spark/spark/arbitrary_path."), fixed = TRUE)
+  expect_error(read.text("arbitrary_path"),
+   paste("Error in text : analysis error - [PATH_NOT_FOUND] Path 
does not exist:",
+ "file:/__w/spark/spark/arbitrary_path."), fixed = TRUE)
+  expect_error(read.orc("arbitrary_path"),
+   paste("Error in orc : analysis error - [PATH_NOT_FOUND] Path 
does not exist:",
+ "file:/__w/spark/spark/arbitrary_path."), fixed = TRUE)
   expect_error(read.parquet("arbitrary_path"),
-  "Error in parquet : analysis error - Path does not exist")
+   paste("Error in parquet : analysis error - [PATH_NOT_FOUND] 
Path does not exist:",
+ "file:/__w/spark/spark/arbitrary_path."), fixed = TRUE)
 
   # Arguments checking in R side.
   expect_error(read.df(path = c(3)),
diff --git a/core/src/main/resources/error/error-classes.json 
b/core/src/main/resources/error/error-classes.json
index ceb3e4ed5b1..73652a1ca78 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -806,6 +806,11 @@
 ],
 "sqlState" : "42000"
   },
+  "PATH_NOT_FOUND" : {
+    "message" : [
+      "Path does not exist: <path>."
+    ]
+  },
   "PIVOT_VALUE_DATA_TYPE_MISMATCH" : {
     "message" : [
       "Invalid pivot value '<value>': value data type <valueType> does not match pivot column data type <pivotType>"
@@ -2226,11 +2231,6 @@
       "Unable to infer schema for <format>. It must be specified manually."
     ]
   },
-  "_LEGACY_ERROR_TEMP_1130" : {
-    "message" : [
-      "Path does not exist: <path>."
-    ]
-  },
   "_LEGACY_ERROR_TEMP_1131" : {
     "message" : [
       "Data source <className> does not support <outputMode> output mode."
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
index b56e1957f77..4056052c81e 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
@@ -1388,7 

[spark] branch master updated: [SPARK-40875][CONNECT] Improve aggregate in Connect DSL

2022-11-07 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new eac736e1a62 [SPARK-40875][CONNECT] Improve aggregate in Connect DSL
eac736e1a62 is described below

commit eac736e1a62bf707cd3103a5c94df1d5a45617df
Author: Rui Wang 
AuthorDate: Mon Nov 7 18:05:59 2022 +0800

[SPARK-40875][CONNECT] Improve aggregate in Connect DSL

### What changes were proposed in this pull request?

This PR adds the aggregate expressions (or named result expressions) for
Aggregate in the Connect proto and DSL. On the server side, this PR also
differentiates between named expressions (e.g. with `alias`) and non-named
expressions (the server wraps the latter in `UnresolvedAlias`, so Catalyst
generates an alias for such expressions).
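
A minimal sketch (not the exact committed planner code) of the named-vs-unnamed
distinction described above: expressions that already are `NamedExpression`s
(e.g. an explicit `Alias`) pass through, while anything else is wrapped in
`UnresolvedAlias` so the analyzer generates a name for it:

```
import org.apache.spark.sql.catalyst.analysis.UnresolvedAlias
import org.apache.spark.sql.catalyst.expressions.{Expression, NamedExpression}

def toResultExpression(expr: Expression): NamedExpression = expr match {
  case named: NamedExpression => named   // e.g. explicit alias from the client
  case other => UnresolvedAlias(other)   // Catalyst will generate an alias
}
```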

### Why are the changes needed?

Improve API coverage.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT

Closes #38527 from amaliujia/add_aggregate_expression_to_dsl.

Authored-by: Rui Wang 
Signed-off-by: Wenchen Fan 
---
 .../main/protobuf/spark/connect/relations.proto|  7 +--
 .../org/apache/spark/sql/connect/dsl/package.scala | 12 +-
 .../sql/connect/planner/SparkConnectPlanner.scala  | 20 -
 .../connect/planner/SparkConnectPlannerSuite.scala | 15 +--
 .../connect/planner/SparkConnectProtoSuite.scala   | 17 
 python/pyspark/sql/connect/plan.py |  9 ++--
 python/pyspark/sql/connect/proto/relations_pb2.py  | 50 +++---
 python/pyspark/sql/connect/proto/relations_pb2.pyi | 31 ++
 8 files changed, 81 insertions(+), 80 deletions(-)

diff --git a/connector/connect/src/main/protobuf/spark/connect/relations.proto 
b/connector/connect/src/main/protobuf/spark/connect/relations.proto
index deb35525728..8edd8911242 100644
--- a/connector/connect/src/main/protobuf/spark/connect/relations.proto
+++ b/connector/connect/src/main/protobuf/spark/connect/relations.proto
@@ -161,12 +161,7 @@ message Offset {
 message Aggregate {
   Relation input = 1;
   repeated Expression grouping_expressions = 2;
-  repeated AggregateFunction result_expressions = 3;
-
-  message AggregateFunction {
-string name = 1;
-repeated Expression arguments = 2;
-  }
+  repeated Expression result_expressions = 3;
 }
 
 // Relation of type [[Sort]].
diff --git 
a/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
 
b/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
index e2030c9ad31..c40a9eed753 100644
--- 
a/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
+++ 
b/connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
@@ -93,6 +93,13 @@ package object dsl {
   .build()
 }
 
+def proto_min(e: Expression): Expression =
+  Expression
+.newBuilder()
+.setUnresolvedFunction(
+  
Expression.UnresolvedFunction.newBuilder().addParts("min").addArguments(e))
+.build()
+
 /**
  * Create an unresolved function from name parts.
  *
@@ -383,8 +390,9 @@ package object dsl {
 for (groupingExpr <- groupingExprs) {
   agg.addGroupingExpressions(groupingExpr)
 }
-// TODO: support aggregateExprs, which is blocked by supporting any 
builtin function
-// resolution only by name in the analyzer.
+for (aggregateExpr <- aggregateExprs) {
+  agg.addResultExpressions(aggregateExpr)
+}
 Relation.newBuilder().setAggregate(agg.build()).build()
   }
 
diff --git 
a/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
 
b/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
index f5c6980290f..d2b474711ab 100644
--- 
a/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
+++ 
b/connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala
@@ -25,7 +25,7 @@ import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.catalyst.AliasIdentifier
 import org.apache.spark.sql.catalyst.analysis.{UnresolvedAlias, 
UnresolvedAttribute, UnresolvedFunction, UnresolvedRelation, UnresolvedStar}
 import org.apache.spark.sql.catalyst.expressions
-import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, 
AttributeReference, Expression}
+import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, 
AttributeReference, Expression, NamedExpression}
 import org.apache.spark.sql.catalyst.optimizer.CombineUnions
 import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
 import org.apache.spark.sql.catalyst.plans.{logical, FullOuter, Inner, 
JoinType, LeftAnti, LeftOuter, 

[spark] branch master updated: [SPARK-41019][SQL] Provide a query context to `failAnalysis()`

2022-11-07 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d503c47c9e3 [SPARK-41019][SQL] Provide a query context to 
`failAnalysis()`
d503c47c9e3 is described below

commit d503c47c9e3a4f1e815bae3b57feadd5568ca21a
Author: Max Gekk 
AuthorDate: Mon Nov 7 13:03:17 2022 +0300

[SPARK-41019][SQL] Provide a query context to `failAnalysis()`

### What changes were proposed in this pull request?
In this PR, I propose to invoke `AnalysisErrorAt.failAnalysis()` instead of
`CheckAnalysis.failAnalysis()` because the former captures the query context
and passes it to `AnalysisException`.
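
For example (hypothetical query; `LIMIT -1` goes through one of the
`checkLimitLikeClause` call sites changed below), the captured context lets the
error message point at the offending fragment:

```
import org.apache.spark.sql.{AnalysisException, SparkSession}

object QueryContextExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    try {
      spark.sql("SELECT * FROM VALUES (1), (2) AS t(a) LIMIT -1").collect()
    } catch {
      case e: AnalysisException =>
        // With this change the exception carries the query context of the limit
        // expression, so the message can include the offending fragment.
        println(e.getMessage)
    }
    spark.stop()
  }
}
```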

### Why are the changes needed?
To provide additional info as a query context to users. This should improve 
user experience with Spark SQL.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By running the modified test suites:
```
$ PYSPARK_PYTHON=python3 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"
```

Closes #38514 from MaxGekk/provide-context-failAnalysis.

Authored-by: Max Gekk 
Signed-off-by: Max Gekk 
---
 .../sql/catalyst/analysis/CheckAnalysis.scala  | 106 ++--
 .../resources/sql-tests/results/except-all.sql.out |  18 +-
 .../sql-tests/results/group-analytics.sql.out  |  18 +-
 .../sql-tests/results/group-by-filter.sql.out  |   9 +-
 .../resources/sql-tests/results/group-by.sql.out   |  63 ++-
 .../sql-tests/results/intersect-all.sql.out|  18 +-
 .../test/resources/sql-tests/results/limit.sql.out |  45 +-
 .../sql-tests/results/percentiles.sql.out  | 108 +++-
 .../test/resources/sql-tests/results/pivot.sql.out |   9 +-
 .../results/postgreSQL/aggregates_part3.sql.out|   9 +-
 .../sql-tests/results/postgreSQL/limit.sql.out |  18 +-
 .../results/postgreSQL/select_having.sql.out   |   9 +-
 .../results/postgreSQL/window_part3.sql.out|  18 +-
 .../negative-cases/invalid-correlation.sql.out |  18 +-
 .../native/widenSetOperationTypes.sql.out  | 630 ++---
 .../results/udaf/udaf-group-by-ordinal.sql.out |   9 +-
 .../sql-tests/results/udaf/udaf-group-by.sql.out   |   9 +-
 .../udf/postgreSQL/udf-aggregates_part3.sql.out|   9 +-
 .../udf/postgreSQL/udf-select_having.sql.out   |   9 +-
 .../sql-tests/results/udf/udf-except-all.sql.out   |  18 +-
 .../results/udf/udf-group-analytics.sql.out|  18 +-
 .../sql-tests/results/udf/udf-group-by.sql.out |  54 +-
 .../results/udf/udf-intersect-all.sql.out  |  18 +-
 .../sql-tests/results/udf/udf-pivot.sql.out|   9 +-
 24 files changed, 1064 insertions(+), 185 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
index 0b688dc5f7c..544bb3cc301 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -82,24 +82,24 @@ trait CheckAnalysis extends PredicateHelper with 
LookupCatalog {
 
   private def checkLimitLikeClause(name: String, limitExpr: Expression): Unit 
= {
 limitExpr match {
-  case e if !e.foldable => failAnalysis(
+  case e if !e.foldable => limitExpr.failAnalysis(
 errorClass = "_LEGACY_ERROR_TEMP_2400",
 messageParameters = Map(
   "name" -> name,
   "limitExpr" -> limitExpr.sql))
-  case e if e.dataType != IntegerType => failAnalysis(
+  case e if e.dataType != IntegerType => limitExpr.failAnalysis(
 errorClass = "_LEGACY_ERROR_TEMP_2401",
 messageParameters = Map(
   "name" -> name,
   "dataType" -> e.dataType.catalogString))
   case e =>
 e.eval() match {
-  case null => failAnalysis(
+  case null => limitExpr.failAnalysis(
 errorClass = "_LEGACY_ERROR_TEMP_2402",
 messageParameters = Map(
   "name" -> name,
   "limitExpr" -> limitExpr.sql))
-  case v: Int if v < 0 => failAnalysis(
+  case v: Int if v < 0 => limitExpr.failAnalysis(
 errorClass = "_LEGACY_ERROR_TEMP_2403",
 messageParameters = Map(
   "name" -> name,
@@ -189,12 +189,12 @@ trait CheckAnalysis extends PredicateHelper with 
LookupCatalog {
   case r @ ResolvedTable(_, _, table, _) => table match {
 case t: SupportsPartitionManagement =>
   if (t.partitionSchema.isEmpty) {
-failAnalysis(
+r.failAnalysis(
   errorClass = "_LEGACY_ERROR_TEMP_2404",
   messageParameters = 

[spark] branch master updated: [SPARK-41021][SQL][TESTS] Test some subclasses of error class DATATYPE_MISMATCH

2022-11-07 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 02ae919a6ae [SPARK-41021][SQL][TESTS] Test some subclasses of error 
class DATATYPE_MISMATCH
02ae919a6ae is described below

commit 02ae919a6ae4f37ef859188db629ab564d6d42f5
Author: panbingkun 
AuthorDate: Mon Nov 7 12:19:21 2022 +0300

[SPARK-41021][SQL][TESTS] Test some subclasses of error class 
DATATYPE_MISMATCH

### What changes were proposed in this pull request?
This PR aims to add new UTs for some error classes, including:
1. DATATYPE_MISMATCH.BINARY_ARRAY_DIFF_TYPES
2. DATATYPE_MISMATCH.CANNOT_CONVERT_TO_JSON
3. DATATYPE_MISMATCH.FRAME_LESS_OFFSET_WITHOUT_FOLDABLE
4. DATATYPE_MISMATCH.MAP_FROM_ENTRIES_WRONG_TYPE
5. DATATYPE_MISMATCH.NON_STRING_TYPE
6. DATATYPE_MISMATCH.NULL_TYPE
7. DATATYPE_MISMATCH.SPECIFIED_WINDOW_FRAME_DIFF_TYPES
8. DATATYPE_MISMATCH.SPECIFIED_WINDOW_FRAME_UNACCEPTED_TYPE (already exists)

https://github.com/apache/spark/blob/7009ef0510dae444c72e7513357e681b08379603/sql/core/src/test/resources/sql-tests/results/window.sql.out#L106
10. DATATYPE_MISMATCH.SPECIFIED_WINDOW_FRAME_WITHOUT_FOLDABLE
11. DATATYPE_MISMATCH.UNSPECIFIED_FRAME

### Why are the changes needed?
The changes improve the error framework.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

Closes #38520 from panbingkun/DATATYPE_MISMATCH_test.

Authored-by: panbingkun 
Signed-off-by: Max Gekk 
---
 .../analysis/ExpressionTypeCheckingSuite.scala |  48 
 .../expressions/CollectionExpressionsSuite.scala   |  24 
 .../expressions/JsonExpressionsSuite.scala |  23 
 .../apache/spark/sql/DataFrameFunctionsSuite.scala | 137 ++---
 .../spark/sql/DataFrameWindowFramesSuite.scala |  45 ++-
 .../spark/sql/DataFrameWindowFunctionsSuite.scala  |  44 ++-
 .../org/apache/spark/sql/JsonFunctionsSuite.scala  |  32 +
 7 files changed, 333 insertions(+), 20 deletions(-)

diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ExpressionTypeCheckingSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ExpressionTypeCheckingSuite.scala
index f656131c8e7..b192f12d569 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ExpressionTypeCheckingSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ExpressionTypeCheckingSuite.scala
@@ -765,4 +765,52 @@ class ExpressionTypeCheckingSuite extends SparkFunSuite 
with SQLHelper with Quer
   )
 )
   }
+
+  test("check types for Lag") {
+val lag = Lag(Literal(1), NonFoldableLiteral(10), Literal(null), true)
+assert(lag.checkInputDataTypes() ==
+  DataTypeMismatch(
+errorSubClass = "FRAME_LESS_OFFSET_WITHOUT_FOLDABLE",
+messageParameters = Map("offset" -> "\"(- nonfoldableliteral())\"")
+  ))
+  }
+
+  test("check types for SpecifiedWindowFrame") {
+val swf1 = SpecifiedWindowFrame(RangeFrame, Literal(10.0), 
Literal(2147483648L))
+assert(swf1.checkInputDataTypes() ==
+  DataTypeMismatch(
+errorSubClass = "SPECIFIED_WINDOW_FRAME_DIFF_TYPES",
+messageParameters = Map(
+  "lower" -> "\"10.0\"",
+  "upper" -> "\"2147483648\"",
+  "lowerType" -> "\"DOUBLE\"",
+  "upperType" -> "\"BIGINT\""
+)
+  )
+)
+
+val swf2 = SpecifiedWindowFrame(RangeFrame, NonFoldableLiteral(10.0), 
Literal(2147483648L))
+assert(swf2.checkInputDataTypes() ==
+  DataTypeMismatch(
+errorSubClass = "SPECIFIED_WINDOW_FRAME_WITHOUT_FOLDABLE",
+messageParameters = Map(
+  "location" -> "lower",
+  "expression" -> "\"nonfoldableliteral()\""
+)
+  )
+)
+  }
+
+  test("check types for WindowSpecDefinition") {
+val wsd = WindowSpecDefinition(
+  UnresolvedAttribute("a") :: Nil,
+  SortOrder(UnresolvedAttribute("b"), Ascending) :: Nil,
+  UnspecifiedFrame)
+assert(wsd.checkInputDataTypes() ==
+  DataTypeMismatch(
+errorSubClass = "UNSPECIFIED_FRAME",
+messageParameters = Map.empty
+  )
+)
+  }
 }
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
index 9839b784e60..676fb615e48 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
@@ -28,6 +28,7 @@ import org.apache.spark.SparkFunSuite
 import 

[spark] branch master updated (d046f0bef6e -> 93d13afe7f0)

2022-11-07 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from d046f0bef6e [MINOR][SQL] Remove unused an error class and query error 
methods
 add 93d13afe7f0 [SPARK-41020][SQL] Rename the error class 
`_LEGACY_ERROR_TEMP_1019` to `STAR_GROUP_BY_POS`

No new revisions were added by this update.

Summary of changes:
 core/src/main/resources/error/error-classes.json   | 10 +-
 .../org/apache/spark/sql/errors/QueryCompilationErrors.scala   |  2 +-
 .../test/resources/sql-tests/results/group-by-ordinal.sql.out  |  2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org